Sometimes the out-of-the-box functionality of Lucene and Solr is not enough. When that happens, we need to extend what Lucene and Solr give us and create our own plugin. In today's post I'll try to show how to develop a custom filter and use it in Solr.
Solr Version
The following code is based on Solr 3.6. We will publish an updated version of this post that matches Solr 4.0 after its release.
Assumptions
Let's assume that we need a filter that reverses every word in a given field. So, if the input is “solr.pl”, the output would be “lp.rlos”. It's not the hardest example, but it will be enough for the purpose of this entry. One more thing: I decided to omit describing how to set up your IDE, compile your code, build the jar, and so on. We will focus only on the code.
Filter
In order to implement our filter we will extend the TokenFilter class from the org.apache.lucene.analysis package and override the incrementToken method. This method returns a boolean value: if another token is available for processing in the token stream, it should return true; if the token stream is exhausted, it should return false. The implementation should look like the one below:
package com.kuntal.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class ReverseFilter extends TokenFilter {
    // Gives access to the text of the current token.
    private final CharTermAttribute charTermAttr;

    // Protected constructor: the factory has to live in the same package.
    protected ReverseFilter(TokenStream ts) {
        super(ts);
        this.charTermAttr = addAttribute(CharTermAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Advance the wrapped stream; stop when it has no more tokens.
        if (!input.incrementToken()) {
            return false;
        }
        // Only the first length() chars of the buffer belong to the term.
        int length = charTermAttr.length();
        char[] buffer = charTermAttr.buffer();
        char[] newBuffer = new char[length];
        for (int i = 0; i < length; i++) {
            newBuffer[i] = buffer[length - 1 - i];
        }
        // Replace the term text with its reversed copy.
        charTermAttr.setEmpty();
        charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
        return true;
    }
}
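To make the incrementToken contract easier to see in isolation, here is a small sketch that does not use Lucene at all. ToyStream, ToyWhitespaceStream, ToyReverseStream and ToyStreamDemo are hypothetical names invented for this illustration; they only mimic the "pull a token from the input, then modify it" shape of the filter above and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream; NOT the Lucene TokenStream API.
interface ToyStream {
    boolean incrementToken(); // true if a new token is now available
    String term();            // text of the current token
}

// Source stream that emits whitespace-separated words one at a time.
final class ToyWhitespaceStream implements ToyStream {
    private final Iterator<String> words;
    private String current;

    ToyWhitespaceStream(String text) {
        List<String> parts = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                parts.add(w);
            }
        }
        this.words = parts.iterator();
    }

    @Override
    public boolean incrementToken() {
        if (!words.hasNext()) {
            return false; // stream exhausted
        }
        current = words.next();
        return true;
    }

    @Override
    public String term() {
        return current;
    }
}

// Decorator with the same shape as the filter: advance the input, then rewrite the term.
final class ToyReverseStream implements ToyStream {
    private final ToyStream input;
    private String current;

    ToyReverseStream(ToyStream input) {
        this.input = input;
    }

    @Override
    public boolean incrementToken() {
        if (!input.incrementToken()) {
            return false; // nothing left upstream, so nothing to emit
        }
        current = new StringBuilder(input.term()).reverse().toString();
        return true;
    }

    @Override
    public String term() {
        return current;
    }
}

final class ToyStreamDemo {
    // Consumes a stream the way a caller would: loop until incrementToken() returns false.
    static List<String> drain(ToyStream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyStream chain = new ToyReverseStream(new ToyWhitespaceStream("solr.pl lucene"));
        System.out.println(drain(chain));
    }
}
```

The consumer never asks how many tokens there are; it simply keeps calling incrementToken until it gets false, which is why the filter has to pass that false through from its input unchanged.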
Filter Factory
In order for Solr to be able to use our filter, we need to implement a filter factory class. Because we don't have any special configuration values, the factory implementation is very simple. We will extend the BaseTokenFilterFactory class from the org.apache.solr.analysis package. The implementation can look like the following:
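// Same package as ReverseFilter, because its constructor is protected.
package com.kuntal.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class ReverseFilterFactory extends BaseTokenFilterFactory {
    // Wraps the incoming token stream in our filter.
    @Override
    public TokenStream create(TokenStream ts) {
        return new ReverseFilter(ts);
    }
}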
As you can see, the filter factory implementation is simple: we only needed to override a single create method, in which we instantiate our filter and return it.
Configuration
After compiling the code and building the jar file, we copy the jar to a directory that Solr can see. We can do this by creating the lib directory in the Solr home directory and then adding the following entry to the solrconfig.xml file:
<lib dir="../lib/" regex=".*\.jar" />
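A quick way to sanity-check a value for the regex attribute before deploying is to try it with java.util.regex. The sketch below assumes that Solr compiles the attribute as a Java regular expression and matches it against the whole file name; LibRegexCheck and the file name reverse-filter.jar are hypothetical and only serve the illustration. It shows that a shell-style glob such as *.jar is not a valid Java regex, while .*\.jar is.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class LibRegexCheck {
    // Returns true only if the pattern compiles AND matches the whole file name.
    // Assumption: this mirrors how the regex attribute is applied to each file.
    static boolean matchesJar(String regex, String fileName) {
        try {
            return Pattern.compile(regex).matcher(fileName).matches();
        } catch (PatternSyntaxException e) {
            // A glob such as "*.jar" ends up here: '*' has nothing to repeat.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical jar name, used only for this check.
        String jar = "reverse-filter.jar";
        System.out.println(matchesJar(".*\\.jar", jar));
        System.out.println(matchesJar("*.jar", jar));
    }
}
```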
Then we change the schema.xml file and add a new field type that uses our filter:
<fieldType name="text_reversed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.kuntal.analysis.ReverseFilterFactory" />
  </analyzer>
</fieldType>
Similarly, you can write your own custom indexer and analyzer in Solr based on your use case. Hope it helps you! :)