File | Doc | Category | Size | Date | Package
RussianStemFilter.java | API Doc | Apache Lucene 2.1.0 | 2520 | Wed Feb 14 10:46:28 GMT 2007 | org.apache.lucene.analysis.ru

RussianStemFilter

public final class RussianStemFilter extends TokenFilter
A filter that stems Russian words. The implementation was inspired by GermanStemFilter. The input should be passed through RussianLowerCaseFilter before it reaches RussianStemFilter, because RussianStemFilter only works with the lowercase part of any "Russian" charset.
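The ordering requirement above can be illustrated with a small, self-contained sketch. It does not use Lucene: `toyStem` is a hypothetical stand-in that strips a single lowercase ending, and `lowerThenStem` is an assumed helper name. It only shows why a stemmer that knows lowercase endings leaves uppercase input untouched unless the text is lowercased first. The Cyrillic strings are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.Locale;

public class StemOrderSketch {
    // Lowercase Cyrillic ending "a-m-i" (instrumental plural), as Unicode escapes.
    // This single rule is a toy; it is NOT the rule set of Lucene's RussianStemmer.
    public static final String SUFFIX = "\u0430\u043C\u0438";

    // Toy stemmer: strips SUFFIX if present, otherwise returns the word unchanged.
    public static String toyStem(String word) {
        if (word.length() > SUFFIX.length() && word.endsWith(SUFFIX)) {
            return word.substring(0, word.length() - SUFFIX.length());
        }
        return word;
    }

    // Lowercase first, then stem: the order the class description asks for.
    public static String lowerThenStem(String word) {
        return toyStem(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "KNIGAMI" ("with books") in Cyrillic capital letters.
        String upper = "\u041A\u041D\u0418\u0413\u0410\u041C\u0418";

        // Stemming the uppercase form directly: the lowercase rule never matches.
        System.out.println(toyStem(upper).equals(upper));

        // Lowercasing first lets the rule fire and shortens the word by three letters.
        System.out.println(lowerThenStem(upper).length() == upper.length() - 3);
    }
}
```

In an actual Lucene 2.1 analysis chain the same ordering would be expressed by wrapping the lowercase filter's output in the stem filter, using the constructor documented below.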
author
Boris Okner, b.okner@rogers.com
version
$Id: RussianStemFilter.java 472959 2006-11-09 16:21:50Z yonik $

Fields Summary
private Token
token
The actual token in the input stream.
private RussianStemmer
stemmer
Constructors Summary
public RussianStemFilter(TokenStream in, char[] charset)

        super(in);
        stemmer = new RussianStemmer(charset);
    
Methods Summary
public final org.apache.lucene.analysis.Token next()

return
Returns the next token in the stream with its term text stemmed, or null at end of stream (EOS)

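        // Pull the next token from the wrapped stream; null signals end of stream.
        if ((token = input.next()) == null)
        {
            return null;
        }
        else
        {
            String s = stemmer.stem(token.termText());
            // Build a new token only when stemming changed the text,
            // keeping the original offsets and type.
            if (!s.equals(token.termText()))
            {
                return new Token(s, token.startOffset(), token.endOffset(),
                    token.type());
            }
            // Stem equals the original text: pass the token through unchanged.
            return token;
        }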
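
The branch structure of next() can be tried out in isolation with the following minimal sketch. It assumes nothing from Lucene: `Tok` is a hypothetical stand-in for Token, `stemToken` is an illustrative helper name, and the stemmer is passed in as a plain function. It shows the three outcomes: null passes through, an unchanged stem returns the same instance, and a changed stem yields a new token that keeps the original offsets and type.

```java
import java.util.function.UnaryOperator;

public class StemFilterSketch {
    // Hypothetical stand-in for a token: text plus offsets into the source text.
    public static final class Tok {
        public final String text;
        public final int start;
        public final int end;
        public final String type;

        public Tok(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Same decision logic as the listing above, applied to a single token.
    public static Tok stemToken(Tok in, UnaryOperator<String> stemmer) {
        if (in == null) {
            return null; // end of stream
        }
        String s = stemmer.apply(in.text);
        if (!s.equals(in.text)) {
            // Text changed: new token, but offsets still describe the original span.
            return new Tok(s, in.start, in.end, in.type);
        }
        return in; // unchanged: reuse the incoming instance
    }

    public static void main(String[] args) {
        // Toy English rule used only to make the example readable.
        UnaryOperator<String> toy =
            w -> w.endsWith("ing") ? w.substring(0, w.length() - 3) : w;

        Tok original = new Tok("walking", 4, 11, "word");
        Tok stemmed = stemToken(original, toy);
        System.out.println(stemmed.text + " [" + stemmed.start + "," + stemmed.end + ")");
        System.out.println(stemToken(new Tok("walk", 0, 4, "word"), toy).text);
    }
}
```

Because the offsets are copied rather than recomputed, the stemmed token's text can be shorter than the span its offsets cover.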
    
public void setStemmer(RussianStemmer stemmer)
Sets an alternative/custom RussianStemmer for this filter. A null argument is ignored and the current stemmer is kept.

        if (stemmer != null)
        {
            this.stemmer = stemmer;
        }