StopFilter
public final class StopFilter extends TokenFilter
Removes stop words from a token stream. |
Fields Summary |
---|
private final Set | stopWords |
private final boolean | ignoreCase |
Constructors Summary |
---|
public StopFilter(TokenStream input, String[] stopWords)
Constructs a token stream that filters the given input, matching stop words case-sensitively.
this(input, stopWords, false);
| public StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase)
Constructs a filter that removes from the input TokenStream any words named in the array of words.
If ignoreCase is true, the stop words are lower-cased and matching is case-insensitive.
super(in);
this.ignoreCase = ignoreCase;
this.stopWords = makeStopSet(stopWords, ignoreCase);
| public StopFilter(TokenStream input, Set stopWords, boolean ignoreCase)
Constructs a token stream that filters the given input.
The Set is stored as given, so if ignoreCase is true its entries must already be lower-cased.
super(input);
this.ignoreCase = ignoreCase;
this.stopWords = stopWords;
| public StopFilter(TokenStream in, Set stopWords)
Constructs a filter that removes from the input TokenStream any words named in the Set.
The Set is consulted once per token, so use an implementation with fast lookups
(for example a HashSet) for best performance.
this(in, stopWords, false);
|
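The Set-based constructors store the set as given, while `next()` lower-cases each term when `ignoreCase` is true, so a caller-supplied set needs lower-cased entries for case-insensitive matching to work. Below is a minimal, Lucene-free sketch of that lookup. The class `StopLookupSketch` and the method `isStopWord` are hypothetical names used only for illustration, and `Locale.ROOT` is an assumption added here; the listing itself calls the no-argument `toLowerCase()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone model of the lookup performed in StopFilter.next().
class StopLookupSketch {

    // Mirrors the listing: lower-case the term if ignoreCase, then test membership.
    static boolean isStopWord(Set<String> stopWords, String term, boolean ignoreCase) {
        String key = ignoreCase ? term.toLowerCase(Locale.ROOT) : term;
        return stopWords.contains(key);
    }

    public static void main(String[] args) {
        Set<String> mixedCase = new HashSet<>(Arrays.asList("The", "And"));
        Set<String> lowerCase = new HashSet<>(Arrays.asList("the", "and"));

        // Set stored as-is: the lower-cased term "the" never matches "The".
        System.out.println(isStopWord(mixedCase, "The", true));  // false
        // Lower-cased set: any casing of the term matches.
        System.out.println(isStopWord(lowerCase, "THE", true));  // true
    }
}
```

Building the set through `makeStopSet(stopWords, true)` avoids the mismatch, since that method lower-cases each entry as it is added.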
Methods Summary |
---|
public static final java.util.Set | makeStopSet(java.lang.String[] stopWords)
Builds a Set from an array of stop words, suitable for passing to the StopFilter constructor.
This lets the stop set be built once, when an Analyzer is constructed, and then reused.
return makeStopSet(stopWords, false);
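| public static final java.util.Set | makeStopSet(java.lang.String[] stopWords, boolean ignoreCase)
Builds a Set from an array of stop words; if ignoreCase is true, each word is lower-cased before it is added.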
HashSet stopTable = new HashSet(stopWords.length);
for (int i = 0; i < stopWords.length; i++)
  stopTable.add(ignoreCase ? stopWords[i].toLowerCase() : stopWords[i]);
return stopTable;
| public final org.apache.lucene.analysis.Token | next()
Returns the next input Token whose termText() is not a stop word, or null when the input is exhausted.
// return the first non-stop word found
for (Token token = input.next(); token != null; token = input.next())
{
  String termText = ignoreCase ? token.termText.toLowerCase() : token.termText;
  if (!stopWords.contains(termText))
    return token;
}
// reached EOS -- return null
return null;
|
|
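To see how `makeStopSet` and `next()` fit together, here is a small sketch that models the same logic over plain strings instead of Lucene `Token` objects. It is not the Lucene API: `StopFilterSketch`, `filterAll`, the `Iterator<String>` input, the generics, and `Locale.ROOT` are all assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of StopFilter over plain strings; not the Lucene API.
class StopFilterSketch {

    // Same idea as makeStopSet(String[], boolean): lower-case entries once, up front.
    static Set<String> makeStopSet(String[] stopWords, boolean ignoreCase) {
        Set<String> stopTable = new HashSet<>(stopWords.length);
        for (String word : stopWords) {
            stopTable.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return stopTable;
    }

    // Same idea as next(): skip stop words, return null at end of stream.
    static String next(Iterator<String> input, Set<String> stopWords, boolean ignoreCase) {
        while (input.hasNext()) {
            String term = input.next();
            String key = ignoreCase ? term.toLowerCase(Locale.ROOT) : term;
            if (!stopWords.contains(key)) {
                return term; // the term keeps its original casing
            }
        }
        return null;
    }

    // Drains the input through next(), collecting every surviving term.
    static List<String> filterAll(List<String> terms, Set<String> stopWords, boolean ignoreCase) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = terms.iterator();
        for (String t = next(it, stopWords, ignoreCase); t != null; t = next(it, stopWords, ignoreCase)) {
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stops = makeStopSet(new String[] {"The", "a", "OF"}, true);
        List<String> terms = Arrays.asList("The", "Art", "of", "a", "Deal");
        System.out.println(filterAll(terms, stops, true)); // [Art, Deal]
    }
}
```

Only the lookup key is lower-cased; the term returned to the caller is unchanged, which matches the listing, where `next()` returns the original `token`.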