CzechAnalyzer
public final class CzechAnalyzer extends Analyzer
Analyzer for the Czech language. Supports an external list of stopwords (words that
will not be indexed at all).
A default set of stopwords is used unless an alternative list is specified. The
exclusion list is empty by default. |
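To illustrate what "will not be indexed at all" means in practice, here is a minimal sketch that uses only the Java standard library and no Lucene classes. The class `StopWordSketch`, the method `indexableTerms`, and the sample stop words are hypothetical and chosen for illustration. Whitespace splitting stands in for real tokenization, so this is not how the analyzer itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not part of Lucene: shows the effect of a stop set
// on the terms that would reach the index.
public class StopWordSketch {

    // Splits on whitespace, lowercases each token, and drops tokens found in the stop set.
    public static List<String> indexableTerms(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Sample stop words for illustration only; not the contents of CZECH_STOP_WORDS.
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "se", "na"));
        // prints [pes, pan, jdou, hrad]
        System.out.println(indexableTerms("Pes a pan jdou na hrad", stopWords));
    }
}
```

Lowercasing before the lookup means the stop set only needs lowercase entries.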
Fields Summary |
---|
public static final String[] | CZECH_STOP_WORDS
List of typical stopwords. |
private Set | stoptable
Contains the stopwords used with the StopFilter. |
Constructors Summary |
---|
public CzechAnalyzer()
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).
    stoptable = StopFilter.makeStopSet( CZECH_STOP_WORDS );
| public CzechAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.
    stoptable = StopFilter.makeStopSet( stopwords );
| public CzechAnalyzer(HashSet stopwords)
Builds an analyzer with the given stop words.
    stoptable = stopwords;
| public CzechAnalyzer(File stopwords)
Builds an analyzer with the stop words read from the given file.
    stoptable = WordlistLoader.getWordSet( stopwords );
|
Methods Summary |
---|
public void | loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
Loads the stopword set from a resource stream (file, database...), one word per line.
    if ( wordfile == null ) {
        stoptable = new HashSet();
        return;
    }
    try {
        // clear any previous table (if present)
        stoptable = new HashSet();
        InputStreamReader isr;
        if (encoding == null)
            // no encoding given: fall back to the platform default charset
            isr = new InputStreamReader(wordfile);
        else
            isr = new InputStreamReader(wordfile, encoding);
        LineNumberReader lnr = new LineNumberReader(isr);
        String word;
        while ( ( word = lnr.readLine() ) != null ) {
            stoptable.add(word);
        }
    } catch ( IOException e ) {
        // on a read error the stop set is left null, and tokenStream() passes it to StopFilter as-is
        stoptable = null;
    }
| public final org.apache.lucene.analysis.TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
    TokenStream result = new StandardTokenizer( reader );
    result = new StandardFilter( result );
    result = new LowerCaseFilter( result );
    result = new StopFilter( result, stoptable );
    return result;
|
|
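The stream-loading approach of loadStopWords can be sketched with the standard library alone. The class `StopWordLoaderSketch` and the method `readStopWords` are hypothetical names, not Lucene API. The sketch also makes a few choices that differ from the listing above: it trims each line, skips blank lines, closes the reader, and returns an empty set rather than null when reading fails.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reads one stop word per line.
public class StopWordLoaderSketch {

    public static Set<String> readStopWords(InputStream wordfile, String encoding) {
        Set<String> words = new HashSet<>();
        if (wordfile == null) {
            return words; // no source: empty stop set
        }
        // Charset.forName throws an unchecked exception for an unknown or illegal name.
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(wordfile, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Assumption of this sketch: discard partial results and return an empty set, never null.
            return new HashSet<>();
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] data = "a\nse\n\n na \n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = readStopWords(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(words.size()); // prints 3
    }
}
```

Returning an empty set on failure keeps later lookups safe, at the cost of silently indexing every word; a caller that needs to detect the failure would have to propagate the exception instead.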