File: CzechAnalyzer.java
Doc: API Doc
Category: Apache Lucene 2.1.0
Size: 5631
Date: Wed Feb 14 10:46:32 GMT 2007
Package: org.apache.lucene.analysis.cz

CzechAnalyzer

java.lang.Object
- org.apache.lucene.analysis.Analyzer

public final class CzechAnalyzer extends Analyzer

Analyzer for the Czech language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.

author: Lukas Zapletal [lzap@root.cz]

Fields Summary
public static final String[]
CZECH_STOP_WORDS
List of typical stopwords.
private Set
stoptable
Contains the stopwords used with the StopFilter.
Constructors Summary
public CzechAnalyzer()
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).
stoptable = StopFilter.makeStopSet( CZECH_STOP_WORDS );
public CzechAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.
stoptable = StopFilter.makeStopSet( stopwords );
public CzechAnalyzer(HashSet stopwords)
stoptable = stopwords;
public CzechAnalyzer(File stopwords)
Builds an analyzer with the given stop words.
stoptable = WordlistLoader.getWordSet( stopwords );
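The constructors above differ only in how they populate stoptable. As a rough, Lucene-free illustration of the String[] case, the sketch below turns an array into a Set using only the JDK. The class StopSetSketch and the method toStopSet are hypothetical names introduced here; this is not the source of StopFilter.makeStopSet, only an approximation that assumes the words are stored unchanged.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: approximates turning a
// stopword array into the kind of Set the analyzer keeps in stoptable.
class StopSetSketch {

    static Set<String> toStopSet(String[] stopwords) {
        // Duplicates collapse automatically; words are stored as given
        // (assumption: no case folding happens at this step).
        return new HashSet<>(Arrays.asList(stopwords));
    }

    public static void main(String[] args) {
        Set<String> stops = toStopSet(new String[] { "a", "ale", "a" });
        System.out.println(stops.size());          // two distinct words
        System.out.println(stops.contains("ale"));
    }
}
```

Because the set is consulted by exact match, a custom list would normally be supplied in the same form the tokens have when they reach the stop filter.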
Methods Summary
public void loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
Loads stopwords hash from resource stream (file, database...).
param
wordfile File containing the wordlist
param
encoding Encoding used (win-1250, iso-8859-2, ...), null for default system encoding
if ( wordfile == null ) {
    stoptable = new HashSet();
    return;
}
try {
    // clear any previous table (if present)
    stoptable = new HashSet();
    InputStreamReader isr;
    if (encoding == null)
        isr = new InputStreamReader(wordfile);
    else
        isr = new InputStreamReader(wordfile, encoding);
    LineNumberReader lnr = new LineNumberReader(isr);
    String word;
    while ( ( word = lnr.readLine() ) != null ) {
        stoptable.add(word);
    }
} catch ( IOException e ) {
    stoptable = null;
}
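The loading logic of loadStopWords can be tried out without Lucene. The following is a minimal sketch under stated assumptions: StopWordLoaderSketch and load are hypothetical names, the method returns the set instead of assigning a field, and it rethrows I/O failures as an unchecked exception, whereas the original sets stoptable to null in that case.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone class, not part of Lucene.
class StopWordLoaderSketch {

    // Reads one stopword per line. A null encoding selects the platform
    // default charset, as in the original method.
    static Set<String> load(InputStream wordfile, String encoding) {
        Set<String> words = new HashSet<>();
        if (wordfile == null) {
            return words; // no stream: empty set, as in the original
        }
        try {
            InputStreamReader isr = (encoding == null)
                    ? new InputStreamReader(wordfile)
                    : new InputStreamReader(wordfile, encoding);
            // The stream is not closed here; the caller owns it.
            BufferedReader reader = new BufferedReader(isr);
            String word;
            while ((word = reader.readLine()) != null) {
                words.add(word);
            }
        } catch (IOException e) {
            // Deviation from the original, which sets the table to null.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // "\u017ee" is the Czech word "ze" with a caron on the z.
        byte[] data = "a\nale\n\u017ee\n".getBytes(StandardCharsets.UTF_8);
        Set<String> stops = load(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(stops.size());
    }
}
```

Passing the wrong encoding does not fail; it silently yields mis-decoded words that never match any token, so the encoding argument should match how the word list was saved.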
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
return
A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
TokenStream result = new StandardTokenizer( reader );
result = new StandardFilter( result );
result = new LowerCaseFilter( result );
result = new StopFilter( result, stoptable );
return result;
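The order of the stages in tokenStream matters: tokens are lower-cased before they are checked against the stop set. The sketch below imitates that order with plain JDK code. It is only an approximation: AnalysisChainSketch and analyze are hypothetical names, the regular-expression split is a crude stand-in for StandardTokenizer, and the StandardFilter step is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone class, not part of Lucene.
class AnalysisChainSketch {

    static List<String> analyze(String text, Set<String> stoptable) {
        List<String> tokens = new ArrayList<>();
        // Stage 1: crude tokenization on runs of non-letter, non-digit
        // characters (assumption: far simpler than StandardTokenizer).
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split yields an empty first element on leading delimiters
            }
            // Stage 2: lower-case the token.
            String lower = raw.toLowerCase(Locale.ROOT);
            // Stage 3: drop the token if it is a stopword.
            if (!stoptable.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("a", "v"));
        System.out.println(analyze("Praha a Brno v noci", stops));
        // prints [praha, brno, noci]
    }
}
```

One consequence of this ordering is that entries in a custom stop list need to be lower-case; an entry such as "Ale" would never match, since every token is already lower-cased by the time it reaches the stop check.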