File | Doc | Category | Size | Date | Package
CzechAnalyzer.java | API Doc | Apache Lucene 1.9 | 5628 | Mon Feb 20 09:19:02 GMT 2006 | org.apache.lucene.analysis.cz

CzechAnalyzer

public final class CzechAnalyzer extends Analyzer
Analyzer for the Czech language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.
author
Lukas Zapletal [lzap@root.cz]

Fields Summary
public static final String[]
CZECH_STOP_WORDS
List of typical stopwords.
private Set
stoptable
Contains the stopwords used with the StopFilter.
Constructors Summary
public CzechAnalyzer()
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

		stoptable = StopFilter.makeStopSet( CZECH_STOP_WORDS );
	
public CzechAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.

		stoptable = StopFilter.makeStopSet( stopwords );
	
public CzechAnalyzer(Hashtable stopwords)
Builds an analyzer with the given stop words.

deprecated

		stoptable = new HashSet(stopwords.keySet());
	
public CzechAnalyzer(HashSet stopwords)

		stoptable = stopwords;
	
public CzechAnalyzer(File stopwords)
Builds an analyzer with the given stop words.

		stoptable = WordlistLoader.getWordSet( stopwords );
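
The constructors above differ only in how the stop word set is obtained. The following is a minimal JDK-only sketch of the array and Hashtable cases, assuming that StopFilter.makeStopSet simply collects the array entries into a set. The class StopSetSketch and its methods are hypothetical names used for illustration and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// JDK-only illustration; class and method names are hypothetical, not Lucene API.
final class StopSetSketch {

    // Mirrors the String[] constructor: every array element becomes a set member.
    static Set<String> fromArray(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    // Mirrors the deprecated Hashtable constructor: only the keys are kept,
    // the values are ignored.
    static Set<String> fromHashtable(Hashtable<String, ?> table) {
        return new HashSet<>(table.keySet());
    }

    public static void main(String[] args) {
        Set<String> fromWords = fromArray(new String[] {"a", "se", "na"});

        Hashtable<String, String> table = new Hashtable<>();
        table.put("a", "a");
        table.put("se", "se");
        table.put("na", "na");
        Set<String> fromKeys = fromHashtable(table);

        // Both routes yield the same set of stop words.
        System.out.println(fromWords.equals(fromKeys)); // prints true
    }
}
```

Note that the HashSet constructor in the listing stores the caller's set by reference rather than copying it, so later changes to that set are visible to the analyzer.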
	
Methods Summary
public void loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
Loads the stopword set from a resource stream (file, database...).

param
wordfile File containing the wordlist
param
encoding Encoding used (win-1250, iso-8859-2, ...), null for default system encoding

        if ( wordfile == null ) {
            stoptable = new HashSet();
            return;
        }
        try {
            // clear any previous table (if present)
            stoptable = new HashSet();

            InputStreamReader isr;
            if (encoding == null)
                isr = new InputStreamReader(wordfile);
            else
                isr = new InputStreamReader(wordfile, encoding);

            LineNumberReader lnr = new LineNumberReader(isr);
            String word;
            while ( ( word = lnr.readLine() ) != null ) {
                stoptable.add(word);
            }

        } catch ( IOException e ) {
            // on a read or encoding error the stop table is set to null, not left empty
            stoptable = null;
        }
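
The loading logic above can be sketched as a standalone method using only the JDK. This is an illustrative variant, not the Lucene implementation: the class StopWordLoaderSketch and its load method are hypothetical, and the sketch makes three assumptions that differ from the listing. It trims each line, skips blank lines, and returns an empty set instead of null when reading fails.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// JDK-only illustration; class and method names are hypothetical, not Lucene API.
final class StopWordLoaderSketch {

    // Reads one stop word per line. A null encoding selects the platform default,
    // as in the listing above.
    static Set<String> load(InputStream wordfile, String encoding) {
        Set<String> words = new HashSet<>();
        if (wordfile == null) {
            return words; // no input: empty set, same as the listing
        }
        try (BufferedReader reader = new BufferedReader(
                encoding == null
                        ? new InputStreamReader(wordfile)
                        : new InputStreamReader(wordfile, encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();   // assumption: ignore surrounding whitespace
                if (!word.isEmpty()) {       // assumption: skip blank lines
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Assumption: fall back to an empty set (the listing assigns null here).
            return new HashSet<>();
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] data = "a\nse\n\nna\n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = load(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(new TreeSet<>(words)); // prints [a, na, se]
    }
}
```

Returning an empty set on failure keeps later lookups safe, at the cost of silently indexing every word; a caller that needs to detect the failure would have to propagate the exception instead.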
    
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.

return
A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter

		TokenStream result = new StandardTokenizer( reader );
		result = new StandardFilter( result );
		result = new LowerCaseFilter( result );
		result = new StopFilter( result, stoptable );
		return result;
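
The order of the chain matters: tokens are lowercased before the stop filter sees them. The following JDK-only sketch imitates that order without using Lucene. The class AnalysisChainSketch and its analyze method are hypothetical; splitting on non-letter, non-digit characters is a crude stand-in for StandardTokenizer, and the StandardFilter step is omitted entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// JDK-only illustration; class and method names are hypothetical, not Lucene API.
final class AnalysisChainSketch {

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Stand-in for StandardTokenizer: split on runs of non-letter, non-digit characters.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiters produce an empty first element
            }
            String lower = raw.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (!stopWords.contains(lower)) {            // StopFilter step
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "na", "se"));
        System.out.println(analyze("Praha a Brno se potkaly na trati", stop));
        // prints [praha, brno, potkaly, trati]
    }
}
```

Because lowercasing comes first, a stop list entry written with capital letters would never match in this sketch. Assuming the StopFilter in the listing also compares case-sensitively, custom stop word lists should be supplied in lowercase.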