File: FrenchAnalyzer.java
Doc: API Doc
Category: Apache Lucene 1.9
Size: 6212
Date: Mon Feb 20 09:18:52 GMT 2006
Package: org.apache.lucene.analysis.fr

FrenchAnalyzer

public final class FrenchAnalyzer extends Analyzer
Analyzer for the French language. Supports an external list of stopwords (words that will not be indexed at all) and an external list of exclusions (words that will be indexed but not stemmed). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.
author
Patrick Talbot (based on Gerhard Schwarz's work for German)
version
$Id: FrenchAnalyzer.java 178832 2005-05-27 23:00:49Z dnaber $

Fields Summary
public static final String[]
FRENCH_STOP_WORDS
Extended list of typical French stopwords.
private Set
stoptable
Contains the stopwords used with the StopFilter.
private Set
excltable
Contains words that should be indexed but not stemmed.
Constructors Summary
public FrenchAnalyzer()
Builds an analyzer with the default stop words (FRENCH_STOP_WORDS).

    stoptable = StopFilter.makeStopSet(FRENCH_STOP_WORDS);
  
public FrenchAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.

    stoptable = StopFilter.makeStopSet(stopwords);
  
public FrenchAnalyzer(Hashtable stopwords)
Builds an analyzer with the given stop words.

deprecated

    stoptable = new HashSet(stopwords.keySet());
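
The Hashtable constructor keeps only the keys of the table; the values are never consulted. A minimal JDK-only sketch of that copy, using a hypothetical helper name (keysAsStopSet) and generics that the original, pre-generics code does not have:

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Illustrative sketch only; it uses nothing but the JDK and is not part of Lucene.
final class HashtableStopwordsSketch {

    // Mirrors `new HashSet(stopwords.keySet())`: keys are copied, values are dropped.
    static Set<String> keysAsStopSet(Hashtable<String, String> stopwords) {
        return new HashSet<>(stopwords.keySet());
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("le", "le");
        table.put("la", "some value");

        Set<String> stopSet = keysAsStopSet(table);
        System.out.println(stopSet.contains("la"));         // a key: present
        System.out.println(stopSet.contains("some value")); // a value: absent

        // The set is a copy, so later changes to the table do not reach it.
        table.put("les", "les");
        System.out.println(stopSet.contains("les"));
    }
}
```

Because the set is a snapshot, stop words added to the Hashtable after the analyzer is constructed would have no effect on it.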
  
public FrenchAnalyzer(File stopwords)
Builds an analyzer with the given stop words.

throws
IOException

    stoptable = new HashSet(WordlistLoader.getWordSet(stopwords));
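
The file-based constructor delegates the parsing to WordlistLoader. The sketch below is not that class; it is a JDK-only approximation under the assumption that the word file holds one word per line. The helper name loadWordSet and the choice to skip blank lines are assumptions made for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; the real WordlistLoader may behave differently.
final class WordListSketch {

    // Hypothetical helper: reads one word per line, trims surrounding
    // whitespace, and skips lines that are empty after trimming.
    static Set<String> loadWordSet(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file; a java.io.FileReader works the same way.
        Set<String> words = loadWordSet(new StringReader("le\n  la \n\nles\n"));
        System.out.println(words.size());
    }
}
```

As with the real constructor, an unreadable source surfaces as an IOException to the caller.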
  
Methods Summary
public void setStemExclusionTable(java.lang.String[] exclusionlist)
Builds an exclusion list from an array of Strings.

    excltable = StopFilter.makeStopSet(exclusionlist);
  
public void setStemExclusionTable(java.util.Hashtable exclusionlist)
Builds an exclusion list from the keys of a Hashtable.

    excltable = new HashSet(exclusionlist.keySet());
  
public void setStemExclusionTable(java.io.File exclusionlist)
Builds an exclusion list from the words contained in the given file.

throws
IOException

    excltable = new HashSet(WordlistLoader.getWordSet(exclusionlist));
  
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.

return
A TokenStream built from a StandardTokenizer, filtered in order by StandardFilter, StopFilter, FrenchStemFilter and LowerCaseFilter


    if (fieldName == null) throw new IllegalArgumentException("fieldName must not be null");
    if (reader == null) throw new IllegalArgumentException("reader must not be null");

    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new StopFilter(result, stoptable);
    result = new FrenchStemFilter(result, excltable);
    // Convert to lowercase after stemming!
    result = new LowerCaseFilter(result);
    return result;
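
The order of the chain matters: stop words are removed and stemming is applied before the tokens are lowercased. The JDK-only sketch below imitates that ordering with stand-ins, not with the Lucene classes. It assumes that the stop-word and exclusion lookups are exact, case-sensitive set lookups (this page does not state that), it splits on whitespace instead of using StandardTokenizer, and its toyStem method is a made-up placeholder rather than the French stemming algorithm:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch of the filter ordering only; not the Lucene implementation.
final class FilterOrderSketch {

    // Toy placeholder for a stemmer: strips one trailing "s" from longer words.
    static String toyStem(String term) {
        boolean strip = term.length() > 3 && term.endsWith("s");
        return strip ? term.substring(0, term.length() - 1) : term;
    }

    static List<String> analyze(String text, Set<String> stopSet, Set<String> exclusions) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            // 1. Stop filter: assumed to be an exact, case-sensitive lookup.
            if (stopSet.contains(token)) continue;
            // 2. Stem unless the token is on the exclusion list.
            String stemmed = exclusions.contains(token) ? token : toyStem(token);
            // 3. Lowercase last, as in the chain above.
            out.add(stemmed.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("les"));
        Set<String> excluded = new HashSet<>(Arrays.asList("Paris"));
        System.out.println(analyze("Les maisons les Paris", stop, excluded));
    }
}
```

Under these assumptions the capitalized "Les" survives the stop filter and is only lowercased afterwards, while the excluded "Paris" skips the stemming step. Whether the real filters behave the same way depends on how StopFilter and FrenchStemFilter compare terms.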