File: WordlistLoader.java
Doc: API Doc
Category: Apache Lucene 2.1.0
Size: 3766
Date: Wed Feb 14 10:46:38 GMT 2007
Package: org.apache.lucene.analysis

WordlistLoader

public class WordlistLoader extends Object
Loader for text files that represent a list of stopwords.
author
Gerhard Schwarz
version
$Id: WordlistLoader.java 472959 2006-11-09 16:21:50Z yonik $

Methods Summary
public static java.util.HashMap getStemDict(java.io.File wordstemfile) throws java.io.IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two words separated by a tab; everything after the first tab is taken as the stem)

return
stem dictionary that overrules the stemming algorithm
throws
IOException if the file cannot be opened or read

    if (wordstemfile == null)
      throw new NullPointerException("wordstemfile may not be null");
    HashMap result = new HashMap();
    BufferedReader br = null;
    FileReader fr = null;
    try {
      fr = new FileReader(wordstemfile);
      br = new BufferedReader(fr);
      String line;
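      while ((line = br.readLine()) != null) {
        // Split on the first tab only: everything after it is the stem.
        String[] wordstem = line.split("\t", 2);
        // Skip lines without a tab (e.g. a trailing blank line) instead of
        // failing with an ArrayIndexOutOfBoundsException.
        if (wordstem.length < 2)
          continue;
        result.put(wordstem[0], wordstem[1]);
      }
    } finally {
      // Closing the BufferedReader also closes the wrapped FileReader, so the
      // FileReader only needs closing if the BufferedReader was never created.
      if (br != null)
        br.close();
      else if (fr != null)
        fr.close();
    }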
    return result;
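For comparison, here is a minimal sketch of the same "word<TAB>stem" parsing written for Java 7+ with generics and try-with-resources. The class `StemDictSketch` and its methods `loadStemDict` and `parseStemDict` are hypothetical names for illustration, not part of Lucene; the Reader-based helper is an assumption added so the parsing can be exercised on in-memory text.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene API: mirrors getStemDict's parsing
// using generics and try-with-resources (Java 7+).
public class StemDictSketch {

    // File entry point, analogous to getStemDict(File).
    // FileReader decodes with the platform default charset, like the original.
    static Map<String, String> loadStemDict(File wordstemfile) throws IOException {
        if (wordstemfile == null)
            throw new NullPointerException("wordstemfile may not be null");
        return parseStemDict(new FileReader(wordstemfile));
    }

    // Parses "word<TAB>stem" lines and closes the reader when done.
    static Map<String, String> parseStemDict(Reader reader) throws IOException {
        Map<String, String> result = new HashMap<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Limit 2: everything after the first tab is treated as the stem.
                String[] wordstem = line.split("\t", 2);
                if (wordstem.length < 2)
                    continue; // skip blank or malformed lines
                result.put(wordstem[0], wordstem[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> stems =
            parseStemDict(new StringReader("running\trun\nmice\tmouse\n\n"));
        System.out.println(stems.get("mice")); // mouse
        System.out.println(stems.size());      // 2
    }
}
```

The try-with-resources statement closes the BufferedReader (and with it the underlying reader) on every exit path, which replaces the manual null checks in the finally block.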
  
public static java.util.HashSet getWordSet(java.io.File wordfile) throws java.io.IOException
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer); otherwise mixed-case entries will never match the lowercased tokens.

param
wordfile File containing the wordlist
return
A HashSet with the file's words

    FileReader reader = null;
    try {
      reader = new FileReader(wordfile);
      // The Reader overload does the parsing and closes the reader itself;
      // the close below is a harmless safeguard.
      return getWordSet(reader);
    }
    finally {
      if (reader != null)
        reader.close();
    }
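A minimal sketch of the same file-loading contract on Java 7+, assuming you want an explicit charset (UTF-8 here) rather than FileReader's platform default. `WordSetFileSketch` and `loadWordSet` are hypothetical names for illustration, not Lucene API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Lucene API: same one-word-per-line contract as
// getWordSet(File), but with an explicit UTF-8 charset.
public class WordSetFileSketch {

    static Set<String> loadWordSet(Path wordfile) throws IOException {
        Set<String> result = new HashSet<>();
        // try-with-resources closes the file even if readLine() throws.
        try (BufferedReader br = Files.newBufferedReader(wordfile, StandardCharsets.UTF_8)) {
            String word;
            while ((word = br.readLine()) != null) {
                result.add(word.trim()); // strip leading/trailing whitespace
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stopwords", ".txt");
        try {
            Files.write(tmp, "the\n  and \nof\n".getBytes(StandardCharsets.UTF_8));
            Set<String> words = loadWordSet(tmp);
            System.out.println(words.contains("and")); // true
            System.out.println(words.size());          // 3
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Making the charset explicit avoids stopword files being decoded differently on machines with different default encodings; the original method does not do this.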
  
public static java.util.HashSet getWordSet(java.io.Reader reader) throws java.io.IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer). The reader is closed before the method returns, even if an exception is thrown.

param
reader Reader containing the wordlist
return
A HashSet with the reader's words

    HashSet result = new HashSet();
    BufferedReader br = null;
    try {
      if (reader instanceof BufferedReader) {
        br = (BufferedReader) reader;
      } else {
        br = new BufferedReader(reader);
      }
      String word = null;
      while ((word = br.readLine()) != null) {
        result.add(word.trim());
      }
    }
    finally {
      if (br != null)
        br.close();
    }
    return result;
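The lowercase requirement can be illustrated with a small sketch of the Reader-based logic. `WordSetReaderSketch` and `readWordSet` are hypothetical names, not Lucene API, and the lowercasing step is only simulated with `String.toLowerCase(Locale.ROOT)` rather than an actual LowerCaseFilter. The sketch also shows a side effect of the line-per-entry rule: a blank line becomes an empty-string entry.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Lucene API: the getWordSet(Reader) logic,
// used to show why stopword entries should be lowercase.
public class WordSetReaderSketch {

    static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> result = new HashSet<>();
        // Reuse the reader if it is already buffered, as the original does.
        BufferedReader br = (reader instanceof BufferedReader)
                ? (BufferedReader) reader
                : new BufferedReader(reader);
        try (BufferedReader in = br) { // also closes the caller's reader
            String word;
            while ((word = in.readLine()) != null) {
                result.add(word.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stop = readWordSet(new StringReader("The\nand\n\n"));
        // Simulated lowercasing: the token "The" becomes "the",
        // which does not match the mixed-case entry "The".
        String token = "The".toLowerCase(Locale.ROOT);
        System.out.println(stop.contains(token)); // false
        System.out.println(stop.contains("and")); // true
        System.out.println(stop.contains(""));    // true: blank line became ""
    }
}
```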