SynExpand
public final class SynExpand extends Object
Expand a query by looking up synonyms for every term.
You need to invoke {@link Syns2Index} first to build the synonym index. |
Methods Summary |
---|
public static org.apache.lucene.search.Query | expand(java.lang.String query, org.apache.lucene.search.Searcher syns, org.apache.lucene.analysis.Analyzer a, java.lang.String field, float boost) Perform synonym expansion on a query.
Set already = new HashSet(); // avoid dups
List top = new LinkedList(); // needs to be separately listed..
if ( field == null) field = "contents";
if ( a == null) a = new StandardAnalyzer();
// [1] Parse query into separate words so that when we expand we can avoid dups
TokenStream ts = a.tokenStream( field, new StringReader( query));
org.apache.lucene.analysis.Token t;
while ( (t = ts.next()) != null)
{
    String word = t.termText();
    if ( already.add( word))
        top.add( word);
}
BooleanQuery tmp = new BooleanQuery();
// [2] form query
Iterator it = top.iterator();
while ( it.hasNext())
{
    // [2a] add top-level words in
    String word = (String) it.next();
    TermQuery tq = new TermQuery( new Term( field, word));
    tmp.add( tq, BooleanClause.Occur.SHOULD);
    // [2b] add in unique synonyms
    Hits hits = syns.search( new TermQuery( new Term(Syns2Index.F_WORD, word)));
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        String[] values = doc.getValues( Syns2Index.F_SYN);
        for ( int j = 0; j < values.length; j++)
        {
            String syn = values[ j];
            if ( already.add( syn)) // avoid dups of top level words and synonyms
            {
                tq = new TermQuery( new Term( field, syn));
                if ( boost > 0) // else keep normal 1.0
                    tq.setBoost( boost);
                tmp.add( tq, BooleanClause.Occur.SHOULD);
            }
        }
    }
}
return tmp;
| public static void | main(java.lang.String[] args) Test driver for synonym expansion.
Uses boost factor of 0.9 for illustrative purposes.
If you pass in the query "big dog" then it prints out:
Query: big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9
dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9
if (args.length != 2)
{
    System.out.println(
        "java org.apache.lucene.wordnet.SynExpand <index path> <query>");
    return; // stop here: the code below reads both args[0] and args[1]
}
FSDirectory directory = FSDirectory.getDirectory(args[0], false);
IndexSearcher searcher = new IndexSearcher(directory);
String query = args[1];
String field = "contents";
Query q = expand( query, searcher, new StandardAnalyzer(), field, 0.9f);
System.out.println( "Query: " + q.toString( field));
searcher.close();
directory.close();
|
|
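
The expansion logic in `expand` (collect unique query words, then append each word's unseen synonyms with a boost) can be tried without a Lucene index. The following is a minimal sketch under stated assumptions, not the library's implementation: the class `SynExpandSketch`, the in-memory `Map` standing in for the synonym index built by `Syns2Index`, whitespace splitting in place of an `Analyzer`, and the `term^boost` string output are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for SynExpand.expand that uses plain collections.
final class SynExpandSketch {

    static List<String> expand(String query, Map<String, List<String>> synonyms, float boost) {
        // [1] Split the query into unique words, keeping first-seen order.
        //     (Assumption: lowercasing + whitespace split replaces the Analyzer.)
        Set<String> already = new LinkedHashSet<>();
        for (String w : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!w.isEmpty()) {
                already.add(w);
            }
        }
        // Copy the top-level words so "already" can grow while we iterate.
        List<String> top = new ArrayList<>(already);

        // [2] Emit each top-level word, followed by its not-yet-seen synonyms.
        List<String> clauses = new ArrayList<>();
        for (String word : top) {
            clauses.add(word);
            List<String> syns = synonyms.getOrDefault(word, Collections.<String>emptyList());
            for (String syn : syns) {
                if (already.add(syn)) { // skips top-level words and earlier synonyms
                    clauses.add(boost > 0 ? syn + "^" + boost : syn);
                }
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, List<String>> syns = new HashMap<>();
        syns.put("big", Arrays.asList("large", "grown"));
        syns.put("dog", Arrays.asList("hound", "big", "large"));
        System.out.println(String.join(" ", expand("big dog big", syns, 0.9f)));
    }
}
```

In this sketch the repeated query word `big` and the synonyms `big` and `large` listed under `dog` are dropped, because the single `already` set tracks top-level words and synonyms together, as the source does.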