File: GermanStemmer.java
Doc: API Doc
Category: Apache Lucene 2.1.0
Size: 9541
Date: Wed Feb 14 10:46:28 GMT 2007
Package: org.apache.lucene.analysis.de

GermanStemmer

java.lang.Object

public class GermanStemmer extends Object

A stemmer for German words. The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns at isst.fhg.de).

author: Gerhard Schwarz
version: $Id: GermanStemmer.java 472959 2006-11-09 16:21:50Z yonik $


Fields Summary
private StringBuffer sb
Buffer for the terms while stemming them.
private int substCount
Number of characters removed by substitute() while stemming.
Constructors Summary
Methods Summary
private boolean isStemmable(java.lang.String term)
Checks if a term could be stemmed.
return
true if, and only if, the given term consists only of letters.
for ( int c = 0; c < term.length(); c++ ) {
  if ( !Character.isLetter( term.charAt( c ) ) ) return false;
}
return true;
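To see the check in isolation, the loop can be copied into a small standalone class. This is only a sketch: the class name StemmableCheck and the main method are assumptions made for illustration, since the real method is private to GermanStemmer.

```java
// Hypothetical standalone class; in Lucene the method is private to GermanStemmer.
class StemmableCheck {

    // Same loop as isStemmable(): reject the term at the first non-letter character.
    static boolean isStemmable(String term) {
        for (int c = 0; c < term.length(); c++) {
            if (!Character.isLetter(term.charAt(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "H\u00e4user" is "Haeuser" written with an a-umlaut; umlauts count as letters.
        System.out.println(isStemmable("H\u00e4user"));  // true
        // The digit and the hyphen are not letters.
        System.out.println(isStemmable("A4-Papier"));    // false
    }
}
```

Note that an empty string also passes the check, because the loop body never runs.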
private void optimize(java.lang.StringBuffer buffer)
Does some optimizations on the term. These optimizations are contextual.
// Additional step for female plurals of professions and inhabitants.
if ( buffer.length() > 5 && buffer.substring( buffer.length() - 5, buffer.length() ).equals( "erin*" ) ) {
  buffer.deleteCharAt( buffer.length() - 1 );
  strip( buffer );
}
// Additional step for irregular plural nouns like "Matrizen -> Matrix".
if ( buffer.charAt( buffer.length() - 1 ) == ( 'z' ) ) {
  buffer.setCharAt( buffer.length() - 1, 'x' );
}
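The second step of optimize() can be tried on its own. The sketch below covers only the final "z" to "x" replacement; the "erin*" step is left out because it depends on strip(), which is not shown here. The class name FinalZDemo, the helper name replaceFinalZ, the empty-buffer guard, and the input "matriz" (an assumed intermediate form of "Matrizen") are all assumptions made for illustration.

```java
// Hypothetical standalone class; in Lucene this logic is part of the private optimize() method.
class FinalZDemo {

    // Replaces a trailing 'z' with 'x', as in the irregular-plural step of optimize().
    // The length check is an addition for this sketch so that an empty buffer does not throw.
    static void replaceFinalZ(StringBuffer buffer) {
        if (buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == 'z') {
            buffer.setCharAt(buffer.length() - 1, 'x');
        }
    }

    public static void main(String[] args) {
        // Assumed intermediate form of "Matrizen" after the earlier stemming steps.
        StringBuffer buffer = new StringBuffer("matriz");
        replaceFinalZ(buffer);
        System.out.println(buffer); // matrix
    }
}
```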
private void removeParticleDenotion(java.lang.StringBuffer buffer)
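Removes a particle denotion ("ge") from a term. Only terms longer than four characters are examined, and only the first occurrence of "gege" is reduced to "ge".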
if ( buffer.length() > 4 ) {
  for ( int c = 0; c < buffer.length() - 3; c++ ) {
    if ( buffer.substring( c, c + 4 ).equals( "gege" ) ) {
      buffer.delete( c, c + 2 );
      return;
    }
  }
}
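The effect of this method is easiest to see on a few inputs. The sketch below copies the logic into a hypothetical ParticleDemo class with an apply() helper; both names are assumptions for illustration, and the inputs are plain words rather than the partially processed buffers the stemmer would pass in.

```java
// Hypothetical standalone class; in Lucene the method is private to GermanStemmer.
class ParticleDemo {

    // Same logic as removeParticleDenotion(): delete the first "ge" of the first "gege" found.
    static void removeParticleDenotion(StringBuffer buffer) {
        if (buffer.length() > 4) {
            for (int c = 0; c < buffer.length() - 3; c++) {
                if (buffer.substring(c, c + 4).equals("gege")) {
                    buffer.delete(c, c + 2);
                    return;
                }
            }
        }
    }

    // Convenience wrapper (an addition for this sketch) that works on plain strings.
    static String apply(String term) {
        StringBuffer buffer = new StringBuffer(term);
        removeParticleDenotion(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("gegeben"));    // geben
        System.out.println(apply("ausgegeben")); // ausgeben
        System.out.println(apply("gegangen"));   // gegangen (no "gege" present)
        System.out.println(apply("gege"));       // gege (too short to be examined)
    }
}
```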
private void resubstitute(java.lang.StringBuffer buffer)
Undoes the changes made by substitute(). These are character pairs and character combinations. Umlauts will remain as their corresponding vowel, as "�