File: GermanStemmer.java
Doc: API Doc
Category: Apache Lucene 1.4.3
Size: 9248
Date: Sun May 30 22:24:20 BST 2004
Package: org.apache.lucene.analysis.de

GermanStemmer

java.lang.Object

public class GermanStemmer extends Object

A stemmer for German words. The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns@isst.fhg.de).

author: Gerhard Schwarz
version: $Id: GermanStemmer.java,v 1.11 2004/05/30 20:24:20 otis Exp $

Fields Summary
private StringBuffer sb
Buffer for the terms while stemming them.
private int substCount
Number of characters that are removed with substitute() while stemming.
Constructors Summary
Methods Summary
private boolean isStemmable(java.lang.String term)
Checks if a term could be stemmed.
return: true if, and only if, the given term consists only of letters.
for ( int c = 0; c < term.length(); c++ ) {
    if ( !Character.isLetter( term.charAt( c ) ) ) return false;
}
return true;
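The check above can be tried in isolation. The following is a minimal sketch, not the Lucene class itself: the class name StemmableSketch and the sample terms are assumptions made for illustration, and the method is static here only because the original is private.

```java
// Hypothetical standalone sketch; only the loop mirrors the listing above.
final class StemmableSketch {

    // Returns true only if every character of the term is a letter.
    static boolean isStemmable(String term) {
        for (int c = 0; c < term.length(); c++) {
            if (!Character.isLetter(term.charAt(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "H\u00e4user" is "Haeuser" written with an a-umlaut; umlauts count as letters.
        System.out.println(isStemmable("H\u00e4user"));
        // A digit makes the term non-stemmable.
        System.out.println(isStemmable("Haus2"));
        // So does a hyphen.
        System.out.println(isStemmable("Haus-Tor"));
    }
}
```

Note that an empty string also passes this check, because the loop body never runs.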
private void optimize(java.lang.StringBuffer buffer)
Performs some optimizations on the term. These optimizations are contextual.
// Additional step for female plurals of professions and inhabitants.
if ( buffer.length() > 5 && buffer.substring( buffer.length() - 5, buffer.length() ).equals( "erin*" ) ) {
    buffer.deleteCharAt( buffer.length() - 1 );
    strip( buffer );
}
// Additional step for irregular plural nouns like "Matrizen -> Matrix".
if ( buffer.charAt( buffer.length() - 1 ) == ( 'z' ) ) {
    buffer.setCharAt( buffer.length() - 1, 'x' );
}
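The second step of the listing above, the trailing 'z' to 'x' rewrite, can be tried on its own. This is a minimal sketch under stated assumptions: the class name OptimizeSketch and the method name fixTrailingZ are hypothetical, the "erin*" step is left out because it depends on strip(), which is not shown in this section, and the length guard is an addition that the original does not have (charAt would throw on an empty buffer).

```java
// Hypothetical standalone sketch of the 'z' -> 'x' step only.
final class OptimizeSketch {

    // Replaces a trailing 'z' with 'x'; any other buffer is left unchanged.
    static void fixTrailingZ(StringBuffer buffer) {
        // The length check is an added assumption, not part of the original listing.
        if (buffer.length() > 0 && buffer.charAt(buffer.length() - 1) == 'z') {
            buffer.setCharAt(buffer.length() - 1, 'x');
        }
    }

    public static void main(String[] args) {
        // "matriz" stands in for "Matrizen" after its suffix has been stripped.
        StringBuffer term = new StringBuffer("matriz");
        fixTrailingZ(term);
        System.out.println(term);
    }
}
```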
private void removeParticleDenotion(java.lang.StringBuffer buffer)
Removes a particle denotion ("ge") from a term.
if ( buffer.length() > 4 ) {
    for ( int c = 0; c < buffer.length() - 3; c++ ) {
        if ( buffer.substring( c, c + 4 ).equals( "gege" ) ) {
            buffer.delete( c, c + 2 );
            return;
        }
    }
}
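The effect of the loop above is easiest to see on a few sample terms. The following is a minimal sketch: the class name ParticleSketch and the inputs are assumptions for illustration, and in the real stemmer the buffer may already have been altered by earlier steps before this method runs.

```java
// Hypothetical standalone sketch; the method body mirrors the listing above.
final class ParticleSketch {

    // Deletes the first "ge" of the first "gege" found in the buffer.
    static void removeParticleDenotion(StringBuffer buffer) {
        if (buffer.length() > 4) {
            for (int c = 0; c < buffer.length() - 3; c++) {
                if (buffer.substring(c, c + 4).equals("gege")) {
                    buffer.delete(c, c + 2);
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] samples = { "gegeben", "vorgegeben", "gehen" };
        for (String sample : samples) {
            StringBuffer buffer = new StringBuffer(sample);
            removeParticleDenotion(buffer);
            System.out.println(sample + " -> " + buffer);
        }
    }
}
```

As written, only a doubled "gege" triggers the removal, and only its first occurrence; a single leading "ge" is left in place, and buffers of four characters or fewer are never changed.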
private void resubstitute(java.lang.StringBuffer buffer)
Undoes the changes made by substitute(), that is, the substituted character pairs and character combinations. Umlauts will remain as their corresponding vowel, as "�