File | Doc | Category | Size | Date | Package
Soundex.java | API Doc | Android 1.5 API | 9619 | Wed May 06 22:41:10 BST 2009 | org.apache.commons.codec.language

Soundex

public class Soundex extends Object implements StringEncoder
Encodes a string into a Soundex value. Soundex is an encoding used to relate similar names, but can also be used as a general-purpose scheme to find words with similar phonemes.
author
Apache Software Foundation
version
$Id: Soundex.java,v 1.26 2004/07/07 23:15:24 ggregory Exp $

Fields Summary
public static final Soundex
US_ENGLISH
An instance of Soundex using the US_ENGLISH_MAPPING mapping.
public static final String
US_ENGLISH_MAPPING_STRING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.

(This constant is provided as both an implementation convenience and to allow Javadoc to pick up the value for the constant values page.)

public static final char[]
US_ENGLISH_MAPPING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.
private int
maxLength
The maximum length of a Soundex code - Soundex codes are only four characters by definition.
private char[]
soundexMapping
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH
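The mapping fields above describe a 26-entry table indexed by letter position, where a code of 0 means "do not encode". A minimal sketch of that lookup follows. The table value is an assumption: this page does not print the contents of US_ENGLISH_MAPPING_STRING, so the standard American Soundex letter table is used instead, and the class and method names are hypothetical.

```java
// Sketch: look up the Soundex digit for a letter in a 26-character table.
class MappingTableSketch {
    // ASSUMPTION: standard American Soundex codes for A..Z. The actual value
    // of US_ENGLISH_MAPPING_STRING is not shown on this page.
    static final String ASSUMED_MAPPING = "01230120022455012623010202";

    // Returns the code for an upper-case letter A-Z.
    // A result of '0' means the letter is not encoded (vowels, H, W, Y).
    static char codeFor(char upper) {
        return ASSUMED_MAPPING.charAt(upper - 'A');
    }

    public static void main(String[] args) {
        // Print the code assigned to each letter of a sample name.
        for (char c : "ROBERT".toCharArray()) {
            System.out.println(c + " -> " + codeFor(c));
        }
    }
}
```

Under this assumed table, B, F, P and V share code 1, which is why names that differ only in those consonants tend to receive the same Soundex value.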
Constructors Summary
public Soundex()
Creates an instance using US_ENGLISH_MAPPING

see
Soundex#Soundex(char[])
see
Soundex#US_ENGLISH_MAPPING

        this(US_ENGLISH_MAPPING);
    
public Soundex(char[] mapping)
Creates a Soundex instance using the given mapping. This constructor can be used to provide an internationalized mapping for a non-Western character set. Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.

param
mapping Mapping array to use when finding the corresponding code for a given character

        this.setSoundexMapping(mapping);
    
Methods Summary
public int difference(java.lang.String s1, java.lang.String s2)
Encodes the Strings and returns the number of characters in the two encoded Strings that are the same. This return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.

param
s1 A String that will be encoded and compared.
param
s2 A String that will be encoded and compared.
return
The number of characters that are the same in the two encoded Strings, from 0 to 4.
see
MS T-SQL DIFFERENCE
throws
EncoderException if an error occurs encoding one of the strings
since
1.3


    // BEGIN android-note
    // Removed @see reference to SoundexUtils below, since the class isn't
    // public.
    // END android-note
        return SoundexUtils.difference(this, s1, s2);
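The body of SoundexUtils.difference is not shown on this page. As a rough sketch of the comparison the description implies, the helper below counts the positions at which two already-encoded strings hold the same character. The class name, method name, and position-by-position interpretation are assumptions, not the library's code.

```java
// Sketch: compare two Soundex codes position by position.
class DifferenceSketch {
    // Hypothetical helper: returns how many positions hold the same character
    // in both encoded strings. For four-character codes this is 0 through 4.
    static int countMatchingPositions(String encoded1, String encoded2) {
        if (encoded1 == null || encoded2 == null) {
            return 0;
        }
        int length = Math.min(encoded1.length(), encoded2.length());
        int same = 0;
        for (int i = 0; i < length; i++) {
            if (encoded1.charAt(i) == encoded2.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(countMatchingPositions("S530", "S530")); // 4
        System.out.println(countMatchingPositions("S530", "S550")); // 3
        System.out.println(countMatchingPositions("R163", "T522")); // 0
    }
}
```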
    
public java.lang.Object encode(java.lang.Object pObject)
Encodes an Object using the soundex algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type java.lang.String.

param
pObject Object to encode
return
An object (of type java.lang.String) containing the soundex code which corresponds to the String supplied.
throws
EncoderException if the parameter supplied is not of type java.lang.String
throws
IllegalArgumentException if a character is not mapped

        if (!(pObject instanceof String)) {
            throw new EncoderException("Parameter supplied to Soundex encode is not of type java.lang.String");
        }
        return soundex((String) pObject);
    
public java.lang.String encode(java.lang.String pString)
Encodes a String using the soundex algorithm.

param
pString A String object to encode
return
A Soundex code corresponding to the String supplied
throws
IllegalArgumentException if a character is not mapped

        return soundex(pString);
    
private char getMappingCode(java.lang.String str, int index)
Used internally by the SoundEx algorithm. Consonants from the same code group separated by W or H are treated as one.

param
str the cleaned working string to encode (in upper case).
param
index the character position to encode
return
Mapping code for a particular character
throws
IllegalArgumentException if the character is not mapped

        char mappedChar = this.map(str.charAt(index));
        // HW rule check
        if (index > 1 && mappedChar != '0') {
            char hwChar = str.charAt(index - 1);
            if ('H' == hwChar || 'W' == hwChar) {
                char preHWChar = str.charAt(index - 2);
                char firstCode = this.map(preHWChar);
                if (firstCode == mappedChar || 'H' == preHWChar || 'W' == preHWChar) {
                    return 0;
                }
                    return 0;
                }
            }
        }
        return mappedChar;
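To make the H/W rule concrete, the sketch below reproduces the check in a standalone class and applies it to the cleaned string "ASHCRAFT". The mapping table is an assumption (the standard American Soundex table, since this page does not print US_ENGLISH_MAPPING_STRING), and the class name is hypothetical.

```java
// Sketch: the H/W rule applied to single positions of a cleaned string.
class HwRuleSketch {
    // ASSUMPTION: standard American Soundex codes for A..Z.
    static final String ASSUMED_MAPPING = "01230120022455012623010202";

    static char map(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= ASSUMED_MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return ASSUMED_MAPPING.charAt(index);
    }

    // Returns the code for str.charAt(index), or the NUL character (0) when
    // the letter follows an H or W that separates it from a same-coded letter.
    static char getMappingCode(String str, int index) {
        char mappedChar = map(str.charAt(index));
        if (index > 1 && mappedChar != '0') {
            char hwChar = str.charAt(index - 1);
            if ('H' == hwChar || 'W' == hwChar) {
                char preHWChar = str.charAt(index - 2);
                char firstCode = map(preHWChar);
                if (firstCode == mappedChar || 'H' == preHWChar || 'W' == preHWChar) {
                    return 0;
                }
            }
        }
        return mappedChar;
    }

    public static void main(String[] args) {
        // 'C' at index 3 follows "SH"; S and C share code 2, so C is suppressed.
        System.out.println((int) getMappingCode("ASHCRAFT", 3)); // 0
        // 'T' at index 3 follows "CH"; C is 2 and T is 3, so T keeps its code.
        System.out.println(getMappingCode("ACHT", 3)); // 3
    }
}
```

Note that the suppressed case returns the NUL character (numeric 0), which is distinct from the digit character '0' used for unencoded letters.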
    
public int getMaxLength()
Returns the maxLength. Standard Soundex

deprecated
This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
return
int

        return this.maxLength;
    
private char[] getSoundexMapping()
Returns the soundex mapping.

return
soundexMapping.

        return this.soundexMapping;
    
private char map(char ch)
Maps the given upper-case character to its Soundex code.

param
ch An upper-case character.
return
A Soundex code.
throws
IllegalArgumentException Thrown if ch is not mapped.

        int index = ch - 'A';
        if (index < 0 || index >= this.getSoundexMapping().length) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return this.getSoundexMapping()[index];
    
public void setMaxLength(int maxLength)
Sets the maxLength.

deprecated
This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
param
maxLength The maxLength to set

        this.maxLength = maxLength;
    
private void setSoundexMapping(char[] soundexMapping)
Sets the soundexMapping.

param
soundexMapping The soundexMapping to set.

        this.soundexMapping = soundexMapping;
    
public java.lang.String soundex(java.lang.String str)
Retrieves the Soundex code for a given String object.

param
str String to encode using the Soundex algorithm
return
A soundex code for the String supplied
throws
IllegalArgumentException if a character is not mapped

        if (str == null) {
            return null;
        }
        str = SoundexUtils.clean(str);
        if (str.length() == 0) {
            return str;
        }
        char out[] = {'0', '0', '0', '0'};
        char last, mapped;
        int incount = 1, count = 1;
        out[0] = str.charAt(0);
        last = getMappingCode(str, 0);
        while ((incount < str.length()) && (count < out.length)) {
            mapped = getMappingCode(str, incount++);
            if (mapped != 0) {
                if ((mapped != '0') && (mapped != last)) {
                    out[count++] = mapped;
                }
                last = mapped;
            }
        }
        return new String(out);
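
The listing above depends on SoundexUtils.clean and on the mapping table, neither of which is shown on this page. The following standalone sketch puts the pieces together so the loop can be run and traced. It is not the library class: the table is assumed to be the standard American Soundex table, clean is a stand-in that keeps letters only and upper-cases them, and the class name is hypothetical.

```java
// Sketch: a self-contained version of the encoding loop for experimentation.
final class SoundexSketch {
    // ASSUMPTION: standard American Soundex codes for A..Z.
    private static final char[] ASSUMED_MAPPING =
            "01230120022455012623010202".toCharArray();

    // ASSUMPTION: stand-in for SoundexUtils.clean, whose body is not shown.
    // Drops every non-letter and upper-cases what remains.
    static String clean(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    static char map(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= ASSUMED_MAPPING.length) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return ASSUMED_MAPPING[index];
    }

    // Same H/W rule as in the listing: returns NUL (0) for a suppressed letter.
    static char getMappingCode(String str, int index) {
        char mappedChar = map(str.charAt(index));
        if (index > 1 && mappedChar != '0') {
            char hwChar = str.charAt(index - 1);
            if ('H' == hwChar || 'W' == hwChar) {
                char preHWChar = str.charAt(index - 2);
                if (map(preHWChar) == mappedChar || 'H' == preHWChar || 'W' == preHWChar) {
                    return 0;
                }
            }
        }
        return mappedChar;
    }

    static String soundex(String str) {
        if (str == null) {
            return null;
        }
        str = clean(str);
        if (str.length() == 0) {
            return str;
        }
        char[] out = {'0', '0', '0', '0'};   // zero-padded result buffer
        int incount = 1, count = 1;
        out[0] = str.charAt(0);              // first letter is kept as-is
        char last = getMappingCode(str, 0);
        while (incount < str.length() && count < out.length) {
            char mapped = getMappingCode(str, incount++);
            if (mapped != 0) {               // NUL: suppressed by the H/W rule
                if (mapped != '0' && mapped != last) {
                    out[count++] = mapped;   // new code, not a repeat
                }
                last = mapped;
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));   // R163
        System.out.println(soundex("Ashcraft")); // A261
        System.out.println(soundex("Tymczak"));  // T522
    }
}
```

Because an unencoded letter resets `last` to '0', two same-coded consonants separated by a vowel are both emitted (the K in "Tymczak"), while adjacent ones collapse into a single digit (the CZ in the same name).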