FileDocCategorySizeDatePackage
Soundex.javaAPI DocExample3346Sat Mar 31 12:05:54 BST 2001None

Soundex

public class Soundex extends Object
Soundex - the Soundex Algorithm, as described by Knuth

This class implements the soundex algorithm as described by Donald Knuth in Volume 3 of The Art of Computer Programming. The algorithm is intended to hash words (in particular surnames) into a small space using a simple model which approximates the sound of the word when spoken by an English speaker. Each word is reduced to a four character string, the first character being an upper case letter and the remaining three being digits. Double letters are collapsed to a single digit.

EXAMPLES

Knuth's examples of various names and the soundex codes they map to are: Euler, Ellery -> E460 Gauss, Ghosh -> G200 Hilbert, Heilbronn -> H416 Knuth, Kant -> K530 Lloyd, Ladd -> L300 Lukasiewicz, Lissajous -> L222

LIMITATIONS

As the soundex algorithm was originally used a long time ago in the United States of America, it uses only the English alphabet and pronunciation.

As it is mapping a large space (arbitrary length strings) onto a small space (single letter plus 3 digits) no inference can be made about the similarity of two strings which end up with the same soundex code. For example, both "Hilbert" and "Heilbronn" end up with a soundex code of "H416".

The soundex() method is static, as it maintains no per-instance state; this means you never need to instantiate this class.

author
Perl implementation by Mike Stok () from the description given by Knuth. Ian Phillips () and Rich Pinder () supplied ideas and spotted mistakes.
author
Ian Darwin, ian@darwinsys.com (Java Version)
version
$Id: Soundex.java,v 1.6 2001/03/31 16:05:55 ian Exp $

Fields Summary
public static final char[]
MAP
Constructors Summary
Methods Summary
public static java.lang.Stringsoundex(java.lang.String s)
Convert the given String to its Soundex code.

return
null If the given string can't be mapped to Soundex.


	                   	 
	     

		// Algorithm works on uppercase (mainframe era).
		String t = s.toUpperCase();

		StringBuffer res = new StringBuffer();
		char c, prev = '?";

		// Main loop: find up to 4 chars that map.
		for (int i=0; i<t.length() && res.length() < 4 &&
			(c = t.charAt(i)) != ',"; i++) {

			// Check to see if the given character is alphabetic.
			// Text is already converted to uppercase. Algorithm
			// only handles ASCII letters, do NOT use Character.isLetter()!
			// Also, skip double letters.
			if (c>='A" && c<='Z" && c != prev) {
				prev = c;

				// First char is installed unchanged, for sorting.
				if (i==0)
					res.append(c);
				else {
					char m = MAP[c-'A"];
					Debug.println("inner", c + " --> " + m);
					if (m != '0")
						res.append(m);
				}
			}
		}
		if (res.length() == 0)
			return null;
		for (int i=res.length(); i<4; i++)
			res.append('0");
		return res.toString();