Soundex - the Soundex Algorithm, as described by Knuth
This class implements the soundex algorithm as described by Donald
Knuth in Volume 3 of The Art of Computer Programming. The
algorithm is intended to hash words (in particular surnames) into
a small space using a simple model which approximates the sound of
the word when spoken by an English speaker. Each word is reduced
to a four character string, the first character being an upper case
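letter and the remaining three being digits. Adjacent letters that
map to the same digit, such as double letters, are collapsed to a
single digit, and short codes are padded with trailing zeros.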
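The reduction described above can be sketched as follows. This is an
illustrative sketch only, not this class's actual source: the class name
SoundexSketch and the method signature are assumptions, and so is the
choice to treat H and W as transparent (they neither receive a digit nor
separate two letters that share a digit), which follows the rule as given
in later editions of Knuth.

```java
public final class SoundexSketch {
    // Digit for each letter A..Z. '0' marks letters that carry no digit:
    // the vowels A, E, I, O, U plus H, W and Y.
    private static final String CODES = "01230120022455012623010202";

    private SoundexSketch() {}

    // Hypothetical signature; returns "" if the word has no English letters.
    public static String soundex(String word) {
        StringBuilder out = new StringBuilder(4);
        char previous = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // only the 26 English letters are coded
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as a letter
                previous = code; // but its digit still suppresses a repeat
            } else if (c == 'H' || c == 'W') {
                // Assumption: H and W do not break a run of equal digits.
            } else {
                if (code != '0' && code != previous) {
                    out.append(code);
                }
                previous = code; // vowels reset the run, so a repeat counts
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with trailing zeros
        }
        return out.toString();
    }
}
```

Under these assumptions, SoundexSketch.soundex("Lloyd") yields "L300": the
second L repeats the digit of the first letter and is dropped, O and Y
carry no digit, D contributes 3, and two zeros pad the result.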
EXAMPLES
Knuth's examples of various names and the soundex codes they map
to are:
Euler, Ellery -> E460
Gauss, Ghosh -> G200
Hilbert, Heilbronn -> H416
Knuth, Kant -> K530
Lloyd, Ladd -> L300
Lukasiewicz, Lissajous -> L222
LIMITATIONS
The soundex algorithm originated in the United States of America in
the early twentieth century, so it assumes the English alphabet and
English pronunciation.
Because it maps a large space (arbitrary length strings) onto a
small space (a single letter plus three digits), a shared soundex
code does not by itself show that two strings are similar. For
example, both "Hilbert" and "Heilbronn" end up with a soundex code
of "H416".
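One way to work within this limitation is to treat a shared code as a
candidate bucket and leave the final comparison to the caller. The sketch
below is hypothetical and not part of this class: SoundexBuckets and
bucket() are illustrative names, and the encoder is passed in as a
function so that any soundex implementation can be supplied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class SoundexBuckets {
    private SoundexBuckets() {}

    // Groups names by the code the supplied encoder returns for each one.
    // Names in the same bucket are only candidates for a match; the caller
    // still has to compare them directly.
    public static Map<String, List<String>> bucket(
            List<String> names, Function<String, String> encoder) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String name : names) {
            String code = encoder.apply(name);
            buckets.computeIfAbsent(code, k -> new ArrayList<>()).add(name);
        }
        return buckets;
    }
}
```

With an encoder that maps both "Hilbert" and "Heilbronn" to "H416", the two
names land in the same bucket even though they are clearly different
surnames, which is why the bucket alone should not be read as a match.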
The soundex() method is static, as it maintains no per-instance
state; this means you never need to instantiate this class.