File: SoundexUtils.java
Doc: API Doc
Category: Android 1.5 API
Size: 4244
Date: Wed May 06 22:41:10 BST 2009
Package: org.apache.commons.codec.language

SoundexUtils

public final class SoundexUtils extends Object
Utility methods for the Soundex and RefinedSoundex classes.
Author: Apache Software Foundation
Version: $Id: SoundexUtils.java,v 1.5 2004/03/17 18:31:35 ggregory Exp $
Since: 1.3

Methods Summary
static java.lang.String clean(java.lang.String str)
Cleans up the input string before Soundex processing: every character that is not a letter is removed, and the remaining letters are returned in upper case.

Parameters:
str - The String to clean.
Returns:
A clean String containing only upper-case letters; a null or empty str is returned unchanged.

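    static String clean(String str) {
        // Null and empty input are returned unchanged.
        if (str == null || str.length() == 0) {
            return str;
        }
        int len = str.length();
        char[] chars = new char[len];
        int count = 0;
        // Keep only letter characters (Character.isLetter is Unicode-aware).
        for (int i = 0; i < len; i++) {
            if (Character.isLetter(str.charAt(i))) {
                chars[count++] = str.charAt(i);
            }
        }
        // Every character was a letter: upper-case the original string directly.
        if (count == len) {
            return str.toUpperCase();
        }
        // Note: toUpperCase() without arguments uses the JVM's default locale.
        return new String(chars, 0, count).toUpperCase();
    }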
    
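The cleaning step can be tried outside Commons Codec with a small standalone sketch. The class name `CleanSketch` is hypothetical, and this is not the library class itself. It keeps the same rules as the listed method (letters only, null and empty passed through), but it deliberately passes `Locale.ROOT` to `toUpperCase` so the result does not depend on the JVM's default locale. That is a design choice of this sketch, not behaviour documented for `SoundexUtils`.

```java
import java.util.Locale;

// Hypothetical standalone sketch of the clean() logic; not the Commons Codec class.
final class CleanSketch {

    private CleanSketch() {
    }

    /** Keeps only letters and upper-cases them; null and "" are returned unchanged. */
    static String clean(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        StringBuilder letters = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isLetter(c)) {
                letters.append(c);
            }
        }
        // Locale.ROOT keeps the result independent of the default locale
        // (the listed method calls toUpperCase() with the default locale).
        return letters.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(clean("O'Brien-Smith"));   // OBRIENSMITH
        System.out.println(clean("  mc donald 3rd")); // MCDONALDRD
        System.out.println(clean(""));                // prints an empty line
    }
}
```

Apostrophes, hyphens, spaces and digits are all dropped, which is why "3rd" contributes only "RD".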
static int difference(org.apache.commons.codec.StringEncoder encoder, java.lang.String s1, java.lang.String s2)
Encodes both Strings with the given encoder and returns the number of positions at which the two encoded Strings contain the same character.
  • For Soundex, this return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.
  • For Refined Soundex, the return value can be greater than 4, because its codes are not limited to four characters.

Parameters:
encoder - The encoder to use to encode the Strings.
s1 - A String that will be encoded and compared.
s2 - A String that will be encoded and compared.
Returns:
The number of positions at which the two encoded Strings have the same character.
See also:
differenceEncoded(String, String)
MS T-SQL DIFFERENCE
Throws:
EncoderException - if an error occurs while encoding either String.

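    static int difference(StringEncoder encoder, String s1, String s2) throws EncoderException {
        // Encode both inputs, then count matching positions in the resulting codes.
        return differenceEncoded(encoder.encode(s1), encoder.encode(s2));
    }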
    
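The encode-then-compare flow of `difference` can be sketched without the Commons Codec jar. In this self-contained version, `UnaryOperator<String>` stands in for `StringEncoder`, and the hypothetical `soundex` method is a simplified American Soundex that omits the H/W rule, so its codes may differ from the library's `Soundex` on some names. The class name `DifferenceSketch` is also hypothetical.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: a simplified Soundex plus the difference() logic.
// UnaryOperator<String> stands in for org.apache.commons.codec.StringEncoder.
final class DifferenceSketch {

    // Soundex digit for each letter A..Z; '0' marks letters that are never emitted.
    private static final String MAPPING = "01230120022455012623010202";

    private DifferenceSketch() {
    }

    /** Simplified Soundex without the H/W rule; returns "" if the input has no A-Z letters. */
    static String soundex(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder letters = new StringBuilder();
        for (char c : input.toCharArray()) {
            char u = Character.toUpperCase(c);
            if (u >= 'A' && u <= 'Z') {
                letters.append(u);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder(4);
        out.append(letters.charAt(0));
        char last = MAPPING.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char code = MAPPING.charAt(letters.charAt(i) - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a '0' letter lets a repeated digit be emitted again
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    /** Counts positions where the two encoded strings hold the same character. */
    static int differenceEncoded(String es1, String es2) {
        if (es1 == null || es2 == null) {
            return 0;
        }
        int matches = 0;
        for (int i = 0; i < Math.min(es1.length(), es2.length()); i++) {
            if (es1.charAt(i) == es2.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    /** Encodes both inputs with the supplied encoder, then compares the codes. */
    static int difference(UnaryOperator<String> encoder, String s1, String s2) {
        return differenceEncoded(encoder.apply(s1), encoder.apply(s2));
    }

    public static void main(String[] args) {
        UnaryOperator<String> encoder = DifferenceSketch::soundex;
        System.out.println(difference(encoder, "Robert", "Rupert"));  // 4 (R163 vs R163)
        System.out.println(difference(encoder, "Tymczak", "Taylor")); // 1 (T522 vs T460)
        System.out.println(difference(encoder, "Robert", "Smith"));   // 0 (R163 vs S530)
    }
}
```

Passing the encoder in as a parameter is what lets the same comparison serve both four-character Soundex codes and longer codes from other encoders.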
static int differenceEncoded(java.lang.String es1, java.lang.String es2)
Returns the number of positions at which the two encoded Strings have the same character. Only the first min(es1.length(), es2.length()) positions are compared.
  • For Soundex, this return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.
  • For Refined Soundex, the return value can be greater than 4, because its codes are not limited to four characters.

Parameters:
es1 - An encoded String.
es2 - An encoded String.
Returns:
The number of positions at which the two encoded Strings have the same character, or 0 if either String is null.
See also:
MS T-SQL DIFFERENCE


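    static int differenceEncoded(String es1, String es2) {
        // A null code counts as no similarity.
        if (es1 == null || es2 == null) {
            return 0;
        }
        // Compare position by position over the shorter of the two codes.
        int lengthToMatch = Math.min(es1.length(), es2.length());
        int diff = 0;
        for (int i = 0; i < lengthToMatch; i++) {
            if (es1.charAt(i) == es2.charAt(i)) {
                diff++;
            }
        }
        // Despite its name, diff counts matching positions, not differences.
        return diff;
    }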
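Because the comparison is positional, two codes built from the same digits in a different order score low, and a longer code is only compared over the shared prefix. The standalone sketch below reproduces the listed comparison in a hypothetical class, `DifferenceEncodedSketch`, and exercises those cases. The six-character inputs are made-up strings used only to show that results above 4 are possible for longer codes; they are not real Refined Soundex output.

```java
// Hypothetical standalone sketch of the differenceEncoded() comparison; not the Commons Codec class.
final class DifferenceEncodedSketch {

    private DifferenceEncodedSketch() {
    }

    /** Counts the positions at which es1 and es2 hold the same character. */
    static int differenceEncoded(String es1, String es2) {
        if (es1 == null || es2 == null) {
            return 0; // null input means "no similarity", not an exception
        }
        // Only the overlapping prefix is compared; extra trailing characters are ignored.
        int lengthToMatch = Math.min(es1.length(), es2.length());
        int matches = 0;
        for (int i = 0; i < lengthToMatch; i++) {
            if (es1.charAt(i) == es2.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(differenceEncoded("R163", "R163"));     // 4: identical codes
        System.out.println(differenceEncoded("R163", "R631"));     // 1: same digits, different positions
        System.out.println(differenceEncoded("R163", "R16"));      // 3: only 3 positions are compared
        System.out.println(differenceEncoded("R163", null));       // 0: null input
        System.out.println(differenceEncoded("ABCDEF", "ABCDXF")); // 5: longer codes can exceed 4
    }
}
```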