File: Collator.java | Doc: API Doc | Category: Android 1.5 API | Size: 15228 | Date: Wed May 06 22:41:04 BST 2009 | Package: com.ibm.icu4jni.text

Collator

java.lang.Object

public abstract class Collator extends Object implements Cloneable

Abstract class handling locale-specific collation via JNI and ICU. Subclasses implement specific collation strategies. One subclass, com.ibm.icu4jni.text.RuleBasedCollator, is currently provided and is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs. You can use the static factory method, getInstance(), to obtain the appropriate Collator object for a given locale.

import com.ibm.icu4jni.text.Collator;

// Compare two strings in the default locale
Collator myCollator = Collator.getInstance();
if (myCollator.compare("abc", "ABC") < 0) {
    System.out.println("abc is less than ABC");
} else {
    System.out.println("abc is greater than or equal to ABC");
}

You can set a Collator's strength property to determine the level of difference considered significant in comparisons. CollationAttribute provides five strengths: VALUE_PRIMARY, VALUE_SECONDARY, VALUE_TERTIARY, VALUE_QUATERNARY and VALUE_IDENTICAL. The exact assignment of strengths to language features is locale-dependent. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ê" (latin small letter e with circumflex) are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.
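The strength levels can be tried out without ICU. The sketch below is a stand-in under stated assumptions: it uses the JDK's own java.text.Collator (which this class is designed to resemble), US English rules instead of Czech, and the JDK's strength constants, whose numeric values are not the same as this class's. The helper name compareAt is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch only: the JDK's java.text.Collator stands in for
// com.ibm.icu4jni.text.Collator, which is assumed not to be on the classpath.
class StrengthDemo {

    // Compares a and b with a US English collator at the given JDK strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        // Canonical decomposition lets U+00EA compare as "e" + U+0302 (combining circumflex).
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String eCircumflex = "\u00EA";
        // Primary difference: different base letters differ at every strength.
        System.out.println("e vs f, PRIMARY: " + compareAt(Collator.PRIMARY, "e", "f"));
        // Secondary (accent) difference: ignored at PRIMARY, visible at SECONDARY.
        System.out.println("e vs e-circumflex, PRIMARY: " + compareAt(Collator.PRIMARY, "e", eCircumflex));
        System.out.println("e vs e-circumflex, SECONDARY: " + compareAt(Collator.SECONDARY, "e", eCircumflex));
        // Tertiary (case) difference: ignored at SECONDARY, visible at TERTIARY.
        System.out.println("e vs E, SECONDARY: " + compareAt(Collator.SECONDARY, "e", "E"));
        System.out.println("e vs E, TERTIARY: " + compareAt(Collator.TERTIARY, "e", "E"));
    }
}
```

Each level only adds distinctions: a pair that differs at PRIMARY still differs at every higher strength.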

The following shows how both case and accents could be ignored for US English.

import java.util.Locale;

// Get the Collator for US English and set its strength to PRIMARY
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY);
if (usCollator.compare("abc", "ABC") == 0) {
    System.out.println("Strings are equivalent");
}

For comparing Strings exactly once, the compare method provides the best performance. When sorting a list of Strings, however, it is generally necessary to compare each String multiple times. In this case, com.ibm.icu4jni.text.CollationKey provides better performance. The CollationKey class converts a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey is created by a Collator object for a given String. Note: CollationKeys from different Collators cannot be compared.
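The key-based approach can be sketched with the JDK's java.text.CollationKey, which plays the same role as com.ibm.icu4jni.text.CollationKey; the JDK classes are an assumed stand-in for the icu4jni ones, and the names KeySort and sortWithKeys are hypothetical. Each string is converted to a key once, and the sort then compares only keys.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: sort by precomputed collation keys so that the expensive collation
// work happens once per string (JDK java.text classes used as a stand-in).
class KeySort {

    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // One key per string, all produced by the same Collator.
        CollationKey[] keys = new CollationKey[words.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = collator.getCollationKey(words.get(i));
        }
        // The sort now compares keys only; CollationKey implements Comparable.
        Arrays.sort(keys);
        List<String> sorted = new ArrayList<>(keys.length);
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithKeys(Arrays.asList("pear", "Banana", "apple"), Locale.US));
    }
}
```

Because every key comes from a single Collator, the keys are mutually comparable, which respects the restriction on mixing keys from different Collators.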

Considerations:
1) Error codes are not returned to the user; exceptions are thrown instead.
2) The API is similar to java.text.Collator.

author: syn wee quek
stable: ICU 2.4

Fields Summary
public static final int
PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.
public static final int
SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.
public static final int
TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.
public static final int
QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.
public static final int
IDENTICAL
Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. See class documentation for more explanation.

Note: this value is different from the JDK's.
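The IDENTICAL-level tiebreak can be sketched with the JDK's java.text.Normalizer. This is a hypothetical illustration of that one step (compareNfdCodePoints is not part of this API), not ICU's implementation. It compares code points rather than UTF-16 units, which matters for characters outside the Basic Multilingual Plane.

```java
import java.text.Normalizer;

// Sketch of the IDENTICAL tiebreak: compare the Unicode code points of the
// NFD forms of both strings (hypothetical helper, not ICU's implementation).
class IdenticalTiebreak {

    static int compareNfdCodePoints(String a, String b) {
        int[] x = Normalizer.normalize(a, Normalizer.Form.NFD).codePoints().toArray();
        int[] y = Normalizer.normalize(b, Normalizer.Form.NFD).codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // Equal prefix: the shorter string sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        // Precomposed U+00EA and "e" + U+0302 share the same NFD form.
        System.out.println(compareNfdCodePoints("\u00EA", "e\u0302"));
        System.out.println(compareNfdCodePoints("E", "e"));
    }
}
```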
public static final int
NO_DECOMPOSITION
Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note: this value is different from the JDK's.
public static final int
CANONICAL_DECOMPOSITION
Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.
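To see what canonical decomposition produces, the sketch below prints the code points of a string's NFD form using the JDK's java.text.Normalizer, assumed here as a stand-in for the decomposition ICU performs internally; describeNfd is a hypothetical helper.

```java
import java.text.Normalizer;

// Sketch: show the code points of a string after canonical decomposition (NFD).
class NfdDemo {

    // Returns the NFD form of s as space-separated "U+XXXX" code points.
    static String describeNfd(String s) {
        StringBuilder out = new StringBuilder();
        Normalizer.normalize(s, Normalizer.Form.NFD).codePoints()
            .forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // U+00EA (e with circumflex) decomposes into U+0065 followed by U+0302.
        System.out.println(describeNfd("\u00EA"));
    }
}
```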
public static final int
RESULT_EQUAL
string a == string b
public static final int
RESULT_GREATER
string a > string b
public static final int
RESULT_LESS
string a < string b
public static final int
RESULT_DEFAULT
accepted by most attributes
Constructors Summary
Methods Summary
public abstract java.lang.Object clone()
Makes a copy of the current object.
return
a copy of this object
stable
ICU 2.4
public abstract int compare(java.lang.String source, java.lang.String target)
The comparison function compares the character data stored in two different strings. Returns information about whether a string is less than, greater than or equal to another string.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setStrength(Collator.PRIMARY);
// result is Collator.RESULT_EQUAL ("abc" == "ABC"):
// there is no primary difference between "abc" and "ABC"
int result = myCollation.compare("abc", "ABC");
myCollation.setStrength(Collator.TERTIARY);
// result is Collator.RESULT_LESS ("abc" <<< "ABC"):
// there is a tertiary difference between "abc" and "ABC"
result = myCollation.compare("abc", "ABC");
param
source source string.
param
target target string.
return
result of the comparison, Collator.RESULT_EQUAL, Collator.RESULT_GREATER or Collator.RESULT_LESS
stable
ICU 2.4
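Code written against the RESULT_LESS, RESULT_EQUAL and RESULT_GREATER constants may not carry over to other collators unchanged: the JDK's java.text.Collator.compare(), for instance, guarantees only the sign of its result. The sketch below uses the JDK class as a stand-in and collapses the result with Integer.signum; the local constants and their values (-1, 0, 1) are an assumption of this sketch, not taken from this class.

```java
import java.text.Collator;
import java.util.Locale;

class CompareResult {
    // Hypothetical local copies: this sketch assumes the values -1, 0 and 1.
    static final int RESULT_LESS = -1;
    static final int RESULT_EQUAL = 0;
    static final int RESULT_GREATER = 1;

    // java.text.Collator.compare only guarantees the sign, so normalize it.
    static int compareToResult(Collator collator, String source, String target) {
        return Integer.signum(collator.compare(source, target));
    }

    // A US English collator at the given JDK strength.
    static Collator usCollator(int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator;
    }

    public static void main(String[] args) {
        // PRIMARY: "abc" and "ABC" differ only in case, so they compare equal.
        System.out.println(compareToResult(usCollator(Collator.PRIMARY), "abc", "ABC"));
        // TERTIARY: the case difference now counts.
        System.out.println(compareToResult(usCollator(Collator.TERTIARY), "abc", "ABC"));
    }
}
```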
public boolean equals(java.lang.String source, java.lang.String target)
Locale dependent equality check for the argument strings.
param
source string
param
target string
return
true if source is equivalent to target, false otherwise
stable
ICU 2.4
return (compare(source, target) == RESULT_EQUAL);
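Because this equality is locale- and strength-dependent, it suits case- and accent-insensitive lookups. The sketch assumes the JDK's java.text.Collator as a stand-in; it has its own equals(String, String) convenience method with the same meaning. indexOf and primaryUs are hypothetical helpers.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorLookup {

    // Returns the index of the first element equal to target under the
    // collator's rules, or -1 if there is none.
    static int indexOf(Collator collator, List<String> items, String target) {
        for (int i = 0; i < items.size(); i++) {
            if (collator.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    // US English collator that ignores accents and case.
    static Collator primaryUs() {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Apple", "R\u00E9sum\u00E9", "pear");
        System.out.println(indexOf(primaryUs(), items, "resume"));
    }
}
```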
public abstract boolean equals(java.lang.Object target)
Checks whether the argument object is equal to this object.
param
target object
return
true if target is equal to this object, false otherwise
stable
ICU 2.4
public abstract int getAttribute(int type)
Gets the value of an attribute used in comparison or transformation.
param
type the attribute to get, from CollationAttribute
return
the attribute value, from CollationAttribute
stable
ICU 2.4
public static java.util.Locale[] getAvailableLocales()
Returns the locales for which collation data is available.
String[] locales = NativeCollation.getAvailableLocalesImpl();
Locale[] result = new Locale[locales.length];
String locale;
int index, index2;
for (int i = 0; i < locales.length; i++) {
    locale = locales[i];
    index = locale.indexOf('_');
    index2 = locale.lastIndexOf('_');
    if (index == -1) {
        result[i] = new Locale(locales[i]);
    } else if (index == 2 && index == index2) {
        result[i] = new Locale(
            locale.substring(0, 2),
            locale.substring(3, 5));
    } else if (index == 2 && index2 > index) {
        result[i] = new Locale(
            locale.substring(0, index),
            locale.substring(index + 1, index2),
            locale.substring(index2 + 1));
    }
}
return result;
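The ID parsing in the body above can be lifted into a standalone helper for testing. This hypothetical LocaleIdParser is slightly generalized: the original only handles IDs whose language code has exactly two letters (an ID such as "haw_US" leaves a null entry in the array) and assumes a two-letter country code, whereas this sketch splits on the first and last underscore regardless of length.

```java
import java.util.Locale;

// Hypothetical standalone version of the parsing in getAvailableLocales():
// turns an ICU-style ID such as "de", "en_US" or "es_ES_TRADITIONAL" into a Locale.
class LocaleIdParser {

    static Locale parse(String id) {
        int first = id.indexOf('_');
        int last = id.lastIndexOf('_');
        if (first == -1) {
            // Language only.
            return new Locale(id);
        } else if (first == last) {
            // language_COUNTRY
            return new Locale(id.substring(0, first), id.substring(first + 1));
        } else {
            // language_COUNTRY_VARIANT
            return new Locale(id.substring(0, first),
                              id.substring(first + 1, last),
                              id.substring(last + 1));
        }
    }

    public static void main(String[] args) {
        for (String id : new String[] {"de", "en_US", "haw_US", "es_ES_TRADITIONAL"}) {
            Locale l = parse(id);
            System.out.println(id + " -> [" + l.getLanguage() + "|" + l.getCountry() + "|" + l.getVariant() + "]");
        }
    }
}
```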
public abstract CollationKey getCollationKey(java.lang.String source)
Gets the sort key, as a CollationKey object, for the argument string. To retrieve the sort key as a byte array, use the method as shown below:
Collator collator = Collator.getInstance();
CollationKey collationkey = collator.getCollationKey("string");
byte[] array = collationkey.toByteArray();
The resulting byte arrays are zero-terminated and can be compared using java.util.Arrays.equals().
param
source string to be processed.
return
the sort key
stable
ICU 2.4
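The byte-array comparison can be sketched with the JDK's java.text.CollationKey.toByteArray(), assumed here as a stand-in for the icu4jni class; sameKeyBytes is a hypothetical helper. The JDK's key format differs from ICU's, so the sketch relies only on Arrays.equals, not on zero termination.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class KeyBytes {

    // True if a and b produce identical sort-key bytes under a
    // PRIMARY-strength US English collator (JDK stand-in).
    static boolean sameKeyBytes(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        byte[] keyA = collator.getCollationKey(a).toByteArray();
        byte[] keyB = collator.getCollationKey(b).toByteArray();
        return Arrays.equals(keyA, keyB);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so it does not reach a PRIMARY key.
        System.out.println(sameKeyBytes("abc", "ABC"));
        System.out.println(sameKeyBytes("abc", "abd"));
    }
}
```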
public abstract int getDecomposition()
Get the decomposition mode of this Collator.
return
the decomposition mode
see
#CANONICAL_DECOMPOSITION
see
#NO_DECOMPOSITION
stable
ICU 2.4
public static com.ibm.icu4jni.text.Collator getInstance()
Factory method to create an appropriate Collator which uses the default locale's collation rules. The current implementation returns a RuleBasedCollator(Locale) instance. The RuleBasedCollator will be created in the following order:

1) Data from the default locale's resource bundle, if found; otherwise
2) Data from the parent locale's resource bundle, if found; otherwise
3) Data from the built-in default collation rules, if found; otherwise
4) null is returned.
return
an instance of Collator
stable
ICU 2.4
return getInstance(null);
public static com.ibm.icu4jni.text.Collator getInstance(java.util.Locale locale)
Factory method to create an appropriate Collator which uses the argument locale's collation rules.
The current implementation returns a RuleBasedCollator(Locale) instance. The RuleBasedCollator will be created in the following order:

1) Data from the argument locale's resource bundle, if found; otherwise
2) Data from the parent locale's resource bundle, if found; otherwise
3) Data from the built-in default collation rules, if found; otherwise
4) null is returned.
param
locale to be used for collation
return
an instance of Collator
stable
ICU 2.4
RuleBasedCollator result = new RuleBasedCollator(locale); return result;
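The documented lookup order can be modeled in isolation. In this hypothetical sketch a Map stands in for ICU's collation resource bundles and a "root" entry for the built-in default rules. It walks every parent (de_CH_X, then de_CH, then de), which generalizes the single parent step listed above; that is an assumption, not a description of the native code.

```java
import java.util.Map;

// Hypothetical model of the documented fallback order; the map stands in for
// collation resource bundles, keyed by ICU-style locale IDs.
class CollationFallback {

    static String resolve(Map<String, String> bundles, String localeId) {
        String id = localeId;
        // 1) the locale itself, 2) its parents, e.g. "de_CH" then "de".
        while (true) {
            String rules = bundles.get(id);
            if (rules != null) {
                return rules;
            }
            int cut = id.lastIndexOf('_');
            if (cut < 0) {
                break;
            }
            id = id.substring(0, cut);
        }
        // 3) built-in default rules, 4) otherwise null.
        return bundles.get("root");
    }
}
```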
public abstract int getStrength()
Determines the minimum strength that will be used in comparison or transformation.
E.g. with strength == SECONDARY, tertiary differences are ignored.

E.g. with strength == PRIMARY, secondary and tertiary differences are ignored.
return
the current comparison level.
see
#PRIMARY
see
#SECONDARY
see
#TERTIARY
see
#QUATERNARY
see
#IDENTICAL
stable
ICU 2.4
public abstract int hashCode()
Returns a hash of this collation object
return
hash of this collation object
stable
ICU 2.4
public abstract void setAttribute(int type, int value)
Sets the attribute to be used in comparison or transformation.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setAttribute(CollationAttribute.CASE_LEVEL,
                         CollationAttribute.VALUE_ON);
int result = myCollation.compare("\u30C3\u30CF", "\u30C4\u30CF");
// result will be Collator.RESULT_LESS.
param
type the attribute to be set from CollationAttribute
param
value attribute value from CollationAttribute
stable
ICU 2.4
public abstract void setDecomposition(int mode)
Sets the normalization mode used in this object. The normalization mode influences how strings are compared.
param
mode desired normalization mode
see
#CANONICAL_DECOMPOSITION
see
#NO_DECOMPOSITION
stable
ICU 2.4
public abstract void setStrength(int strength)
Sets the minimum strength to be used in comparison or transformation.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setStrength(Collator.PRIMARY);
// result will be Collator.RESULT_EQUAL ("abc" == "ABC"):
// tertiary differences are ignored
int result = myCollation.compare("abc", "ABC");
param
strength the new comparison level.
see
#PRIMARY
see
#SECONDARY
see
#TERTIARY
see
#QUATERNARY
see
#IDENTICAL
stable
ICU 2.4