Collator
public abstract class Collator extends Object implements Cloneable
Abstract class handling locale-specific collation via JNI and ICU.
Subclasses implement specific collation strategies. One subclass,
com.ibm.icu4jni.text.RuleBasedCollator, is currently provided and is
applicable to a wide set of languages. Other subclasses may be created to
handle more specialized needs.
You can use the static factory method, getInstance(), to obtain the
appropriate Collator object for a given locale.
// Compare two strings in the default locale
Collator myCollator = Collator.getInstance();
if (myCollator.compare("abc", "ABC") < 0) {
    System.out.println("abc is less than ABC");
} else {
    System.out.println("abc is greater than or equal to ABC");
}
You can set a Collator's strength property to determine the level of
difference considered significant in comparisons.
Five strengths in CollationAttribute are provided: VALUE_PRIMARY,
VALUE_SECONDARY, VALUE_TERTIARY, VALUE_QUARTENARY and VALUE_IDENTICAL.
The exact assignment of strengths to language features is locale dependent.
For example, in Czech, "e" and "f" are considered primary differences, while
"e" and "ê" (Latin small letter e with circumflex) are secondary differences,
"e" and "E" are tertiary differences, and "e" and "e" are identical.
The following shows how both case and accents could be ignored for US
English.
// Get the Collator for US English and set its strength to PRIMARY
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY);
if (usCollator.compare("abc", "ABC") == 0) {
    System.out.println("Strings are equivalent");
}
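The strength levels described above can be compared side by side in a small sketch. It uses the JDK's java.text.Collator as a stand-in, on the assumption that it treats these inputs the same way as this class (the two APIs are described as similar, but that is not verified here). The class name StrengthDemo and the helper equalAt are illustrative only.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Returns true if a and b compare as equal at the given strength.
    // Assumption: java.text.Collator stands in for com.ibm.icu4jni.text.Collator.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String eCircumflex = "\u00EA"; // Latin small letter e with circumflex
        // Accent difference: ignored at PRIMARY, significant at SECONDARY.
        System.out.println(equalAt(Collator.PRIMARY, "e", eCircumflex));
        System.out.println(equalAt(Collator.SECONDARY, "e", eCircumflex));
        // Case difference: ignored at SECONDARY, significant at TERTIARY.
        System.out.println(equalAt(Collator.SECONDARY, "e", "E"));
        System.out.println(equalAt(Collator.TERTIARY, "e", "E"));
    }
}
```

Creating a new Collator on every call keeps the helper free of shared state; real code would normally configure one instance and reuse it.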
For comparing Strings exactly once, the compare method provides the best
performance.
When sorting a list of Strings, however, it is generally necessary to compare
each String multiple times.
In this case, com.ibm.icu4jni.text.CollationKey provides better performance.
The CollationKey class converts a String to a series of bits that can be
compared bitwise against other CollationKeys.
A CollationKey is created by a Collator object for a given String.
Note: CollationKeys from different Collators cannot be compared.
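As a rough illustration of the sorting case, the sketch below builds one key per string and then sorts the keys. It is written against the JDK's java.text.Collator and java.text.CollationKey as stand-ins; whether com.ibm.icu4jni.text.CollationKey offers the same getSourceString() and Comparable behaviour is an assumption. The class name CollationKeySortDemo and the method sortWithKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationKeySortDemo {
    // Builds one CollationKey per string up front, so each string is
    // transformed once instead of on every comparison during the sort.
    // Assumption: java.text classes stand in for the icu4jni ones.
    static List<String> sortWithKeys(List<String> words, Collator collator) {
        CollationKey[] keys = new CollationKey[words.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = collator.getCollationKey(words.get(i));
        }
        // java.text.CollationKey implements Comparable<CollationKey>.
        Arrays.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> words = Arrays.asList("pear", "Apple", "banana");
        System.out.println(sortWithKeys(words, collator));
    }
}
```

All keys in the array come from the same Collator, which matches the note that keys from different Collators are not comparable.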
Considerations:
1) Error codes are not returned to the user; exceptions are thrown instead.
2) The API is similar to java.text.Collator. |
Fields Summary |
---|
public static final int | PRIMARY
Strongest collator strength value. Typically used to denote differences
between base characters. See class documentation for more explanation.
| public static final int | SECONDARY
Second level collator strength value.
Accents in the characters are considered secondary differences.
Other differences between letters can also be considered secondary
differences, depending on the language.
See class documentation for more explanation.
| public static final int | TERTIARY
Third level collator strength value.
Upper and lower case differences in characters are distinguished at this
strength level. In addition, a variant of a letter differs from the base
form on the tertiary level.
See class documentation for more explanation.
| public static final int | QUATERNARY
Fourth level collator strength value.
When punctuation is ignored
(see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY
strength, an additional strength level can
be used to distinguish words with and without punctuation.
See class documentation for more explanation.
| public static final int | IDENTICAL
Smallest Collator strength value. When all other strengths are equal,
the IDENTICAL strength is used as a tiebreaker. The Unicode code point
values of the NFD form of each string are compared, just in case there
is no difference.
See class documentation for more explanation.
Note this value is different from the JDK's.
| public static final int | NO_DECOMPOSITION
Decomposition mode value. With NO_DECOMPOSITION set, Strings
will not be decomposed for collation. This is the default
decomposition setting unless otherwise specified by the locale
used to create the Collator.
Note this value is different from the JDK's.
| public static final int | CANONICAL_DECOMPOSITION
Decomposition mode value. With CANONICAL_DECOMPOSITION set,
characters that are canonical variants according to the Unicode standard
will be decomposed for collation.
CANONICAL_DECOMPOSITION corresponds to Normalization Form D as
described in Unicode Technical Report #15.
| public static final int | RESULT_EQUAL
string a == string b
| public static final int | RESULT_GREATER
string a > string b
| public static final int | RESULT_LESS
string a < string b
| public static final int | RESULT_DEFAULT
accepted by most attributes |
Methods Summary |
---|
public abstract java.lang.Object | clone()
Makes a copy of the current object.
| public abstract int | compare(java.lang.String source, java.lang.String target)
The comparison function compares the character data stored in two
different strings. Returns information about whether a string is less
than, greater than or equal to another string.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setStrength(CollationAttribute.VALUE_PRIMARY);
// result would be Collator.RESULT_EQUAL
// ("abc" == "ABC")
// (no primary difference between "abc" and "ABC")
int result = myCollation.compare("abc", "ABC");
myCollation.setStrength(CollationAttribute.VALUE_TERTIARY);
// result would be Collator.RESULT_LESS ("abc" <<< "ABC")
// (with tertiary difference between "abc" and "ABC")
result = myCollation.compare("abc", "ABC");
| public boolean | equals(java.lang.String source, java.lang.String target)
Locale-dependent equality check for the argument strings.
return (compare(source, target) == RESULT_EQUAL);
| public abstract boolean | equals(java.lang.Object target)
Checks if the argument object is equal to this object.
| public abstract int | getAttribute(int type)
Gets the attribute to be used in comparison or transformation.
| public static java.util.Locale[] | getAvailableLocales()
String[] locales = NativeCollation.getAvailableLocalesImpl();
Locale[] result = new Locale[locales.length];
String locale;
int index, index2;
for (int i = 0; i < locales.length; i++) {
    locale = locales[i];
    // Positions of the first and last underscore in the locale ID
    index = locale.indexOf('_');
    index2 = locale.lastIndexOf('_');
    if (index == -1) {
        // Language only
        result[i] = new Locale(locales[i]);
    } else if (index == 2 && index == index2) {
        // Language and country
        result[i] = new Locale(
                locale.substring(0, 2),
                locale.substring(3, 5));
    } else if (index == 2 && index2 > index) {
        // Language, country and variant
        result[i] = new Locale(
                locale.substring(0, index),
                locale.substring(index + 1, index2),
                locale.substring(index2 + 1));
    }
}
return result;
| public abstract CollationKey | getCollationKey(java.lang.String source)
Gets the sort key as a CollationKey object from the argument string.
To retrieve the sort key as a byte array, use the method as shown below:
Collator collator = Collator.getInstance();
CollationKey collationkey = collator.getCollationKey("string");
byte[] array = collationkey.toByteArray();
The resulting byte arrays are zero-terminated and can be compared using
java.util.Arrays.equals().
| public abstract int | getDecomposition()
Gets the decomposition mode of this Collator.
| public static com.ibm.icu4jni.text.Collator | getInstance()
Factory method to create an appropriate Collator which uses the default
locale collation rules.
The current implementation, createInstance(), returns a RuleBasedCollator(Locale)
instance. The RuleBasedCollator will be created in the following order:
- Data from argument locale resource bundle if found, otherwise
- Data from parent locale resource bundle of argument locale if found,
otherwise
- Data from built-in default collation rules if found, otherwise
- null is returned
return getInstance(null);
| public static com.ibm.icu4jni.text.Collator | getInstance(java.util.Locale locale)
Factory method to create an appropriate Collator which uses the argument
locale collation rules.
The current implementation, createInstance(), returns a RuleBasedCollator(Locale)
instance. The RuleBasedCollator will be created in the following order:
- Data from argument locale resource bundle if found, otherwise
- Data from parent locale resource bundle of argument locale if found,
otherwise
- Data from built-in default collation rules if found, otherwise
- null is returned
RuleBasedCollator result = new RuleBasedCollator(locale);
return result;
| public abstract int | getStrength()
Determines the minimum strength that will be used in comparison or
transformation.
E.g. with strength == SECONDARY, the tertiary difference is ignored.
E.g. with strength == PRIMARY, the secondary and tertiary differences
are ignored.
| public abstract int | hashCode()
Returns a hash of this collation object.
| public abstract void | setAttribute(int type, int value)
Sets the attribute to be used in comparison or transformation.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setAttribute(CollationAttribute.CASE_LEVEL,
        CollationAttribute.VALUE_ON);
int result = myCollation.compare("\u30C3\u30CF",
        "\u30C4\u30CF");
// result will be Collator.RESULT_LESS.
| public abstract void | setDecomposition(int mode)
Sets the normalization mode used in this object.
The normalization mode influences how strings are compared.
| public abstract void | setStrength(int strength)
Sets the minimum strength to be used in comparison or transformation.
Example of use:
Collator myCollation = Collator.getInstance(Locale.US);
myCollation.setStrength(PRIMARY);
// result will be "abc" == "ABC"
// tertiary differences will be ignored
int result = myCollation.compare("abc", "ABC");
|
|
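The effect of the decomposition mode set through setDecomposition(int) can be sketched with two spellings of the same accented letter. The example below uses the JDK's java.text.Collator as a stand-in; the constant values differ between the two classes, as noted in the field descriptions, and equivalent behaviour in com.ibm.icu4jni.text.Collator is an assumption. The class and method names are illustrative only.

```java
import java.text.Collator;
import java.util.Locale;

public class DecompositionDemo {
    // Compares two strings with canonical decomposition (NFD) enabled, so
    // canonically equivalent spellings yield the same collation elements.
    // Assumption: java.text.Collator stands in for the icu4jni class.
    static int compareWithCanonicalDecomposition(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // e with acute as a single code point
        String decomposed = "e\u0301"; // e followed by a combining acute accent
        // Both spellings decompose to the same sequence, so they compare as equal.
        System.out.println(compareWithCanonicalDecomposition(precomposed, decomposed));
    }
}
```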