File | Doc | Category | Size | Date | Package
CollationElementTableImpl.java | API Doc | phoneME MR2 API (J2ME) | 29514 | Wed May 02 18:00:46 BST 2007 | com.sun.j2me.global

CollationElementTableImpl

public final class CollationElementTableImpl extends CollationElementTable
An emulator-specific implementation of the CollationElementTable class.

Fields Summary
private static com.sun.midp.security.SecurityToken
classSecurityToken
This class has a different security domain than the MIDlet suite
private static CollationFile[]
collationFiles
Array of collation table files.
private static CollationElementTableImpl[]
collationTables
Collation table instances for supported locales.
private static String[]
locales
Array of locales for which collation elements exist.
private static int[]
localeToTable
Array for converting from a locale string to the collation table index.
private static Object
loadingMutex
Mutex that prevents all collation element tables from being loaded at once, which would consume a lot of memory.
private static final int
STATE_UNINITIALIZED
Before loading of the table data.
private static final int
STATE_LOAD_FINISHED
After loading of the table data.
private static final int
STATE_LOAD_FAILED
The table is inconsistent and can't be used.
private static final int
MIN_L2
Min value of the L2 weight value of an encoded collation.
private static final int
MIN_L3
Min value of the L3 weight value of an encoded collation.
private static final int
SEQUENCE_FLAG
The mask of the Sequence flag.
private static final int
OPERATION_FLAG
The mask of the Operation flag.
private static final int
BOOKMARK_OFFSET_MASK
The mask of the Bookmark offset.
private static final int
BOOKMARK_CODEPT_MASK
The mask of the Bookmark code.
private static final int
BOOKMARK_OFFSET_SHIFT
The shift of the Bookmark offset.
private static final int
DATA2_ENTRY_TYPE_FLAG
The mask of the data entry type flag.
private static final int
DATA2_SEQUENCE_FLAG
The mask of the data sequence flag.
private static final int
DA2E0_LOCALE_MASK
The mask of the data locale.
private static final int
DA2E0_OFFSET_MASK
The mask of the data offset.
private static final int
DA2E1_OFFSET_MASK
The mask of the data offset.
private static final int
DA2E1_CODEPT_MASK
The mask of the data code.
private static final int
DA2E0_LOCALE_SHIFT
The shift of the data locale.
private static final int
DA2E1_OFFSET_SHIFT
The shift of the data offset.
private byte[]
offsets0
Array of offsets used in the getCollationElements function.
private short[]
offsets1
Array of offsets used in the getCollationElements function.
private short[]
offsets2
Array of offsets used in the getCollationElements function.
private int[]
data
Array of encoded collation element data used in the getCollationData and getCollationElements functions.
private int[]
data2
Array of offsets used in the getCollationDataOffset and getChildBookmark functions.
private final CollationFile
collationFile
The assigned collation data file for this table.
private int
localeIndex
The locale index.
private int
localeFlag
The locale flag.
Constructors Summary
private CollationElementTableImpl(int index, CollationFile file)
Creates a new instance of CollationElementTableImpl for the given locale index and collation file.

param
index the locale index
param
file the CollationFile instance


                                   
         
        localeIndex = index;
        localeFlag = 1 << index;
        collationFile = file;
    
Methods Summary
private static final int calculateImplicitWeights(int[] buffer, int offset, int cp)
Computes the implicit weights for the given code point and stores them into the buffer at the given offset. Returns the number of stored collation elements.

param
buffer the buffer for the collation elements
param
offset the offset into buffer
param
cp the code point
return
the number of calculated collation elements

        int base = 0xfbc0;

        if ((cp >= 0x4e00) && (cp <= 0x9fbf)) {
            // CJK Unified Ideographs
            base = 0xfb40;
        } else if ((cp >= 0x3400) && (cp <= 0x4dbf)) {
            // CJK Unified Ideographs Extension A
            base = 0xfb80;
        } else if ((cp >= 0x20000) && (cp <= 0x2a6df)) {
            // CJK Unified Ideographs Extension B
            base = 0xfb80;
        } // TODO: else if...??

        buffer[offset++] = (MIN_L3 << L3_SHIFT) | 
                (MIN_L2 << L2_SHIFT) |
                (base + (cp >> 15)) & L1_MASK;
        buffer[offset] = ((cp & 0x7fff) | 0x8000) & L1_MASK;

        return 2;
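
As a rough illustration of the arithmetic above, the following sketch derives the two 16-bit primary weights that the method packs into its two collation elements. The class and method names are hypothetical, and L1_MASK is assumed to be 0xffff; the real constant is not shown on this page.

```java
/** Illustrative sketch only; not part of the original sources. */
final class ImplicitWeightSketch {
    // Assumption: the L1 (primary) weight occupies the low 16 bits.
    static final int ASSUMED_L1_MASK = 0xffff;

    /** Returns the two primary weights derived from the code point. */
    static int[] implicitPrimaries(int cp) {
        int base = 0xfbc0;                          // any other code point
        if (cp >= 0x4e00 && cp <= 0x9fbf) {
            base = 0xfb40;                          // CJK Unified Ideographs
        } else if ((cp >= 0x3400 && cp <= 0x4dbf)
                || (cp >= 0x20000 && cp <= 0x2a6df)) {
            base = 0xfb80;                          // CJK Extensions A and B
        }
        // First weight: base plus the bits of cp above the low 15.
        int first = (base + (cp >> 15)) & ASSUMED_L1_MASK;
        // Second weight: the low 15 bits of cp with the top bit forced on.
        int second = ((cp & 0x7fff) | 0x8000) & ASSUMED_L1_MASK;
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] w = implicitPrimaries(0x4e00);
        System.out.println(Integer.toHexString(w[0]) + " "
                + Integer.toHexString(w[1]));
    }
}
```

For U+4E00 the sketch yields the pair fb40 and ce00; for U+20000 it yields fb84 and 8000.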
    
public int getChildBookmark(int bookmark, int cp)
This method can be used to traverse the contractions. The traversal starts when the getCollationElements method returns a bookmark instead of collation elements. The returned bookmark, which represents a code point sequence consisting of only one code point, can then be tested to see whether it can be extended by other code points.

If a partial match is found, the method returns another bookmark, which represents the new sequence. The new bookmark can be further "refined" as well. To get the collation elements for the sequence, the sequence has to be terminated by a getChildBookmark(bookmark, TERMINAL_CODE_POINT) call. If the call returns a valid bookmark, it is guaranteed that the getCollationElements method will return the collation elements for this final bookmark.

If no match can be found for the given bookmark and the code point value, the method returns INVALID_BOOKMARK_VALUE.

param
bookmark the bookmark
param
cp a code point value or TERMINAL_CODE_POINT
return
the new bookmark for the new code point sequence if a match is found or INVALID_BOOKMARK_VALUE if no match can be found in the table
see
#getCollationElements

        if (bookmark == INVALID_BOOKMARK_VALUE) {
            return INVALID_BOOKMARK_VALUE;
        }

        int index = (bookmark & BOOKMARK_OFFSET_MASK) >>> BOOKMARK_OFFSET_SHIFT;
        int value = data2[index];
        int sequenceFlag = value & DATA2_SEQUENCE_FLAG;
        int i = 0;
        if (cp == TERMINAL_CODE_POINT) {
            do {
                if (((value & DATA2_ENTRY_TYPE_FLAG) == 0) &&
                        ((((value & DA2E0_LOCALE_MASK) >>> DA2E0_LOCALE_SHIFT) &
                            localeFlag) != 0)) {
                    // we have found an entry for our locale
                    return bookmark;
                }                
                ++i;
                value = data2[index + i];
            } while ((value & DATA2_SEQUENCE_FLAG) == sequenceFlag);
        } else {
            do {
                if (((value & DATA2_ENTRY_TYPE_FLAG) != 0) &&
                        ((value & DA2E1_CODEPT_MASK) == cp)) {
                    // construct a new bookmark
                    // replace the old offset with a new one
                    bookmark &= ~BOOKMARK_OFFSET_MASK;
                    bookmark |= ((((value & DA2E1_OFFSET_MASK) 
                            >>> DA2E1_OFFSET_SHIFT) + index) 
                            << BOOKMARK_OFFSET_SHIFT) & BOOKMARK_OFFSET_MASK;
                    return bookmark;
                }
                ++i;
                value = data2[index + i];
            } while ((value & DATA2_SEQUENCE_FLAG) == sequenceFlag);
        }

        return INVALID_BOOKMARK_VALUE;
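
The traversal protocol described for this method can be sketched from the caller's side as follows. This is a minimal sketch under stated assumptions: the BookmarkSource interface, the ToyTable class, and both constant values are hypothetical stand-ins, and the toy table uses a plain list index as its bookmark rather than the packed format used by the real class. It handles BMP characters only.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; the constants and the toy table are assumptions. */
final class ContractionWalkSketch {
    static final int INVALID_BOOKMARK_VALUE = -1; // assumed value
    static final int TERMINAL_CODE_POINT = -1;    // assumed value

    /** The one table method the walk depends on. */
    interface BookmarkSource {
        int getChildBookmark(int bookmark, int cp);
    }

    /** Toy table: a bookmark is simply an index into PREFIXES. */
    static final class ToyTable implements BookmarkSource {
        static final List<String> PREFIXES = Arrays.asList("a", "ab", "abc");
        // Sequences that have collation elements of their own.
        static final List<String> COMPLETE = Arrays.asList("a", "abc");

        public int getChildBookmark(int bookmark, int cp) {
            if (bookmark == INVALID_BOOKMARK_VALUE) {
                return INVALID_BOOKMARK_VALUE;
            }
            String prefix = PREFIXES.get(bookmark);
            if (cp == TERMINAL_CODE_POINT) {
                return COMPLETE.contains(prefix)
                        ? bookmark : INVALID_BOOKMARK_VALUE;
            }
            int next = PREFIXES.indexOf(prefix + (char) cp);
            return next >= 0 ? next : INVALID_BOOKMARK_VALUE;
        }
    }

    /**
     * Extends the bookmark with text[from..] as far as the table allows and
     * returns the last bookmark that could be terminated, or
     * INVALID_BOOKMARK_VALUE if none could.
     */
    static int longestMatch(BookmarkSource table, int bookmark,
                            String text, int from) {
        int best = table.getChildBookmark(bookmark, TERMINAL_CODE_POINT);
        for (int i = from; i < text.length(); i++) {
            bookmark = table.getChildBookmark(bookmark, text.charAt(i));
            if (bookmark == INVALID_BOOKMARK_VALUE) {
                break; // the sequence cannot be extended any further
            }
            int terminated =
                    table.getChildBookmark(bookmark, TERMINAL_CODE_POINT);
            if (terminated != INVALID_BOOKMARK_VALUE) {
                best = terminated;
            }
        }
        return best;
    }
}
```

In the toy table, "ab" is only a partial match, so walking "abd" falls back to the bookmark for "a", while walking "abcd" ends on the bookmark for "abc".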
    
private final int getCollationData(int[] buffer, int offset, int cp, int index)
Stores the collation elements found at the given data table index for the given code point into the buffer at the given offset and returns the number of stored elements. When the entry holds only one element, nothing is stored and the element itself is returned with SINGLE_CE_FLAG set.

param
buffer the buffer for collation elements
param
offset the offset into buffer
param
cp the code point
param
index the data table index
return
the number of stored elements

       
        int value = data[index];
        int sequenceFlag = value & SEQUENCE_FLAG;
        if ((value & OPERATION_FLAG) != 0) {
            int tmp = (value & L1_MASK) + cp;
            value = (value & ~L1_MASK) | tmp & L1_MASK;
//          value &= ~OPERATION_FLAG;
        }

        if ((data[index + 1] & SEQUENCE_FLAG) != sequenceFlag) {
            return (value | SINGLE_CE_FLAG) & ~BOOKMARK_FLAG;
        }       

        buffer[offset] = value;
        int i = 1;
        value = data[index + 1];
        do {
//          value &= ~SEQUENCE_FLAG;
            if ((value & OPERATION_FLAG) != 0) {
                int tmp = (value & L1_MASK) + cp;
                value = (value & ~L1_MASK) | tmp & L1_MASK;
//              value &= ~OPERATION_FLAG;
            }
            buffer[offset + i++] = value;
            value = data[index + i];
        } while ((value & SEQUENCE_FLAG) == sequenceFlag);

        return i;
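
The loop above relies on neighbouring entries of one sequence sharing the same value of the sequence flag, with the flag flipping where the next sequence begins. A minimal sketch of that run-delimiting idea follows; the class name and the bit chosen for the flag are assumptions, and the bounds check is an addition that the original loop does not have.

```java
/** Illustrative sketch only; the flag position is an assumption. */
final class SequenceRunSketch {
    static final int SEQUENCE_FLAG = 0x40000000; // assumed bit position

    /** Counts the entries in the run that starts at the given index. */
    static int runLength(int[] data, int index) {
        int flag = data[index] & SEQUENCE_FLAG;
        int n = 1;
        // A run ends where the flag bit differs from the first entry's.
        while (index + n < data.length
                && (data[index + n] & SEQUENCE_FLAG) == flag) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int f = SEQUENCE_FLAG;
        int[] data = { 1, 2 | f, 3 | f, 4 };
        System.out.println(runLength(data, 1));
    }
}
```

One design consequence is that no length field or terminator entry is needed per sequence; a single bit per entry is enough to mark the boundaries.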
    
private final int getCollationDataOffset(int bookmark)
Returns the data table index for the given bookmark.

param
bookmark the bookmark
return
the data table index

        int index = (bookmark & BOOKMARK_OFFSET_MASK) >>> BOOKMARK_OFFSET_SHIFT;
        int value = data2[index];
        int sequenceFlag = value & DATA2_SEQUENCE_FLAG;
        int i = 0;
        do {
            if (((value & DATA2_ENTRY_TYPE_FLAG) == 0) &&
                ((((value & DA2E0_LOCALE_MASK) >>> DA2E0_LOCALE_SHIFT) &
                    localeFlag) != 0)) {
                    return value & DA2E0_OFFSET_MASK;
            }
            ++i;
            value = data2[index + i];
        } while ((value & DATA2_SEQUENCE_FLAG) == sequenceFlag);

        return -1;
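
To make the bookmark handling easier to follow, here is a small sketch of how a flag, an offset, and a code point can share one int. The bit layout (bit 31 for the flag, bits 16 to 30 for the offset, bits 0 to 15 for the code point) is purely an assumption for illustration; the actual mask and shift values are not shown on this page.

```java
/** Illustrative sketch only; the bit layout below is an assumption. */
final class BookmarkLayoutSketch {
    static final int BOOKMARK_FLAG = 0x80000000;
    static final int BOOKMARK_OFFSET_MASK = 0x7fff0000;
    static final int BOOKMARK_OFFSET_SHIFT = 16;
    static final int BOOKMARK_CODEPT_MASK = 0x0000ffff;

    /** Builds a bookmark from a table offset and the starting code point. */
    static int pack(int offset, int cp) {
        return BOOKMARK_FLAG
                | ((offset << BOOKMARK_OFFSET_SHIFT) & BOOKMARK_OFFSET_MASK)
                | (cp & BOOKMARK_CODEPT_MASK);
    }

    /** Extracts the table offset stored in the bookmark. */
    static int offsetOf(int bookmark) {
        return (bookmark & BOOKMARK_OFFSET_MASK) >>> BOOKMARK_OFFSET_SHIFT;
    }

    /** Extracts the starting code point stored in the bookmark. */
    static int codePointOf(int bookmark) {
        return bookmark & BOOKMARK_CODEPT_MASK;
    }

    /** Replaces only the offset field, keeping flag and code point. */
    static int withOffset(int bookmark, int newOffset) {
        return (bookmark & ~BOOKMARK_OFFSET_MASK)
                | ((newOffset << BOOKMARK_OFFSET_SHIFT) & BOOKMARK_OFFSET_MASK);
    }
}
```

Under this layout, withOffset mirrors the "replace the old offset with a new one" step in getChildBookmark, and offsetOf mirrors the first line of getCollationDataOffset.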
    
public int getCollationElements(int[] buffer, int offset, int cp)
Returns the collation element or elements for the given code point or code point sequence. Each returned collation element is encoded in a single integer value, which can be further decoded by the static methods of this class.

There are two kinds of input and, for each of them, three kinds of return value.

If the parameters are an integer buffer, an offset into this buffer and a single code point, the method can return:

  1. A single encoded collation element value, when the code point decomposes into one collation element and isn't the starting code point of any contraction. In this case nothing is written into the buffer.
  2. The number of encoded collation elements, when the code point decomposes into more than one collation element and isn't the starting code point of any contraction. The encoded collation elements are written to the buffer at the given offset.
  3. A bookmark value, when the given code point is the starting code point of a contraction. Nothing is written into the buffer.

If the parameters are an integer buffer, an offset into this buffer and a bookmark, the method can return:

  1. A single encoded collation element value, when the code point sequence behind the bookmark decomposes into one collation element. Nothing is written into the buffer.
  2. The number of encoded collation elements, when the code point sequence behind the bookmark decomposes into more than one collation element. The encoded collation elements are written to the buffer at the given offset.
  3. A zero value, when the given bookmark is invalid or doesn't target a complete (terminated) code point sequence.

param
buffer the array for the decomposition
param
offset the offset from the beginning of the array, where to place the collation elements
param
cp a code point or a bookmark
return
a single encoded collation element or the number of returned collation elements or a bookmark or INVALID_BOOKMARK_VALUE
see
#isBookmark
see
#isSingleCollationEl
see
#getChildBookmark

        if (data == null) {
            initializeData();
        }

        if ((cp & BOOKMARK_FLAG) != 0) {
            // handle the case when cp is a bookmark
            if (cp == INVALID_BOOKMARK_VALUE) {
                return 0;
            }
            int collationOffset = getCollationDataOffset(cp);
            if (collationOffset == -1) {
                return 0;                
            }
            return getCollationData(buffer, offset, cp & BOOKMARK_CODEPT_MASK, 
                    collationOffset);
        }

        int index;

        index = (cp >> 8) & 0x1fff;
        if ((index >= offsets0.length) || (offsets0[index] == -1)) {
            return calculateImplicitWeights(buffer, offset, cp);
        }

        index = (((int)offsets0[index] & 0xff) << 4) + ((cp >> 4) & 0xf);
        if (offsets1[index] == -1) {
            return calculateImplicitWeights(buffer, offset, cp);
        }

        if ((offsets1[index] & 0x8000) != 0) {
            index = (int)offsets1[index] & 0x7fff;
            return getCollationData(buffer, offset, cp, index);
        }

        index = (((int)offsets1[index] & 0xfff) << 4) + (cp & 0xf);
        if (offsets2[index] == -1) {
            return calculateImplicitWeights(buffer, offset, cp);
        }

        index = (int)offsets2[index] & 0xffff;

        if ((index & 0x8000) != 0) {
            return BOOKMARK_FLAG | 
                    (index << BOOKMARK_OFFSET_SHIFT) & BOOKMARK_OFFSET_MASK |
                    cp & BOOKMARK_CODEPT_MASK;
        }

        return getCollationData(buffer, offset, cp, index);
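
The code point lookup above walks a three-stage table: the code point is split into a high part, a middle nibble and a low nibble, and each stage's result selects a 16-entry block in the next stage. The sketch below reproduces only that index arithmetic on tiny made-up tables. The class, the method names and the table contents are hypothetical, the stage-1 shortcut (the 0x8000 bit) and the bookmark case are left out, and -1 stands for "no entry, fall back to implicit weights".

```java
import java.util.Arrays;

/** Illustrative sketch only; table contents are made up. */
final class ThreeStageIndexSketch {
    static int stage0Index(int cp) {
        return (cp >> 8) & 0x1fff;                  // bits 8 and up
    }

    static int stage1Index(int block0, int cp) {
        return ((block0 & 0xff) << 4) + ((cp >> 4) & 0xf); // bits 4..7
    }

    static int stage2Index(int block1, int cp) {
        return ((block1 & 0xfff) << 4) + (cp & 0xf);       // bits 0..3
    }

    /** Returns the data index for cp, or -1 when any stage has no entry. */
    static int lookup(byte[] o0, short[] o1, short[] o2, int cp) {
        int i = stage0Index(cp);
        if (i >= o0.length || o0[i] == -1) {
            return -1;
        }
        i = stage1Index(o0[i], cp);
        if (o1[i] == -1) {
            return -1;
        }
        i = stage2Index(o1[i], cp);
        if (o2[i] == -1) {
            return -1;
        }
        return o2[i] & 0xffff;
    }

    /** Looks cp up in toy tables that contain a single entry for 0x1234. */
    static int demoLookup(int cp) {
        byte[] o0 = new byte[0x13];
        short[] o1 = new short[16];
        short[] o2 = new short[16];
        Arrays.fill(o0, (byte) -1);
        Arrays.fill(o1, (short) -1);
        Arrays.fill(o2, (short) -1);
        o0[0x12] = 0;  // 0x1200..0x12ff use stage-1 block 0
        o1[0x3] = 0;   // 0x1230..0x123f use stage-2 block 0
        o2[0x4] = 42;  // 0x1234 maps to data index 42
        return lookup(o0, o1, o2, cp);
    }
}
```

Because unused blocks are marked with -1 instead of being stored, large unassigned ranges cost only one entry in an earlier stage.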
    
public static synchronized com.sun.j2me.global.CollationElementTableImpl getInstance(java.lang.String locale)
Returns an instance of the table for the given locale.

param
locale the locale
return
the instance

        int i;
        for (i = 0; i < locales.length; ++i) {
            if (locales[i].equals(locale)) {
                break;
            }
        }
        
        if (i == locales.length) {
            // not supported
            throw new UnsupportedLocaleException("The locale " + locale + 
                    " is not supported by the string comparator");            
        }
        
        CollationElementTableImpl collationTable = 
                collationTables[localeToTable[i]];
        CollationFile collationFile = collationTable.collationFile;
        
        synchronized (collationFile) {
            if (collationFile.loadingState == STATE_UNINITIALIZED) {
                // Start loading of the data immediately
                new Thread(collationFile).start();
            }
        }

        return collationTable;
    
public int getMaxContractionLength()
Returns the length of the longest possible contraction in the table.

return
the longest contraction

        return collationFile.maxContraction;
    
public static java.lang.String[] getSupportedLocales()
Gets the locales for which a StringComparator is available in this implementation. If no locales are supported, the returned array must be empty, not null. As the value null is not technically a valid locale, but a special value to trigger the generic collation algorithm, it must not appear in the array.

return
an array of valid microedition.locale values

        // locales without the "empty string" locale 
        String[] filteredLocales = new String[locales.length];
        int filteredCount = 0;
        for (int i = 0; i < locales.length; ++i) {
            if (locales[i].length() != 0) {
                filteredLocales[filteredCount++] = locales[i];
            }
        }
        
        if (filteredCount != locales.length) {
            String[] compactedArray = new String[filteredCount];
            System.arraycopy(filteredLocales, 0, compactedArray, 0, 
                    filteredCount);
            filteredLocales = compactedArray;
        }
        
        return filteredLocales;
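
The contract described for this method (never null, no empty-string entry) can also be met with a count-then-copy pass that allocates the result only once. This is a hedged alternative sketch rather than the method's actual code, and the class and method names are hypothetical.

```java
/** Illustrative sketch only; not part of the original sources. */
final class LocaleFilterSketch {
    /** Returns a copy of the input without empty strings; never null. */
    static String[] withoutEmpty(String[] locales) {
        int count = 0;
        for (int i = 0; i < locales.length; ++i) {
            if (locales[i].length() != 0) {
                ++count;
            }
        }
        String[] result = new String[count]; // may legitimately be empty
        int j = 0;
        for (int i = 0; i < locales.length; ++i) {
            if (locales[i].length() != 0) {
                result[j++] = locales[i];
            }
        }
        return result;
    }
}
```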
    
private void initializeData()
Blocks until all table data is loaded from the file.

throws
IllegalStateException if the loading has failed

        synchronized (collationFile) {
            if (collationFile.loadingState != STATE_LOAD_FINISHED) {
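                // Loop to guard against spurious wakeups; the loader is
                // expected to change loadingState before notifying.
                while (collationFile.loadingState == STATE_UNINITIALIZED) {
                    try {
                        collationFile.wait();
                    } catch (InterruptedException e) {
                        // Stop waiting; the state check below reports failure.
                        break;
                    }
                }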
                if (collationFile.loadingState != STATE_LOAD_FINISHED) {
                    throw new IllegalStateException(
                            "Failed to load the collation element table data");
                }
            }

            offsets0 = collationFile.offsets0;
            offsets1 = collationFile.offsets1;
            offsets2 = collationFile.offsets2;
            data = collationFile.data;
            data2 = collationFile.data2;
        }
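
Taken together, getInstance and initializeData follow a start-in-background, block-on-first-use pattern: the loader thread is started early and the first lookup waits on the file's monitor until the state changes. The loader side is not shown on this page, so the sketch below is an assumption about how such a pair can fit together; the class name, the state values and the placeholder data are all hypothetical.

```java
/** Illustrative sketch only; the loader side is an assumption. */
final class LazyLoadSketch implements Runnable {
    static final int STATE_UNINITIALIZED = 0;
    static final int STATE_LOAD_FINISHED = 1;
    static final int STATE_LOAD_FAILED = 2;

    private int loadingState = STATE_UNINITIALIZED;
    private int[] data;

    /** Loader thread body: "reads" the data, then publishes it. */
    public void run() {
        int[] loaded = new int[] { 1, 2, 3 }; // stands in for file reading
        synchronized (this) {
            data = loaded;
            loadingState = STATE_LOAD_FINISHED;
            notifyAll(); // wake every thread blocked in awaitData()
        }
    }

    /** Blocks until the loader has finished, then returns the data. */
    synchronized int[] awaitData() {
        while (loadingState == STATE_UNINITIALIZED) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while loading");
            }
        }
        if (loadingState != STATE_LOAD_FINISHED) {
            throw new IllegalStateException("Failed to load the data");
        }
        return data;
    }

    public static void main(String[] args) {
        LazyLoadSketch file = new LazyLoadSketch();
        new Thread(file).start();           // start loading immediately
        System.out.println(file.awaitData().length);
    }
}
```

Because the state is written and read under the same monitor, a caller that arrives after the loader has finished skips the wait entirely.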