File: EncodingInfo.java
Doc: API Doc
Category: Java SE 6 API
Size: 19518
Date: Tue Jun 10 00:23:06 BST 2008
Package: com.sun.org.apache.xml.internal.serializer

EncodingInfo

public final class EncodingInfo extends Object
Holds information about a given encoding: the Java name for the encoding and the equivalent ISO name.

An object of this type has two useful methods:

isInEncoding(char ch);
which can be called if the character is not the high one in a surrogate pair, and
isInEncoding(char high, char low);
which can be called if the two characters form a high/low surrogate pair.
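As an illustration of how a caller might choose between the two overloads, the sketch below walks a string and dispatches on surrogate pairs. The interface InEncodingCheck and the class SurrogateDispatch are hypothetical stand-ins introduced for this example; they are not part of the serializer.

```java
class SurrogateDispatch {

    /** Hypothetical stand-in for an object offering the two isInEncoding overloads. */
    interface InEncodingCheck {
        boolean isInEncoding(char ch);
        boolean isInEncoding(char high, char low);
    }

    /**
     * Returns the index of the first char (or surrogate pair) that the
     * check reports as not in the encoding, or -1 if everything passes.
     */
    static int firstUnencodable(String text, InEncodingCheck check) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean pair = Character.isHighSurrogate(ch)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1));
            if (pair) {
                // A well-formed high/low pair: use the two-char overload.
                if (!check.isInEncoding(ch, text.charAt(i + 1))) {
                    return i;
                }
                i++; // skip the low surrogate that was just consumed
            } else if (!check.isInEncoding(ch)) {
                // Ordinary char (or an unpaired surrogate): single-char overload.
                return i;
            }
        }
        return -1;
    }
}
```

An unpaired high surrogate is passed to the single-char overload here; that is a choice made for this sketch, and a real caller may prefer to treat it as an error.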

An EncodingInfo object is a node in a binary search tree. Such a node answers whether a character is in the encoding for a given range of Unicode values (m_first to m_last). It handles a certain range of values explicitly (m_explFirst to m_explLast). If the code point is before that explicit range, that is, in the range m_first <= value < m_explFirst, it delegates to another EncodingInfo object, m_before, which is the root of the subtree for that range. Likewise, for values in the range m_explLast < value <= m_last it delegates to m_after.

Actually figuring out if a code point is in the encoding is expensive. So the purpose of this tree is to cache such determinations, and not to build the entire tree of information at the start, but only build up as much of the tree as is used during the transformation.
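The lazily built tree described above can be sketched roughly as follows. This is a minimal illustration and not the serializer's actual code: the class name RangeCacheNode, the explicit range width of 128, and the use of an IntPredicate (Java 8 or later) as a stand-in for the expensive encoding test are all assumptions made for the example.

```java
import java.util.function.IntPredicate;

class RangeCacheNode {
    private static final int RANGE = 128;      // assumed width of the explicit range

    private final IntPredicate expensiveCheck; // stand-in for the costly encoding test
    private final int first, last;             // full range this node answers for
    private final int explFirst, explLast;     // sub-range answered from this node's arrays
    private final boolean[] known = new boolean[RANGE];
    private final boolean[] result = new boolean[RANGE];
    private RangeCacheNode before;             // covers first .. explFirst - 1, built on demand
    private RangeCacheNode after;              // covers explLast + 1 .. last, built on demand

    RangeCacheNode(IntPredicate expensiveCheck, int first, int last, int codePoint) {
        this.expensiveCheck = expensiveCheck;
        this.first = first;
        this.last = last;
        // The explicit range starts at the code point that caused this node to exist.
        this.explFirst = codePoint;
        this.explLast = Math.min(last, codePoint + RANGE - 1);
    }

    boolean contains(int codePoint) {
        if (codePoint < explFirst) {
            if (before == null) {
                before = new RangeCacheNode(expensiveCheck, first, explFirst - 1, codePoint);
            }
            return before.contains(codePoint);
        }
        if (codePoint > explLast) {
            if (after == null) {
                after = new RangeCacheNode(expensiveCheck, explLast + 1, last, codePoint);
            }
            return after.contains(codePoint);
        }
        // Inside the explicit range: compute once, then answer from the cache.
        int idx = codePoint - explFirst;
        if (!known[idx]) {
            result[idx] = expensiveCheck.test(codePoint);
            known[idx] = true;
        }
        return result[idx];
    }
}
```

Only the nodes for ranges that are actually queried get created, so a transformation that emits mostly ASCII would build a very small tree under this scheme.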

This class is not a public API, and should only be used internally within the serializer.

xsl.usage
internal

Fields Summary
final String
name
The ISO encoding name.
final String
javaName
The name used by the Java converter.
private InEncoding
m_encoding
A helper object that we can ask if a single char, or a surrogate UTF-16 pair of chars that form a single character, is in this encoding.
Constructors Summary
public EncodingInfo(String name, String javaName)
Create an EncodingInfo object based on the ISO name and Java name. If both parameters are null, any character will be considered to be in the encoding. This is useful when the serializer is in a temporary output state and has no associated encoding.

param
name reference to the ISO name.
param
javaName reference to the Java encoding name.


        this.name = name;
        this.javaName = javaName;
    
Methods Summary
private static boolean inEncoding(char ch, java.lang.String encoding)
This is the heart of the code that determines if a given character is in the given encoding. This method is probably expensive, and the answer should be cached.

This method is not a public API, and should only be used internally within the serializer.

param
ch the char in question, that is not a high char of a high/low surrogate pair.
param
encoding the Java name of the encoding.
xsl.usage
internal

        boolean isInEncoding;
        try {
            char cArray[] = new char[1];
            cArray[0] = ch;
            // Construct a String from the char 
            String s = new String(cArray);
            // Encode the String into a sequence of bytes 
            // using the given, named charset. 
            byte[] bArray = s.getBytes(encoding);
            isInEncoding = inEncoding(ch, bArray);

        } catch (Exception e) {
            isInEncoding = false;
            
            // If for some reason the encoding is null, e.g.
            // for a temporary result tree, we should just
            // say that every character is in the encoding.
            if (encoding == null)
                isInEncoding = true;
        }
        return isInEncoding;
    
private static boolean inEncoding(char high, char low, java.lang.String encoding)
This is the heart of the code that determines if a given high/low surrogate pair forms a character that is in the given encoding. This method is probably expensive, and the answer should be cached.

This method is not a public API, and should only be used internally within the serializer.

param
high the high char of a high/low surrogate pair.
param
low the low char of a high/low surrogate pair.
param
encoding the Java name of the encoding.
xsl.usage
internal

        boolean isInEncoding;
        try {
            char cArray[] = new char[2];
            cArray[0] = high;
            cArray[1] = low;
            // Construct a String from the char 
            String s = new String(cArray);
            // Encode the String into a sequence of bytes 
            // using the given, named charset. 
            byte[] bArray = s.getBytes(encoding);
            isInEncoding = inEncoding(high, bArray);
        } catch (Exception e) {
            isInEncoding = false;
        }
        
        return isInEncoding;
    
private static boolean inEncoding(char ch, byte[] data)
This method is the core of determining if a character is in the encoding. The method is not foolproof, because s.getBytes(encoding) has specified behavior only if the characters are in the specified encoding. However, this method tries its best.

param
ch the char that was converted using getBytes, or the first char of a high/low pair that was converted.
param
data the bytes written out by the call to s.getBytes(encoding);
return
true if the character is in the encoding.

        final boolean isInEncoding;
        // If the string written out as data is not in the encoding,
        // the output is not specified according to the documentation
        // on the String.getBytes(encoding) method,
        // but we do our best here.        
        if (data==null || data.length == 0) {
            isInEncoding = false;
        }
        else {
            if (data[0] == 0)
                isInEncoding = false;
            else if (data[0] == '?' && ch != '?')
                isInEncoding = false;
            /*
             * else if (isJapanese) {
             *   // isJapanese is really 
             *   //   (    "EUC-JP".equals(javaName) 
             *   //    ||  "EUC_JP".equals(javaName)
             *  //     ||  "SJIS".equals(javaName)   )
             * 
             *   // Work around some bugs in JRE for Japanese
             *   if(data[0] == 0x21)
             *     isInEncoding = false;
             *   else if (ch == 0xA5)
             *     isInEncoding = false;
             *   else
             *     isInEncoding = true;
             * }
             */ 
                
            else {
                // We don't know for sure, but it looks like it is in the encoding
                isInEncoding = true; 
            }
        }
        return isInEncoding;
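To make the heuristic above concrete, here is a small standalone probe that applies the same three tests to the bytes returned by String.getBytes. The class name GetBytesProbe and the method looksEncodable are hypothetical. The sketch relies on the commonly observed behavior that JDK charsets substitute '?' for unmappable characters, which the String.getBytes documentation does not guarantee.

```java
import java.io.UnsupportedEncodingException;

class GetBytesProbe {

    /**
     * Best-effort guess at whether ch can be represented in the named
     * encoding, based only on the bytes that getBytes produces.
     */
    static boolean looksEncodable(char ch, String encoding) {
        try {
            byte[] data = String.valueOf(ch).getBytes(encoding);
            if (data.length == 0 || data[0] == 0) {
                return false;              // nothing useful was written
            }
            // A '?' that did not come from a literal '?' is taken to be
            // the charset's replacement for an unmappable character.
            return !(data[0] == '?' && ch != '?');
        } catch (UnsupportedEncodingException e) {
            return false;                  // unknown encoding name
        }
    }

    public static void main(String[] args) {
        System.out.println(looksEncodable('A', "US-ASCII"));
        System.out.println(looksEncodable('\u20AC', "US-ASCII"));
        System.out.println(looksEncodable('\u20AC', "UTF-8"));
    }
}
```

As in the original method, a character that legitimately encodes to a leading zero byte would be misreported, which is one reason the result is only a guess.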
    
public boolean isInEncoding(char ch)
This is not a public API. It returns true if the char in question is in the encoding.

param
ch the char in question.
xsl.usage
internal

        if (m_encoding == null) {
            m_encoding = new EncodingImpl();
            
            // One could put alternate logic in here to
            // instantiate another object that implements the
            // InEncoding interface. For example if the JRE is 1.4 or up
            // we could have an object that uses JRE 1.4 methods
        }
        return m_encoding.isInEncoding(ch); 
    
public boolean isInEncoding(char high, char low)
This is not a public API. It returns true if the character formed by the high/low pair is in the encoding.

param
high a char that is the high char of a high/low surrogate pair.
param
low a char that is the low char of a high/low surrogate pair.
xsl.usage
internal

        if (m_encoding == null) {
            m_encoding = new EncodingImpl();
            
            // One could put alternate logic in here to
            // instantiate another object that implements the
            // InEncoding interface. For example if the JRE is 1.4 or up
            // we could have an object that uses JRE 1.4 methods
        }
        return m_encoding.isInEncoding(high, low);
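The comments in both methods mention that an alternate implementation could use JRE 1.4 methods. One possible shape for such an object, assuming java.nio.charset.CharsetEncoder.canEncode is the method meant, is sketched below. The class name NioInEncoding is hypothetical, and the sketch does not implement the serializer's private InEncoding interface.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class NioInEncoding {
    // Note: a CharsetEncoder is not safe for concurrent use by multiple threads.
    private final CharsetEncoder encoder;

    NioInEncoding(String javaName) {
        // Throws an unchecked exception if the name is illegal or unsupported.
        this.encoder = Charset.forName(javaName).newEncoder();
    }

    /** True if the single char can be encoded; false for a lone surrogate. */
    boolean isInEncoding(char ch) {
        return encoder.canEncode(ch);
    }

    /** True if the character formed by the high/low pair can be encoded. */
    boolean isInEncoding(char high, char low) {
        return encoder.canEncode(new String(new char[] { high, low }));
    }

    public static void main(String[] args) {
        NioInEncoding ascii = new NioInEncoding("US-ASCII");
        System.out.println(ascii.isInEncoding('A'));
        System.out.println(ascii.isInEncoding('\u20AC'));
    }
}
```

Unlike the getBytes approach, this asks the encoder directly and does not depend on inspecting replacement bytes, though caching the answers may still be worthwhile.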