EncodingInfo
public final class EncodingInfo extends Object
Holds information about a given encoding: the Java name for the
encoding and the equivalent ISO name.
An object of this type has two useful methods:
isInEncoding(char ch);
which can be called if the character is not the high one in
a surrogate pair, and
isInEncoding(char high, char low);
which can be called if the two characters form a high/low surrogate pair.
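As an illustration of how a caller might choose between the two methods, the following is a minimal sketch that walks a string and dispatches on surrogate pairs. The names SurrogateDispatchSketch, InEncodingCheck, firstNotInEncoding and ASCII_ONLY are hypothetical and exist only for this example; they are not part of the serializer.

```java
final class SurrogateDispatchSketch {

    /** Local stand-in for the two isInEncoding methods; not the serializer's own type. */
    interface InEncodingCheck {
        boolean isInEncoding(char ch);
        boolean isInEncoding(char high, char low);
    }

    /** Returns the index of the first character the check rejects, or -1 if all pass. */
    static int firstNotInEncoding(String text, InEncodingCheck check) {
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            boolean paired = Character.isHighSurrogate(ch)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1));
            if (paired) {
                // A well-formed pair goes to the two-argument method.
                if (!check.isInEncoding(ch, text.charAt(i + 1))) {
                    return i;
                }
                i += 2;
            } else {
                // Everything else, including an unpaired surrogate,
                // goes to the single-argument method.
                if (!check.isInEncoding(ch)) {
                    return i;
                }
                i += 1;
            }
        }
        return -1;
    }

    /** Hypothetical check used only for demonstration: accepts 7-bit ASCII. */
    static final InEncodingCheck ASCII_ONLY = new InEncodingCheck() {
        public boolean isInEncoding(char ch) { return ch < 0x80; }
        public boolean isInEncoding(char high, char low) { return false; }
    };
}
```

With the ASCII-only stand-in, a string containing a supplementary character is rejected at the index of its high surrogate.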
An EncodingInfo object is a node in a binary search tree. Such a node
will answer whether a character is in the encoding, and do so for a given
range of Unicode values (m_first to
m_last). It will handle a certain range of values
explicitly (m_explFirst to m_explLast).
If the Unicode code point is before that explicit range, that is, it
is in the range m_first <= value < m_explFirst, then it will delegate to another EncodingInfo object, m_before, which is the root
of such a tree. Likewise for values in the range
m_explLast < value <= m_last, but delegating to m_after.
Actually figuring out if a code point is in the encoding is expensive. So the
purpose of this tree is to cache such determinations, and not to build the
entire tree of information at the start, but only build up as much of the
tree as is used during the transformation.
This class is not a public API, and should only be used internally within
the serializer. |
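The lazily built tree described above can be sketched as follows. This is an illustration under stated assumptions, not the serializer's own code: the class name LazyRangeNode, the explicit-range width of 128, the centring of the explicit range on the first requested code point, and the use of an IntPredicate as the expensive check are all choices made for this example.

```java
import java.util.function.IntPredicate;

final class LazyRangeNode {
    private static final int RANGE = 128; // assumed width of the explicit range

    private final IntPredicate expensiveCheck; // stand-in for the costly encoding test
    private final int first, last;             // whole range this node answers for
    private final int explFirst, explLast;     // sub-range answered from local arrays
    private final boolean[] known;             // has this code point been checked yet?
    private final boolean[] inEncoding;        // cached answer, valid only if known
    private LazyRangeNode before;              // first .. explFirst - 1, built on demand
    private LazyRangeNode after;               // explLast + 1 .. last, built on demand

    LazyRangeNode(IntPredicate expensiveCheck, int first, int last, int codePoint) {
        this.expensiveCheck = expensiveCheck;
        this.first = first;
        this.last = last;
        // Place the explicit range around the code point that caused this
        // node to be created, clamped to the node's overall range.
        this.explFirst = Math.max(first, codePoint - RANGE / 2);
        this.explLast = Math.min(last, explFirst + RANGE - 1);
        this.known = new boolean[explLast - explFirst + 1];
        this.inEncoding = new boolean[explLast - explFirst + 1];
    }

    boolean contains(int codePoint) {
        if (codePoint < first || codePoint > last) {
            throw new IllegalArgumentException("code point outside node range");
        }
        if (codePoint < explFirst) {
            if (before == null) {
                before = new LazyRangeNode(expensiveCheck, first, explFirst - 1, codePoint);
            }
            return before.contains(codePoint);
        }
        if (codePoint > explLast) {
            if (after == null) {
                after = new LazyRangeNode(expensiveCheck, explLast + 1, last, codePoint);
            }
            return after.contains(codePoint);
        }
        int idx = codePoint - explFirst;
        if (!known[idx]) {
            // Run the expensive check once and remember the result.
            inEncoding[idx] = expensiveCheck.test(codePoint);
            known[idx] = true;
        }
        return inEncoding[idx];
    }
}
```

Only the nodes on the path to code points that are actually requested are ever allocated, and each code point triggers the expensive check at most once.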
Fields Summary |
---|
final String | name: The ISO encoding name. |
final String | javaName: The name used by the Java convertor. |
private InEncoding | m_encoding: A helper object that we can ask if a single char, or a surrogate UTF-16 pair of chars that form a single character, is in this encoding. |
Constructors Summary |
---|
public EncodingInfo(String name, String javaName)
Create an EncodingInfo object based on the ISO name and Java name.
If both parameters are null, any character will be considered to
be in the encoding. This is useful when the serializer is in a
temporary output state and has no associated encoding.
    this.name = name;
    this.javaName = javaName;
|
Methods Summary |
---|
private static boolean | inEncoding(char ch, java.lang.String encoding)
This is the heart of the code that determines if a given character
is in the given encoding. This method is probably expensive,
and the answer should be cached.
This method is not a public API,
and should only be used internally within the serializer.
    boolean isInEncoding;
    try {
        char[] cArray = new char[1];
        cArray[0] = ch;
        // Construct a String from the char
        String s = new String(cArray);
        // Encode the String into a sequence of bytes
        // using the given, named charset.
        byte[] bArray = s.getBytes(encoding);
        isInEncoding = inEncoding(ch, bArray);
    } catch (Exception e) {
        isInEncoding = false;
        // If for some reason the encoding is null, e.g.
        // for a temporary result tree, we should just
        // say that every character is in the encoding.
        if (encoding == null)
            isInEncoding = true;
    }
    return isInEncoding;
| private static boolean | inEncoding(char high, char low, java.lang.String encoding)
This is the heart of the code that determines if a given high/low
surrogate pair forms a character that is in the given encoding.
This method is probably expensive, and the answer should be cached.
This method is not a public API,
and should only be used internally within the serializer.
    boolean isInEncoding;
    try {
        char[] cArray = new char[2];
        cArray[0] = high;
        cArray[1] = low;
        // Construct a String from the two chars
        String s = new String(cArray);
        // Encode the String into a sequence of bytes
        // using the given, named charset.
        byte[] bArray = s.getBytes(encoding);
        isInEncoding = inEncoding(high, bArray);
    } catch (Exception e) {
        isInEncoding = false;
    }
    return isInEncoding;
| private static boolean | inEncoding(char ch, byte[] data)
This method is the core of determining if a character
is in the encoding. The method is not foolproof, because
s.getBytes(encoding) has specified behavior only if the
characters are in the specified encoding. However, this
method tries its best.
    final boolean isInEncoding;
    // If the string written out as data is not in the encoding,
    // the output is not specified according to the documentation
    // on the String.getBytes(encoding) method,
    // but we do our best here.
    if (data == null || data.length == 0) {
        isInEncoding = false;
    }
    else {
        if (data[0] == 0)
            isInEncoding = false;
        else if (data[0] == '?' && ch != '?')
            isInEncoding = false;
        /*
         * else if (isJapanese) {
         *   // isJapanese is really
         *   //   ( "EUC-JP".equals(javaName)
         *   //   || "EUC_JP".equals(javaName)
         *   //   || "SJIS".equals(javaName) )
         *
         *   // Work around some bugs in JRE for Japanese
         *   if(data[0] == 0x21)
         *     isInEncoding = false;
         *   else if (ch == 0xA5)
         *     isInEncoding = false;
         *   else
         *     isInEncoding = true;
         * }
         */
        else {
            // We don't know for sure, but it looks like it is in the encoding
            isInEncoding = true;
        }
    }
    return isInEncoding;
| public boolean | isInEncoding(char ch)
This is not a public API. It returns true if the
char in question is in the encoding.
    if (m_encoding == null) {
        m_encoding = new EncodingImpl();
        // One could put alternate logic in here to
        // instantiate another object that implements the
        // InEncoding interface. For example if the JRE is 1.4 or up
        // we could have an object that uses JRE 1.4 methods
    }
    return m_encoding.isInEncoding(ch);
| public boolean | isInEncoding(char high, char low)
This is not a public API. It returns true if the
character formed by the high/low pair is in the encoding.
    if (m_encoding == null) {
        m_encoding = new EncodingImpl();
        // One could put alternate logic in here to
        // instantiate another object that implements the
        // InEncoding interface. For example if the JRE is 1.4 or up
        // we could have an object that uses JRE 1.4 methods
    }
    return m_encoding.isInEncoding(high, low);
|
|
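The comments in the two isInEncoding methods note that, on JRE 1.4 or later, another object could answer the same question using JRE 1.4 methods. One possible shape for such an object, offered only as a sketch, uses java.nio.charset.CharsetEncoder.canEncode. The class name CharsetEncoderCheck is hypothetical, and treating a null Java name as "every character is in the encoding" is an assumption carried over from the constructor description above.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class CharsetEncoderCheck {
    // null means "no associated encoding": every character is accepted.
    private final CharsetEncoder encoder;

    CharsetEncoderCheck(String javaName) {
        // Charset.forName throws an unchecked exception for unknown or
        // illegal names; this sketch lets that propagate to the caller.
        this.encoder = (javaName == null)
                ? null
                : Charset.forName(javaName).newEncoder();
    }

    /** For a char that is not the high half of a surrogate pair. */
    boolean isInEncoding(char ch) {
        return encoder == null || encoder.canEncode(ch);
    }

    /** For two chars that form a high/low surrogate pair. */
    boolean isInEncoding(char high, char low) {
        return encoder == null
                || encoder.canEncode(new String(new char[] { high, low }));
    }
}
```

Unlike the byte-inspection approach, canEncode has documented behavior for unmappable characters, so no guess based on a '?' byte is needed. A CharsetEncoder is not safe for concurrent use, so an object like this would need to be confined to one thread or synchronized externally.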