IDN
public final class IDN extends Object
Provides methods to convert internationalized domain names (IDNs) between
a normal Unicode representation and an ASCII Compatible Encoding (ACE) representation.
Internationalized domain names can use characters from the entire range of
Unicode, while traditional domain names are restricted to ASCII characters.
ACE is an encoding of Unicode strings that uses only ASCII characters and
can be used with software (such as the Domain Name System) that only
understands traditional domain names.
Internationalized domain names are defined in RFC 3490.
RFC 3490 defines two operations: ToASCII and ToUnicode. Both operations employ
the Nameprep algorithm, which is a
profile of Stringprep, and
the Punycode algorithm to convert
domain name strings back and forth.
The behavior of this conversion process can be adjusted by the following flags:
- If the ALLOW_UNASSIGNED flag is used, the domain name string to be converted
can contain code points that are unassigned in Unicode 3.2, which is the
Unicode version on which IDN conversion is based. If the flag is not used,
the presence of such unassigned code points is treated as an error.
- If the USE_STD3_ASCII_RULES flag is used, ASCII strings are checked against RFC 1122 and RFC 1123.
It is an error if they don't meet the requirements.
These flags can be logically OR'ed together.
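As a rough illustration of how the flags are passed, the sketch below calls the public `IDN.toASCII(String, int)` method with no flags, with both flags OR'ed together, and with `USE_STD3_ASCII_RULES` alone. The class `IdnFlagsExample` and its helper methods are hypothetical names introduced only for this example; only `java.net.IDN` and its two flag constants come from the documented API.

```java
import java.net.IDN;

// Hypothetical demo class; only java.net.IDN is part of the documented API.
final class IdnFlagsExample {

    // Converts with no flags set, which is what IDN.toASCII(input) does.
    static String encodeLenient(String input) {
        return IDN.toASCII(input, 0);
    }

    // Converts with both flags combined by a bitwise OR.
    static String encodeWithBothFlags(String input) {
        return IDN.toASCII(input, IDN.ALLOW_UNASSIGNED | IDN.USE_STD3_ASCII_RULES);
    }

    // Returns true if the conversion is rejected when the STD3 check is on.
    static boolean rejectedByStd3(String input) {
        try {
            IDN.toASCII(input, IDN.USE_STD3_ASCII_RULES);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A non-ASCII label is Punycode-encoded and given the ACE prefix.
        System.out.println(encodeLenient("b\u00FCcher.example"));
        // An underscore is passed through when the STD3 check is off...
        System.out.println(encodeLenient("a_b.example"));
        // ...and rejected when it is on, because '_' is not a letter, digit, or hyphen.
        System.out.println(rejectedByStd3("a_b.example"));
        System.out.println(encodeWithBothFlags("example.com"));
    }
}
```

The underscore case is the practical difference: without the flag the string is returned as given, while with the flag the same input raises an IllegalArgumentException.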
Security considerations are important with respect to internationalized
domain name support. For example, English domain names may be homographed,
that is, maliciously misspelled by substituting non-Latin letters.
Unicode Technical Report #36
discusses security issues of IDN support as well as possible solutions.
Applications are responsible for taking adequate security measures when using
internationalized domain names. |
Fields Summary |
---|
public static final int | ALLOW_UNASSIGNED
Flag to allow processing of unassigned code points
| public static final int | USE_STD3_ASCII_RULES
Flag to turn on the check against STD-3 ASCII rules
| private static final String | ACE_PREFIX
| private static final int | ACE_PREFIX_LENGTH
| private static final int | MAX_LABEL_LENGTH
| private static sun.net.idn.StringPrep | namePrep |
Constructors Summary |
---|
private IDN()
InputStream stream = null;
try {
final String IDN_PROFILE = "uidna.spp";
if (System.getSecurityManager() != null) {
stream = AccessController.doPrivileged(new PrivilegedAction<InputStream>() {
public InputStream run() {
return StringPrep.class.getResourceAsStream(IDN_PROFILE);
}
});
} else {
stream = StringPrep.class.getResourceAsStream(IDN_PROFILE);
}
namePrep = new StringPrep(stream);
stream.close();
} catch (IOException e) {
// should never reach here
assert false;
}
|
Methods Summary |
---|
private static boolean | isAllASCII(java.lang.String input)
boolean isASCII = true;
for (int i = 0; i < input.length(); i++) {
int c = input.charAt(i);
if (c > 0x7F) {
isASCII = false;
break;
}
}
return isASCII;
| private static boolean | isLDHChar(int ch)
// high runner case
if(ch > 0x007A){
return false;
}
//['-' '0'..'9' 'A'..'Z' 'a'..'z']
if((ch == 0x002D) ||
(0x0030 <= ch && ch <= 0x0039) ||
(0x0041 <= ch && ch <= 0x005A) ||
(0x0061 <= ch && ch <= 0x007A)
){
return true;
}
return false;
| private static int | searchDots(java.lang.String s, int start)
int i;
for (i = start; i < s.length(); i++) {
char c = s.charAt(i);
if (c == '.' || c == '\u3002' || c == '\uFF0E' || c == '\uFF61') {
break;
}
}
return i;
| private static boolean | startsWithACEPrefix(java.lang.StringBuffer input)
boolean startsWithPrefix = true;
if(input.length() < ACE_PREFIX_LENGTH){
return false;
}
for(int i = 0; i < ACE_PREFIX_LENGTH; i++){
if(toASCIILower(input.charAt(i)) != ACE_PREFIX.charAt(i)){
startsWithPrefix = false;
}
}
return startsWithPrefix;
| public static java.lang.String | toASCII(java.lang.String input, int flag)
Translates a string from Unicode to ASCII Compatible Encoding (ACE),
as defined by the ToASCII operation of RFC 3490.
The ToASCII operation fails if any of its steps fails, in which case an
IllegalArgumentException is thrown and the input string should not be used
in an internationalized domain name.
A label is an individual part of a domain name. The original ToASCII operation,
as defined in RFC 3490, operates only on a single label. This method can handle
both a single label and an entire domain name, by assuming that labels in a domain name are
always separated by dots. The following characters are recognized as dots:
\u002E (full stop), \u3002 (ideographic full stop), \uFF0E (fullwidth full stop),
and \uFF61 (halfwidth ideographic full stop). If dots are
used as label separators, this method also changes all of them to \u002E (full stop)
in the translated output string.
int p = 0, q = 0;
StringBuffer out = new StringBuffer();
while (p < input.length()) {
q = searchDots(input, p);
out.append(toASCIIInternal(input.substring(p, q), flag));
p = q + 1;
if (p < input.length()) out.append('.');
}
return out.toString();
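The label splitting and dot normalization described for toASCII can be observed through the public API. The following is a minimal sketch; the class `IdnDotExample` and its `normalize` method are hypothetical names used only here.

```java
import java.net.IDN;

// Hypothetical demo class showing how alternative dot characters are handled.
final class IdnDotExample {

    // Delegates to IDN.toASCII, which splits the input into labels at any of
    // the four recognized dot characters and rejoins them with U+002E.
    static String normalize(String domain) {
        return IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // Ideographic full stop (U+3002) and fullwidth full stop (U+FF0E).
        System.out.println(normalize("www\u3002example\uFF0Ecom"));
        // Halfwidth ideographic full stop (U+FF61).
        System.out.println(normalize("www\uFF61example.com"));
    }
}
```

Because each label here is plain ASCII, only the separators change, and both calls are expected to yield `www.example.com`.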
| public static java.lang.String | toASCII(java.lang.String input)
Translates a string from Unicode to ASCII Compatible Encoding (ACE),
as defined by the ToASCII operation of RFC 3490.
This convenience method works as if by invoking the
two-argument counterpart as follows:
toASCII(input, 0);
return toASCII(input, 0);
| private static java.lang.String | toASCIIInternal(java.lang.String label, int flag)
// step 1
// Check if the string contains code points outside the ASCII range 0..0x7F.
boolean isASCII = isAllASCII(label);
StringBuffer dest;
// step 2
// perform the nameprep operation; flag ALLOW_UNASSIGNED is used here
if (!isASCII) {
UCharacterIterator iter = UCharacterIterator.getInstance(label);
try {
dest = namePrep.prepare(iter, flag);
} catch (java.text.ParseException e) {
throw new IllegalArgumentException(e);
}
} else {
dest = new StringBuffer(label);
}
// step 3
// Verify the absence of non-LDH ASCII code points
// 0..0x2c, 0x2e..0x2f, 0x3a..0x40, 0x5b..0x60, 0x7b..0x7f
// Verify the absence of leading and trailing hyphen
boolean useSTD3ASCIIRules = ((flag & USE_STD3_ASCII_RULES) != 0);
if (useSTD3ASCIIRules) {
for (int i = 0; i < dest.length(); i++) {
int c = dest.charAt(i);
if (!isLDHChar(c)) {
throw new IllegalArgumentException("Contains non-LDH characters");
}
}
if (dest.charAt(0) == '-' || dest.charAt(dest.length() - 1) == '-') {
throw new IllegalArgumentException("Has leading or trailing hyphen");
}
}
if (!isASCII) {
// step 4
// If all code points are inside 0..0x7f, skip to step 8
if (!isAllASCII(dest.toString())) {
// step 5
// verify the sequence does not begin with ACE prefix
if(!startsWithACEPrefix(dest)){
// step 6
// encode the sequence with punycode
try {
dest = Punycode.encode(dest, null);
} catch (java.text.ParseException e) {
throw new IllegalArgumentException(e);
}
dest = toASCIILower(dest);
// step 7
// prepend the ACE prefix
dest.insert(0, ACE_PREFIX);
} else {
throw new IllegalArgumentException("The input starts with the ACE Prefix");
}
}
}
// step 8
// the length must be inside 1..63 (only the upper bound is checked here)
if(dest.length() > MAX_LABEL_LENGTH){
throw new IllegalArgumentException("The label in the input is too long");
}
return dest.toString();
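Two of the failure paths in the steps above, the ACE prefix check in step 5 and the length check in step 8, can be triggered from the public `IDN.toASCII(String)` method. The sketch below is illustrative only; `IdnLabelChecks`, `isEncodable`, and `repeat` are hypothetical helpers, not part of the class.

```java
import java.net.IDN;

// Hypothetical demo class exercising two ToASCII failure conditions.
final class IdnLabelChecks {

    // Returns true if IDN.toASCII accepts the input, false if it throws
    // IllegalArgumentException.
    static boolean isEncodable(String domain) {
        try {
            IDN.toASCII(domain);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 63-character label is at the limit; 64 characters exceeds it.
        System.out.println(isEncodable(repeat('a', 63)));
        System.out.println(isEncodable(repeat('a', 64)));
        // A non-ASCII label that already begins with "xn--" fails step 5.
        System.out.println(isEncodable("xn--\u00FC"));
    }
}
```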
| private static char | toASCIILower(char ch)
if('A' <= ch && ch <= 'Z'){
return (char)(ch + 'a' - 'A');
}
return ch;
| private static java.lang.StringBuffer | toASCIILower(java.lang.StringBuffer input)
StringBuffer dest = new StringBuffer();
for(int i = 0; i < input.length();i++){
dest.append(toASCIILower(input.charAt(i)));
}
return dest;
| public static java.lang.String | toUnicode(java.lang.String input, int flag)
Translates a string from ASCII Compatible Encoding (ACE) to Unicode,
as defined by the ToUnicode operation of RFC 3490.
ToUnicode never fails. In case of any error, the input string is returned unmodified.
A label is an individual part of a domain name. The original ToUnicode operation,
as defined in RFC 3490, operates only on a single label. This method can handle
both a single label and an entire domain name, by assuming that labels in a domain name are
always separated by dots. The following characters are recognized as dots:
\u002E (full stop), \u3002 (ideographic full stop), \uFF0E (fullwidth full stop),
and \uFF61 (halfwidth ideographic full stop).
int p = 0, q = 0;
StringBuffer out = new StringBuffer();
while (p < input.length()) {
q = searchDots(input, p);
out.append(toUnicodeInternal(input.substring(p, q), flag));
p = q + 1;
if (p < input.length()) out.append('.');
}
return out.toString();
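The "never fails" contract of toUnicode can be checked with a label that decodes cleanly and one that does not. This is a minimal sketch under the assumption that only the public `IDN.toUnicode(String)` method is used; `IdnDecodeExample` and `decode` are hypothetical names.

```java
import java.net.IDN;

// Hypothetical demo class for the ToUnicode direction.
final class IdnDecodeExample {

    // Delegates to IDN.toUnicode, which returns a label unchanged when it
    // cannot be decoded instead of throwing.
    static String decode(String domain) {
        return IDN.toUnicode(domain);
    }

    public static void main(String[] args) {
        // A valid ACE label is decoded back to its Unicode form.
        System.out.println(decode("xn--bcher-kva.example"));
        // '!' is not a valid Punycode digit, so the label comes back as given.
        System.out.println(decode("xn--!!!"));
        // A label without the ACE prefix is returned as is.
        System.out.println(decode("example.com"));
    }
}
```

Callers that need to know whether decoding actually happened can compare the result with the input, since no exception signals the failure.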
| public static java.lang.String | toUnicode(java.lang.String input)
Translates a string from ASCII Compatible Encoding (ACE) to Unicode,
as defined by the ToUnicode operation of RFC 3490.
This convenience method works as if by invoking the
two-argument counterpart as follows:
toUnicode(input, 0);
return toUnicode(input, 0);
| private static java.lang.String | toUnicodeInternal(java.lang.String label, int flag)
boolean[] caseFlags = null;
StringBuffer dest;
// step 1
// find out if all the codepoints in input are ASCII
boolean isASCII = isAllASCII(label);
if(!isASCII){
// step 2
// perform the nameprep operation; flag ALLOW_UNASSIGNED is used here
try {
UCharacterIterator iter = UCharacterIterator.getInstance(label);
dest = namePrep.prepare(iter, flag);
} catch (Exception e) {
// toUnicode never fails; if any step fails, return the input string
return label;
}
} else {
dest = new StringBuffer(label);
}
// step 3
// verify ACE Prefix
if(startsWithACEPrefix(dest)) {
// step 4
// Remove the ACE Prefix
String temp = dest.substring(ACE_PREFIX_LENGTH, dest.length());
try {
// step 5
// Decode using punycode
StringBuffer decodeOut = Punycode.decode(new StringBuffer(temp), null);
// step 6
// Apply toASCII
String toASCIIOut = toASCII(decodeOut.toString(), flag);
// step 7
// verify
if (toASCIIOut.equalsIgnoreCase(dest.toString())) {
// step 8
// return output of step 5
return decodeOut.toString();
}
} catch (Exception ignored) {
// no-op
}
}
// just return the input
return label;
|
|