ReaderUTF8
public class ReaderUTF8 extends Reader
UTF-8 transformed UCS-2 character stream reader.
This reader converts UTF-8 transformed UCS-2 characters to Java characters.
The UCS-2 subset of the UTF-8 transformation is described in RFC 2279, section 2,
"UTF-8 definition":
UCS-4 range (hex)      UTF-8 octet sequence (binary)
0000 0000-0000 007F    0xxxxxxx
0000 0080-0000 07FF    110xxxxx 10xxxxxx
0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
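The three bit patterns above can be sketched as a small encode/decode pair. This is an illustration only, not part of ReaderUTF8: the class name `Utf8Ucs2Sketch` and its `encode` and `decode` helpers are hypothetical, and `decode` assumes a well-formed sequence.

```java
class Utf8Ucs2Sketch {
    /** Encodes one UCS-2 value with the one-, two-, or three-byte pattern. */
    public static byte[] encode(char c) {
        if (c <= 0x007F) {
            // 0xxxxxxx
            return new byte[] { (byte) c };
        } else if (c <= 0x07FF) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F))
            };
        } else {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
    }

    /** Decodes one well-formed sequence (assumption: input came from encode). */
    public static char decode(byte[] b) {
        int first = b[0] & 0xFF;
        if (first < 0x80) {
            return (char) first;
        }
        if ((first & 0xE0) == 0xC0) {
            return (char) (((first & 0x1F) << 6) | (b[1] & 0x3F));
        }
        return (char) (((first & 0x0F) << 12)
                | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F));
    }

    public static void main(String[] args) {
        // One sample from each row of the table: U+0041, U+00E9, U+20AC.
        for (char c : new char[] { 'A', '\u00E9', '\u20AC' }) {
            byte[] bytes = encode(c);
            System.out.println(Integer.toHexString(c) + " -> "
                    + bytes.length + " byte(s), round trip ok: "
                    + (decode(bytes) == c));
        }
    }
}
```

The masks `0x1F`, `0x0F`, and `0x3F` keep only the `x` payload bits of each row.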
This reader returns an incorrect last character if the UTF-8 stream is broken (for example, truncated in the middle of a multi-byte sequence). |
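To see where an incorrect last character can come from, the following sketch repeats the two-byte decoding expression used by the read methods on a complete and on a truncated input. The class `TruncatedUtf8Sketch` and its `decodeTwoByte` helper are hypothetical names introduced for illustration; the sketch assumes the cause is that `InputStream.read()` returns -1 at end of stream and the result is masked without a check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class TruncatedUtf8Sketch {
    /** Same expression as the two-byte branch: the second read is not checked. */
    public static int decodeTwoByte(InputStream is) throws IOException {
        int val = is.read();
        // At end of stream is.read() returns -1, and (-1 & 0x3f) == 0x3f.
        return ((val & 0x1f) << 6) | (is.read() & 0x3f);
    }

    public static void main(String[] args) throws IOException {
        InputStream complete =
                new ByteArrayInputStream(new byte[] { (byte) 0xC3, (byte) 0xA9 });
        InputStream truncated =
                new ByteArrayInputStream(new byte[] { (byte) 0xC3 });

        System.out.println(Integer.toHexString(decodeTwoByte(complete)));  // prints e9
        System.out.println(Integer.toHexString(decodeTwoByte(truncated))); // prints ff
    }
}
```

With the continuation byte missing, the low six bits are all set, so U+00E9 comes back as U+00FF instead of raising an error.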
Fields Summary |
---|
private InputStream | is |
Constructors Summary |
---|
public ReaderUTF8(InputStream is)
Constructor.
this.is = is;
|
Methods Summary |
---|
public void | close()
Closes the stream.
is.close();
| public int | read(char[] cbuf, int off, int len)
Reads characters into a portion of an array.
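int num = 0;
int val;
while (num < len) {
    // End of stream: return the count read so far, or -1 if nothing was read.
    if ((val = is.read()) < 0)
        return (num != 0) ? num : -1;
    // The high nibble of the lead byte selects the sequence length.
    switch (val & 0xf0) {
    case 0xc0:
    case 0xd0: // 110xxxxx 10xxxxxx
        cbuf[off++] = (char)(((val & 0x1f) << 6) | (is.read() & 0x3f));
        break;
    case 0xe0: // 1110xxxx 10xxxxxx 10xxxxxx
        cbuf[off++] = (char)(((val & 0x0f) << 12) |
            ((is.read() & 0x3f) << 6) | (is.read() & 0x3f));
        break;
    case 0xf0: // UCS-4 character
        throw new UnsupportedEncodingException();
    default: // 0xxxxxxx; any other byte is passed through unchanged
        cbuf[off++] = (char)val;
        break;
    }
    num++;
}
return num;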
| public int | read()
Reads a single character.
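int val;
// End of stream.
if ((val = is.read()) < 0)
    return -1;
// The high nibble of the lead byte selects the sequence length.
switch (val & 0xf0) {
case 0xc0:
case 0xd0: // 110xxxxx 10xxxxxx
    val = ((val & 0x1f) << 6) | (is.read() & 0x3f);
    break;
case 0xe0: // 1110xxxx 10xxxxxx 10xxxxxx
    val = ((val & 0x0f) << 12) |
        ((is.read() & 0x3f) << 6) | (is.read() & 0x3f);
    break;
case 0xf0: // UCS-4 character
    throw new UnsupportedEncodingException();
default: // 0xxxxxxx; any other byte is returned unchanged
    break;
}
return val;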
|
|
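A caller would typically drain the reader with the usual `Reader` loop. The sketch below is a hedged illustration: `ReadAllSketch` and `readAll` are hypothetical names, and a `StringReader` stands in for ReaderUTF8 so that the example compiles without the class. Since ReaderUTF8 extends Reader, an instance such as `new ReaderUTF8(inputStream)` could be passed in the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAllSketch {
    /** Reads every character from the reader, then closes it. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        try {
            // read(char[], int, int) returns -1 once the stream is exhausted.
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in; any Reader subclass works here.
        System.out.println(readAll(new StringReader("h\u00E9llo")).length()); // prints 5
    }
}
```

Reading through the array overload avoids one method call per character compared with calling `read()` in a loop.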