UCSReader
public class UCSReader extends Reader
Reader for UCS-2 and UCS-4 encodings
(i.e., encodings from ISO-10646-UCS-(2|4)). |
Fields Summary |
---|
public static final int | DEFAULT_BUFFER_SIZE: Default byte buffer size (8192, larger than that of ASCIIReader since it is reasonable to surmise that the average UCS-4-encoded file is 4 times as large as the average ASCII-encoded file). |
public static final short | UCS2LE |
public static final short | UCS2BE |
public static final short | UCS4LE |
public static final short | UCS4BE |
protected InputStream | fInputStream: Input stream. |
protected byte[] | fBuffer: Byte buffer. |
protected short | fEncoding |
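The buffer size is given in bytes, so the number of characters one buffer fill can hold depends on the character width. The sketch below works through that arithmetic using the same shift-by-width idiom that appears in the method bodies further down. `BufferCapacitySketch`, `widthShift`, `charsPerFill`, and `bytesFor` are hypothetical names introduced here for illustration and are not part of `UCSReader`.

```java
// Hypothetical helper class for illustration; not part of UCSReader.
final class BufferCapacitySketch {
    // Same value as the DEFAULT_BUFFER_SIZE documented above.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    // Shift that converts between characters and bytes:
    // 1 for UCS-2 (2 bytes per character), 2 for UCS-4 (4 bytes per character).
    static int widthShift(boolean ucs4) {
        return ucs4 ? 2 : 1;
    }

    // Number of whole characters that fit in a buffer of bufferSize bytes.
    static int charsPerFill(int bufferSize, boolean ucs4) {
        return bufferSize >> widthShift(ucs4);
    }

    // Number of bytes needed to hold charCount characters.
    static int bytesFor(int charCount, boolean ucs4) {
        return charCount << widthShift(ucs4);
    }

    public static void main(String[] args) {
        System.out.println(charsPerFill(DEFAULT_BUFFER_SIZE, false)); // prints 4096
        System.out.println(charsPerFill(DEFAULT_BUFFER_SIZE, true));  // prints 2048
    }
}
```

With the default size, a single fill holds 4096 UCS-2 characters or 2048 UCS-4 characters.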
Constructors Summary |
---|
public UCSReader(InputStream inputStream, short encoding)Constructs a UCS reader from the specified input stream
using the default buffer size. The byte order (endianness) and whether
the encoding is UCS-2 or UCS-4 must also be known in advance.
this(inputStream, DEFAULT_BUFFER_SIZE, encoding);
| public UCSReader(InputStream inputStream, int size, short encoding)Constructs a UCS reader from the specified input stream
and buffer size. The byte order (endianness) and whether the encoding is
UCS-2 or UCS-4 must also be known in advance.
fInputStream = inputStream;
fBuffer = new byte[size];
fEncoding = encoding;
|
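Both constructors require the caller to know the character width and byte order beforehand. One common way to obtain that information is to inspect a leading byte order mark (U+FEFF) in the first bytes of the input. The sketch below illustrates the idea under that assumption; `ByteOrderMarkSketch` and `detect` are hypothetical names, and nothing in the listing above indicates that `UCSReader` performs this check itself.

```java
// Hypothetical helper class for illustration; not part of UCSReader.
final class ByteOrderMarkSketch {

    // Returns "UCS-4BE", "UCS-4LE", "UCS-2BE", or "UCS-2LE" when the first
    // bytes form a byte order mark (U+FEFF) in that encoding, or null when
    // no mark is recognised.
    static String detect(byte[] head) {
        int n = head.length;
        int b0 = n > 0 ? head[0] & 0xff : -1;
        int b1 = n > 1 ? head[1] & 0xff : -1;
        int b2 = n > 2 ? head[2] & 0xff : -1;
        int b3 = n > 3 ? head[3] & 0xff : -1;

        // Check the four-byte marks first: FF FE 00 00 also starts with the
        // two-byte little-endian mark FF FE. (This pattern is inherently
        // ambiguous with a UCS-2LE mark followed by U+0000; the sketch
        // assumes UCS-4 in that case.)
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UCS-4BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UCS-4LE";
        if (b0 == 0xFE && b1 == 0xFF) return "UCS-2BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UCS-2LE";
        return null;
    }

    public static void main(String[] args) {
        byte[] head = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 };
        System.out.println(detect(head)); // prints UCS-2BE
    }
}
```

A caller could map the returned label to the matching encoding constant (UCS2LE, UCS2BE, UCS4LE, or UCS4BE) before constructing the reader.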
Methods Summary |
---|
public void | close()Close the stream. Once a stream has been closed, further read(),
ready(), mark(), or reset() invocations will throw an IOException.
Closing a previously-closed stream, however, has no effect.
fInputStream.close();
| public void | mark(int readAheadLimit)Mark the present position in the stream. Subsequent calls to reset()
will attempt to reposition the stream to this point. Not all
character-input streams support the mark() operation.
fInputStream.mark(readAheadLimit);
| public boolean | markSupported()Tell whether this stream supports the mark() operation.
return fInputStream.markSupported();
| public int | read()Read a single character. This method will block until a character is
available, an I/O error occurs, or the end of the stream is reached.
Subclasses that intend to support efficient single-character input
should override this method.
// Test for end of stream before using the value: InputStream.read()
// returns -1 at end of stream, and a legitimate 0xFF byte must not be
// mistaken for it.
int b0 = fInputStream.read();
if (b0 == -1)
    return -1;
int b1 = fInputStream.read();
if (b1 == -1)
    return -1;
if (fEncoding >= 4) { // UCS-4
    int b2 = fInputStream.read();
    if (b2 == -1)
        return -1;
    int b3 = fInputStream.read();
    if (b3 == -1)
        return -1;
    if (fEncoding == UCS4BE)
        return (b0<<24)+(b1<<16)+(b2<<8)+b3;
    else
        return (b3<<24)+(b2<<16)+(b1<<8)+b0;
} else { // UCS-2
    if (fEncoding == UCS2BE)
        return (b0<<8)+b1;
    else
        return (b1<<8)+b0;
}
| public int | read(char[] ch, int offset, int length)Read characters into a portion of an array. This method will block
until some input is available, an I/O error occurs, or the end of the
stream is reached.
int byteLength = length << ((fEncoding >= 4)?2:1);
if (byteLength > fBuffer.length) {
    byteLength = fBuffer.length;
}
int count = fInputStream.read(fBuffer, 0, byteLength);
if (count == -1) return -1;
// try to make count a multiple of the character width in bytes
if (fEncoding >= 4) { // UCS-4
    // this looks ugly, but it avoids an if at any rate...
    int numToRead = (4 - (count & 3)) & 3;
    for (int i = 0; i < numToRead; i++) {
        int charRead = fInputStream.read();
        if (charRead == -1) { // end of input; something likely went wrong! Pad buffer with nulls.
            for (int j = i; j < numToRead; j++)
                fBuffer[count+j] = 0;
            break;
        } else {
            fBuffer[count+i] = (byte)charRead;
        }
    }
    count += numToRead;
} else { // UCS-2
    int numToRead = count & 1;
    if (numToRead != 0) {
        int charRead = fInputStream.read();
        // Store the byte at the current end of the data, then advance count.
        if (charRead == -1) { // end of input; something likely went wrong! Pad buffer with a null.
            fBuffer[count++] = 0;
        } else {
            fBuffer[count++] = (byte)charRead;
        }
    }
}
// now count is a multiple of the right number of bytes
int numChars = count >> ((fEncoding >= 4)?2:1);
int curPos = 0;
for (int i = 0; i < numChars; i++) {
    int b0 = fBuffer[curPos++] & 0xff;
    int b1 = fBuffer[curPos++] & 0xff;
    if (fEncoding >= 4) { // UCS-4
        int b2 = fBuffer[curPos++] & 0xff;
        int b3 = fBuffer[curPos++] & 0xff;
        // Note: the cast to char keeps only the low 16 bits of the value.
        if (fEncoding == UCS4BE)
            ch[offset+i] = (char)((b0<<24)+(b1<<16)+(b2<<8)+b3);
        else
            ch[offset+i] = (char)((b3<<24)+(b2<<16)+(b1<<8)+b0);
    } else { // UCS-2
        if (fEncoding == UCS2BE)
            ch[offset+i] = (char)((b0<<8)+b1);
        else
            ch[offset+i] = (char)((b1<<8)+b0);
    }
}
return numChars;
| public boolean | ready()Tell whether this stream is ready to be read.
return false;
| public void | reset()Reset the stream. If the stream has been marked, then attempt to
reposition it at the mark. If the stream has not been marked, then
attempt to reset it in some way appropriate to the particular stream,
for example by repositioning it to its starting point. Not all
character-input streams support the reset() operation, and some support
reset() without supporting mark().
fInputStream.reset();
| public long | skip(long n)Skip characters. This method will block until some characters are
available, an I/O error occurs, or the end of the stream is reached.
// charWidth is the number of bits to shift n left to get the number of
// bytes to skip, and then to shift the result right to get the number of
// characters effectively skipped.
// The trick with &'ing, as elsewhere in this code, is intended to avoid
// an expensive use of / that might not be optimized away.
int charWidth = (fEncoding >=4)?2:1;
long bytesSkipped = fInputStream.skip(n<<charWidth);
if((bytesSkipped & (charWidth | 1)) == 0) return bytesSkipped >> charWidth;
return (bytesSkipped >> charWidth) + 1;
|
|
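The single-character read() above assembles one character from two or four bytes according to the byte order. The standalone sketch below shows that assembly with the end-of-stream test applied to the raw return value of `InputStream.read()`, so that a genuine 0xFF byte is never confused with end of stream. `UcsUnitSketch`, `readUnit`, and `decodeFirst` are hypothetical names introduced for illustration, and the `width` and `bigEndian` parameters stand in for the class's encoding constants.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of UCSReader.
final class UcsUnitSketch {

    // Reads one character of 'width' bytes (2 for UCS-2, 4 for UCS-4).
    // Returns -1 if the stream ends before a full character is read.
    static int readUnit(InputStream in, int width, boolean bigEndian) throws IOException {
        int value = 0;
        for (int i = 0; i < width; i++) {
            int b = in.read();      // 0..255, or -1 at end of stream
            if (b == -1) {
                return -1;
            }
            if (bigEndian) {
                value = (value << 8) | b;   // earlier bytes are more significant
            } else {
                value |= b << (8 * i);      // later bytes are more significant
            }
        }
        return value;
    }

    // Convenience wrapper that decodes the first character of a byte array.
    static int decodeFirst(byte[] data, int width, boolean bigEndian) {
        try {
            return readUnit(new ByteArrayInputStream(data), width, bigEndian);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00FF in UCS-2BE: the 0xFF byte is data, not end of stream.
        System.out.println(decodeFirst(new byte[] { 0x00, (byte) 0xFF }, 2, true)); // prints 255
    }
}
```

The loop form covers both widths with one code path; the unrolled form in the listing avoids the loop at the cost of some repetition.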