ByteBufferTokenizer
public abstract class ByteBufferTokenizer extends je3.classes.AbstractTokenizer
This is an abstract Tokenizer implementation for tokenizing ByteBuffers.
It implements the two abstract methods of AbstractTokenizer, createBuffer()
and fillBuffer(), and defines two new abstract methods, getMoreBytes() and
hasMoreBytes(), that subclasses must implement. This class provides
byte-to-character decoding but leaves it up to concrete subclasses
to provide the ByteBuffers to decode. |
Fields Summary |
---|
CharsetDecoder | decoder
| CharBuffer | chars
| ByteBuffer | bytes |
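The chars field does not own separate storage: createBuffer() wraps it around the superclass's text array, so whatever the decoder writes into chars shows up in text. The following is a minimal sketch of that relationship. The WrapDemo class and its local variables are hypothetical stand-ins for the real fields, not part of the original listing.

```java
import java.nio.CharBuffer;

final class WrapDemo {
    // Writes through a CharBuffer and reads the result back from the wrapped array.
    static String writeThroughBuffer() {
        char[] text = new char[8];                 // stand-in for the superclass's text[] array
        CharBuffer chars = CharBuffer.wrap(text);  // a view over text, not a copy
        chars.put("abc");                          // the characters land directly in text[]
        int numChars = chars.position();           // index of the last valid character plus one
        return new String(text, 0, numChars);
    }

    public static void main(String[] args) {
        System.out.println(writeThroughBuffer());  // prints "abc"
    }
}
```

Because the two objects share storage, fillBuffer() only has to keep the buffer's position in step with the array indexes after it shifts the array contents.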
Constructors Summary |
---|
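protected ByteBufferTokenizer(Charset charset, int charBufferSize)
    // Use the requested char buffer size as the maximum token length
    maximumTokenLength(charBufferSize);
    decoder = charset.newDecoder();
    // Drop malformed or unmappable input instead of reporting an error
    decoder.onMalformedInput(CodingErrorAction.IGNORE);
    decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);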
|
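The constructor sets CodingErrorAction.IGNORE for both malformed and unmappable input, so bad bytes are dropped silently rather than reported. The sketch below shows that behavior in isolation. It assumes UTF-8 and uses a hypothetical IgnoreDemo class; neither comes from the original listing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class IgnoreDemo {
    // Decodes UTF-8 input, silently skipping any bytes that cannot be decoded.
    static String decodeIgnoringErrors(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(input)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but the convenience method declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8, so it is dropped.
        byte[] input = {'a', (byte) 0xFF, 'b'};
        System.out.println(decodeIgnoringErrors(input));  // prints "ab"
    }
}
```

The trade-off is that corrupt input never interrupts tokenizing, but it also leaves no trace in the output. A subclass that needs to detect bad input would have to use CodingErrorAction.REPORT or REPLACE instead.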
Methods Summary |
---|
protected void | createBuffer(int bufferSize)
    // Make sure AbstractTokenizer only calls this method once
    assert text == null;
    text = new char[bufferSize];    // Create the new buffer.
    chars = CharBuffer.wrap(text);  // Wrap a char buffer around it.
    numChars = 0;                   // Say how much text it contains.
| protected boolean | fillBuffer()
    // Make sure AbstractTokenizer is upholding its end of the bargain
    assert text != null && 0 <= tokenStart && tokenStart <= tokenEnd &&
           tokenEnd <= p && p <= numChars && numChars <= text.length;
    // First, shift already tokenized characters out of the buffer
    if (tokenStart > 0) {
        // Shift array contents in the text[] array.
        System.arraycopy(text, tokenStart, text, 0, numChars - tokenStart);
        // And update buffer indexes. These fields defined in superclass.
        tokenEnd -= tokenStart;
        p -= tokenStart;
        numChars -= tokenStart;
        tokenStart = 0;
        // Keep the CharBuffer in sync with the changes we made above.
        chars.position(p);
    }
    // If there is still no space in the char buffer, then we've
    // encountered a token too large for our buffer size.
    // We could try to recover by creating a larger buffer, but
    // instead, we just throw an exception
    if (chars.remaining() == 0)
        throw new IOException("Token too long at " + tokenLine() + ":" +
                              tokenColumn());
    // Get more bytes if we don't have a buffer or if the buffer
    // has been emptied
    if ((bytes == null || bytes.remaining() == 0) && hasMoreBytes())
        bytes = getMoreBytes();
    // Now that we have room in the chars buffer and data in the bytes
    // buffer, we can decode some bytes into chars
    CoderResult result = decoder.decode(bytes, chars, !hasMoreBytes());
    // Get the index of the last valid character plus one.
    numChars = chars.position();
    if (result == CoderResult.OVERFLOW) {
        // We've filled up the char buffer. It wasn't full before, so
        // we know we got at least one new character.
        return true;
    }
    else if (result == CoderResult.UNDERFLOW) {
        // This means that we decoded all the bytes and have room left
        // in the char buffer. Normally, this is fine. But there is
        // a possibility that we didn't actually get any characters.
        if (numChars > p) return true;
        else { // We didn't get any new characters. Figure out why.
            if (!hasMoreBytes()) {
                // If there are no more bytes to read, then we're at EOF
                return false;
            }
            else {
                // If there are still bytes remaining to read, then
                // we probably got part of a multi-byte sequence, and need
                // more bytes before we can decode a character from it.
                // Try again (recursively) to get some more bytes.
                return fillBuffer();
            }
        }
    }
    else {
        // We used CodingErrorAction.IGNORE for the CharsetDecoder, so
        // the decoding result should always be one of the above two.
        assert false : "Unexpected CoderResult: " + result;
        return false;
    }
| protected abstract java.nio.ByteBuffer | getMoreBytes()
Get a buffer of bytes for decoding and tokenizing.
Repeated calls to this method may create a new ByteBuffer,
or may refill and return the same buffer each time.
| protected abstract boolean | hasMoreBytes()
Determine if more bytes are available.
|
|
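To illustrate the getMoreBytes()/hasMoreBytes() contract and the multi-byte case that fillBuffer() comments on, here is a standalone sketch rather than a real subclass (AbstractTokenizer is not reproduced here). ChunkedDecodeDemo, ChunkSource, and decodeAll are hypothetical names, and UTF-8 is an assumption. The source refills and returns the same buffer each time, and it uses compact() so that an incomplete trailing sequence left behind by the decoder is carried into the next chunk. Note that fillBuffer() as listed asks for more bytes only when bytes is empty, so this loop calls getMoreBytes() on every pass instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeDemo {
    /** Hypothetical byte source that hands out data a few bytes at a time. */
    static final class ChunkSource {
        private final byte[] data;
        private final ByteBuffer buffer;  // the same buffer is refilled on every call
        private int next = 0;             // index of the next unread byte in data

        ChunkSource(byte[] data, int chunkSize) {
            // A UTF-8 decoder can leave up to 3 undecoded bytes behind,
            // so the buffer needs room for those plus at least one new byte.
            if (chunkSize < 4) throw new IllegalArgumentException("chunkSize must be >= 4");
            this.data = data;
            this.buffer = ByteBuffer.allocate(chunkSize);
            this.buffer.flip();           // start out empty
        }

        boolean hasMoreBytes() { return next < data.length; }

        ByteBuffer getMoreBytes() {
            buffer.compact();             // move undecoded leftovers to the front
            int n = Math.min(buffer.remaining(), data.length - next);
            buffer.put(data, next, n);
            next += n;
            buffer.flip();                // switch back to reading mode
            return buffer;
        }
    }

    /** Decodes data chunk by chunk, tolerating characters split across chunks. */
    static String decodeAll(byte[] data, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.IGNORE)
            .onUnmappableCharacter(CodingErrorAction.IGNORE);
        ChunkSource source = new ChunkSource(data, chunkSize);
        // UTF-8 never yields more chars than bytes, so this cannot overflow.
        CharBuffer chars = CharBuffer.allocate(data.length);
        boolean endOfInput;
        do {
            ByteBuffer bytes = source.getMoreBytes();
            endOfInput = !source.hasMoreBytes();
            // An incomplete sequence at the end of bytes stays in the buffer.
            decoder.decode(bytes, chars, endOfInput);
        } while (!endOfInput);
        decoder.flush(chars);
        chars.flip();
        return chars.toString();
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo \u20ac";  // the last character is 3 bytes in UTF-8
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // With 4-byte chunks the 3-byte sequence straddles a chunk boundary.
        System.out.println(decodeAll(utf8, 4).equals(original));  // prints "true"
    }
}
```

A channel-backed source could follow the same compact, fill, flip pattern, reading from the channel where this sketch copies from an array.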