File | Doc | Category | Size | Date | Package
CharsetDecoder.java | API Doc | Android 1.5 API | 31221 | Wed May 06 22:41:04 BST 2009 | java.nio.charset

CharsetDecoder

public abstract class CharsetDecoder extends Object
A converter that can convert a byte sequence from a charset into a 16-bit Unicode character sequence.

The input byte sequence is wrapped by a {@link java.nio.ByteBuffer ByteBuffer} and the output character sequence is a {@link java.nio.CharBuffer CharBuffer}. A decoder instance should be used in the following sequence, which is referred to as a decoding operation:

  1. invoking the {@link #reset() reset} method to reset the decoder if the decoder has been used;
  2. invoking the {@link #decode(ByteBuffer, CharBuffer, boolean) decode} method for as long as additional input may be available, with the endOfInput parameter set to false; the input buffer must be refilled and the output buffer must be flushed between invocations;
  3. invoking the {@link #decode(ByteBuffer, CharBuffer, boolean) decode} method for the last time, and then the endOfInput parameter must be set to true;
  4. invoking the {@link #flush(CharBuffer) flush} method to flush the output.
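
The four steps above can be sketched as follows. This is a minimal illustration rather than code from this class: the class name ChunkedDecodeSketch, the helper drain, the chunkSize parameter and the choice of UTF-8 are all assumptions made for the example, and errors are surfaced as an IllegalArgumentException purely for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class; not part of java.nio.charset.
final class ChunkedDecodeSketch {

    // Decodes UTF-8 data that arrives in chunks of at most chunkSize bytes.
    static String decodeInChunks(byte[] data, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        decoder.reset();                                   // step 1
        // Extra room for bytes of an incomplete sequence carried over between chunks.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 8);
        CharBuffer out = CharBuffer.allocate(16);
        StringBuilder text = new StringBuilder();
        int offset = 0;
        boolean endOfInput;
        do {
            int n = Math.min(chunkSize, data.length - offset);
            in.put(data, offset, n);                       // fill the input buffer
            offset += n;
            endOfInput = (offset == data.length);
            in.flip();
            CoderResult result;
            do {
                // step 2 while endOfInput is false, step 3 on the last chunk
                result = decoder.decode(in, out, endOfInput);
                if (result.isError()) {
                    throw new IllegalArgumentException("undecodable input: " + result);
                }
                drain(out, text);                          // flush the output buffer
            } while (result.isOverflow());
            in.compact();                                  // keep bytes not yet decoded
        } while (!endOfInput);
        CoderResult flushed;
        do {
            flushed = decoder.flush(out);                  // step 4
            drain(out, text);
        } while (flushed.isOverflow());
        return text.toString();
    }

    // Moves everything written to out into text and makes out writable again.
    private static void drain(CharBuffer out, StringBuilder text) {
        out.flip();
        text.append(out);
        out.clear();
    }
}
```

Compacting the input buffer between chunks matters because a multi-byte sequence may be split across two chunks; the undecoded prefix has to stay in the buffer until the rest arrives.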

The {@link #decode(ByteBuffer, CharBuffer, boolean) decode} method will convert as many bytes as possible, and the process won't stop until the input bytes have run out, the output buffer has been filled or some error has happened. A {@link CoderResult CoderResult} instance will be returned to indicate the stop reason, and the invoker can identify the result and choose further action, which includes filling the input buffer, flushing the output buffer or recovering from an error and trying again.

There are two common decoding errors. One, named malformed input, is reported when the input byte sequence is illegal for the current charset. The other, named unmappable character, is reported when a legal input byte sequence cannot be mapped to its Unicode character equivalent.

Both errors can be handled in three ways. The default is to report the error to the invoker through a {@link CoderResult CoderResult} instance; the alternatives are to ignore it or to replace the erroneous input with the replacement string. The replacement string is "\uFFFD" by default and can be changed by invoking the {@link #replaceWith(String) replaceWith} method. The invoker of this decoder chooses one way by specifying a {@link CodingErrorAction CodingErrorAction} instance for each error type via the {@link #onMalformedInput(CodingErrorAction) onMalformedInput} and {@link #onUnmappableCharacter(CodingErrorAction) onUnmappableCharacter} methods.
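
As a rough illustration of the three error actions, the sketch below decodes the same bytes under each of them. The class name ErrorActionSketch and the convention of returning null when the error is reported are assumptions made for this example; the input uses the byte 0xFF, which can never occur in well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class; not part of java.nio.charset.
final class ErrorActionSketch {

    // Decodes UTF-8 bytes using the given action for both error types.
    // Returns null when the action is REPORT and an error was reported.
    static String decode(byte[] data, CodingErrorAction action) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException reported) {
            return null; // only reachable with CodingErrorAction.REPORT
        }
    }
}
```

With the input bytes 'a', 0xFF, 'b', REPLACE yields the default replacement character between the two letters, IGNORE drops the bad byte, and REPORT surfaces the error to the caller.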

This is an abstract class and encapsulates many common operations of the decoding process for all charsets. Decoders for a specific charset should extend this class and need only to implement the {@link #decodeLoop(ByteBuffer, CharBuffer) decodeLoop} method for the basic decoding. If a subclass maintains an internal state, it should override the {@link #implFlush(CharBuffer) implFlush} method and the {@link #implReset() implReset} method in addition.
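
A stateless subclass might look like the sketch below. It is only an illustration: the class name Latin1SketchDecoder and the helper decodeAll are invented here, and the decoder simply maps each byte to the character with the same value, which is how ISO-8859-1 is defined. Because it keeps no internal state, it does not override implFlush or implReset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical subclass written for illustration only.
final class Latin1SketchDecoder extends CharsetDecoder {

    Latin1SketchDecoder() {
        // One input byte always produces exactly one character.
        super(Charset.forName("ISO-8859-1"), 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;   // output full, input left over
            }
            out.put((char) (in.get() & 0xFF)); // byte value equals code point
        }
        return CoderResult.UNDERFLOW;          // all input consumed
    }

    // Convenience wrapper around the inherited decode(ByteBuffer) facade.
    static String decodeAll(byte[] data) {
        try {
            return new Latin1SketchDecoder().decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("this decoder never reports errors", e);
        }
    }
}
```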

This class is not thread-safe.

see
java.nio.charset.Charset
see
java.nio.charset.CharsetEncoder
since
Android 1.0

Fields Summary
private static final int
INIT
private static final int
ONGOING
private static final int
END
private static final int
FLUSH
private float
averChars
private float
maxChars
private Charset
cs
private CodingErrorAction
malformAction
private CodingErrorAction
unmapAction
private String
replace
private int
status
Constructors Summary
protected CharsetDecoder(Charset charset, float averageCharsPerByte, float maxCharsPerByte)
Constructs a new CharsetDecoder using the given Charset, average number and maximum number of characters created by this decoder for one input byte, and the default replacement string "\uFFFD".

param
charset the Charset to be used by this decoder.
param
averageCharsPerByte the average number of characters created by this decoder for one input byte, must be positive.
param
maxCharsPerByte the maximum number of characters created by this decoder for one input byte, must be positive.
throws
IllegalArgumentException if averageCharsPerByte or maxCharsPerByte is not positive, or if averageCharsPerByte is greater than maxCharsPerByte.
since
Android 1.0


    /*
     * --------------------------------------- Constructor
     * ---------------------------------------
     */
        if (averageCharsPerByte <= 0 || maxCharsPerByte <= 0) {
            // niochar.00=Characters number for one byte must be positive.
            throw new IllegalArgumentException(Messages.getString("niochar.00")); //$NON-NLS-1$
        }
        if (averageCharsPerByte > maxCharsPerByte) {
            // niochar.01=averageCharsPerByte is greater than maxCharsPerByte
            throw new IllegalArgumentException(Messages.getString("niochar.01")); //$NON-NLS-1$
        }
        averChars = averageCharsPerByte;
        maxChars = maxCharsPerByte;
        cs = charset;
        status = INIT;
        malformAction = CodingErrorAction.REPORT;
        unmapAction = CodingErrorAction.REPORT;
        replace = "\ufffd"; //$NON-NLS-1$
    
Methods Summary
private java.nio.CharBuffer allocateMore(java.nio.CharBuffer output)

        if (output.capacity() == 0) {
            return CharBuffer.allocate(1);
        }
        CharBuffer result = CharBuffer.allocate(output.capacity() * 2);
        output.flip();
        result.put(output);
        return result;
    
public final float averageCharsPerByte()
Gets the average number of characters created by this decoder for a single input byte.

return
the average number of characters created by this decoder for a single input byte.
since
Android 1.0

        return averChars;
    
public final java.nio.charset.Charset charset()
Gets the Charset which this decoder uses.

return
the Charset which this decoder uses.
since
Android 1.0

        return cs;
    
private void checkCoderResult(java.nio.charset.CoderResult result)

        if (result.isMalformed() && malformAction == CodingErrorAction.REPORT) {
            throw new MalformedInputException(result.length());
        } else if (result.isUnmappable()
                && unmapAction == CodingErrorAction.REPORT) {
            throw new UnmappableCharacterException(result.length());
        }
    
public final java.nio.CharBuffer decode(java.nio.ByteBuffer in)
This is a facade method for the decoding operation.

This method decodes the remaining byte sequence of the given byte buffer into a new character buffer. It performs a complete decoding operation: it first resets the decoder, then decodes the input, and finally flushes.

This method should not be invoked while another {@code decode} operation is ongoing.

param
in the input buffer.
return
a new CharBuffer containing the characters produced by this decoding operation. The buffer's limit will be the position of the last character in the buffer, and the position will be zero.
throws
IllegalStateException if another decoding operation is ongoing.
throws
MalformedInputException if an illegal input byte sequence for this charset was encountered, and the action for malformed error is {@link CodingErrorAction#REPORT CodingErrorAction.REPORT}
throws
UnmappableCharacterException if a legal but unmappable input byte sequence for this charset was encountered, and the action for unmappable character error is {@link CodingErrorAction#REPORT CodingErrorAction.REPORT}. Unmappable means the byte sequence at the input buffer's current position cannot be mapped to a Unicode character sequence.
throws
CharacterCodingException if another exception happened during the decode operation.
since
Android 1.0

        reset();
        int length = (int) (in.remaining() * averChars);
        CharBuffer output = CharBuffer.allocate(length);
        CoderResult result = null;
        while (true) {
            result = decode(in, output, false);
            checkCoderResult(result);
            if (result.isUnderflow()) {
                break;
            } else if (result.isOverflow()) {
                output = allocateMore(output);
            }
        }
        result = decode(in, output, true);
        checkCoderResult(result);

        while (true) {
            result = flush(output);
            checkCoderResult(result);
            if (result.isOverflow()) {
                output = allocateMore(output);
            } else {
                break;
            }
        }

        output.flip();
        status = FLUSH;
        return output;
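
A caller of this facade method might use it as in the following sketch, which checks the documented position and limit of the returned buffer. The class name FacadeSketch, the method decodeOnce and the use of UTF-8 are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical helper class; not part of java.nio.charset.
final class FacadeSketch {

    // Performs one complete decoding operation on the given bytes.
    static CharBuffer decodeOnce(byte[] data) {
        try {
            return Charset.forName("UTF-8").newDecoder().decode(ByteBuffer.wrap(data));
        } catch (CharacterCodingException e) {
            // Default actions are REPORT, so bad input ends up here.
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }
}
```

The returned buffer is ready for reading: its position is zero and its limit marks the end of the decoded characters, so toString() yields exactly the decoded text.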
    
public final java.nio.charset.CoderResult decode(java.nio.ByteBuffer in, java.nio.CharBuffer out, boolean endOfInput)
Decodes bytes starting at the current position of the given input buffer, and writes the equivalent character sequence into the given output buffer from its current position.

The buffers' position will be changed with the reading and writing operation, but their limits and marks will be kept intact.

A CoderResult instance will be returned according to following rules:

  • {@link CoderResult#OVERFLOW CoderResult.OVERFLOW} indicates that even though not all of the input has been processed, the buffer the output is being written to has reached its capacity. In the event of this code being returned this method should be called once more with an out argument that has not already been filled.
  • {@link CoderResult#UNDERFLOW CoderResult.UNDERFLOW} indicates that as many bytes as possible in the input buffer have been decoded. If there is no further input and no remaining bytes in the input buffer then this operation may be regarded as complete. Otherwise, this method should be called once more with additional input.
  • A {@link CoderResult#malformedForLength(int) malformed input} result indicates that a malformed input error has been encountered. The erroneous bytes start at the input buffer's position, and their number is given by the result's {@link CoderResult#length() length}. This kind of result can be returned only if the malformed action is {@link CodingErrorAction#REPORT CodingErrorAction.REPORT}.
  • A {@link CoderResult#unmappableForLength(int) unmappable character} result indicates that an unmappable character error has been encountered. The erroneous bytes start at the input buffer's position, and their number is given by the result's {@link CoderResult#length() length}. This kind of result can be returned only if the unmappable character action is {@link CodingErrorAction#REPORT CodingErrorAction.REPORT}.

The endOfInput parameter indicates whether the invoker can provide further input. It should be true if and only if the bytes in the current input buffer are all the remaining input for this decoding operation. Note that it is common, and not an error, for the invoker to pass false and then be unable to provide more input. Passing true in several consecutive invocations, however, may cause an error, because any bytes left in the input buffer are then treated as malformed input.

This method invokes the {@link #decodeLoop(ByteBuffer, CharBuffer) decodeLoop} method to implement the basic decode logic for a specific charset.

param
in the input buffer.
param
out the output buffer.
param
endOfInput true if all the input bytes have been provided.
return
a CoderResult instance which indicates the reason of termination.
throws
IllegalStateException if this decoder has been flushed and not reset, or if endOfInput is false although a previous invocation in this decoding operation already passed true.
throws
CoderMalfunctionError if the {@link #decodeLoop(ByteBuffer, CharBuffer) decodeLoop} method threw a BufferUnderflowException or BufferOverflowException.
since
Android 1.0

        /*
         * status check
         */
        if ((status == FLUSH) || (!endOfInput && status == END)) {
            throw new IllegalStateException();
        }

        CoderResult result = null;

        // begin to decode
        while (true) {
            CodingErrorAction action = null;
            try {
                result = decodeLoop(in, out);
            } catch (BufferOverflowException ex) {
                // unexpected exception
                throw new CoderMalfunctionError(ex);
            } catch (BufferUnderflowException ex) {
                // unexpected exception
                throw new CoderMalfunctionError(ex);
            }

            /*
             * result handling
             */
            if (result.isUnderflow()) {
                int remaining = in.remaining();
                status = endOfInput ? END : ONGOING;
                if (endOfInput && remaining > 0) {
                    result = CoderResult.malformedForLength(remaining);
                    in.position(in.position() + result.length());
                } else {
                    return result;
                }
            }
            if (result.isOverflow()) {
                return result;
            }
            // set coding error handle action
            action = malformAction;
            if (result.isUnmappable()) {
                action = unmapAction;
            }
            // If the action is IGNORE or REPLACE, we should continue decoding.
            if (action == CodingErrorAction.REPLACE) {
                if (out.remaining() < replace.length()) {
                    return CoderResult.OVERFLOW;
                }
                out.put(replace);
            } else {
                if (action != CodingErrorAction.IGNORE)
                    return result;
            }
            if (!result.isMalformed()) {
                // Note: the following condition is removed in Harmony revision 518047
                // However, making the conditional statement unconditional
                // leads to misbehavior when using REPLACE on malformedInput.
                in.position(in.position() + result.length());
            }
        }
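
The result rules described for this method can be exercised with a deliberately small output buffer, as in the sketch below. The class name OverflowSketch, the helper grow and the initialCapacity parameter are assumptions made for the example; the growth strategy mirrors the doubling used by allocateMore, but this is not the class's own code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class; not part of java.nio.charset.
final class OverflowSketch {

    // Decodes all of data as UTF-8, growing the output buffer on OVERFLOW.
    static String decodeAll(byte[] data, int initialCapacity) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(Math.max(1, initialCapacity));
        while (true) {
            // All input is already in the buffer, so endOfInput is true.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isUnderflow()) {
                break;                      // every input byte was decoded
            }
            if (result.isOverflow()) {
                out = grow(out);            // retry with more output space
                continue;
            }
            // Malformed or unmappable result; the default action is REPORT.
            throw new IllegalArgumentException("cannot decode: " + result);
        }
        while (decoder.flush(out).isOverflow()) {
            out = grow(out);
        }
        out.flip();
        return out.toString();
    }

    // Returns a buffer of twice the capacity holding the same characters.
    private static CharBuffer grow(CharBuffer full) {
        CharBuffer bigger = CharBuffer.allocate(full.capacity() * 2);
        full.flip();
        bigger.put(full);
        return bigger;
    }
}
```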
    
protected abstract java.nio.charset.CoderResult decodeLoop(java.nio.ByteBuffer in, java.nio.CharBuffer out)
Decodes bytes into characters. This method is called by the {@link #decode(ByteBuffer, CharBuffer, boolean) decode} method.

This method will implement the essential decoding operation, and it won't stop decoding until either all the input bytes are read, the output buffer is filled, or some exception is encountered. Then it will return a CoderResult object indicating the result of current decoding operation. The rules to construct the CoderResult are the same as for {@link #decode(ByteBuffer, CharBuffer, boolean) decode}. When an exception is encountered in the decoding operation, most implementations of this method will return a relevant result object to the {@link #decode(ByteBuffer, CharBuffer, boolean) decode} method, and some performance optimized implementation may handle the exception and implement the error action itself.

The buffers are scanned from their current positions, and their positions will be modified accordingly, while their marks and limits will be intact. At most {@link ByteBuffer#remaining() in.remaining()} bytes will be read, and at most {@link CharBuffer#remaining() out.remaining()} characters will be written.

Note that some implementations may pre-scan the input buffer and return CoderResult.UNDERFLOW until they receive sufficient input.

param
in the input buffer.
param
out the output buffer.
return
a CoderResult instance indicating the result.
since
Android 1.0
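
To illustrate how a decodeLoop implementation can report an error, the sketch below decodes 7-bit ASCII and returns a malformed-input result for any byte with the high bit set, leaving the input position at the offending byte as the rules for decode describe. The class name AsciiSketchDecoder and the helper firstBadOffset are invented for this example, and only the default REPORT action is exercised.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical subclass written for illustration only.
final class AsciiSketchDecoder extends CharsetDecoder {

    AsciiSketchDecoder() {
        super(Charset.forName("US-ASCII"), 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            byte b = in.get(in.position());       // peek without consuming
            if (b < 0) {
                // High bit set: not ASCII. The position still points at it.
                return CoderResult.malformedForLength(1);
            }
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            in.get();                             // consume the byte
            out.put((char) b);
        }
        return CoderResult.UNDERFLOW;
    }

    // Returns the index of the first non-ASCII byte, or -1 if there is none.
    static int firstBadOffset(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = new AsciiSketchDecoder().decode(in, out, true);
        return result.isMalformed() ? in.position() : -1;
    }
}
```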

public java.nio.charset.Charset detectedCharset()
Gets the charset detected by this decoder; this method is optional.

If implementing an auto-detecting charset, then this decoder returns the detected charset from this method when it is available. The returned charset will be the same for the rest of the decode operation.

If insufficient bytes have been read to determine the charset, an IllegalStateException will be thrown.

The default implementation always throws UnsupportedOperationException, so it should be overridden by a subclass if needed.

return
the charset detected by this decoder, or null if it is not yet determined.
throws
UnsupportedOperationException if this decoder does not implement an auto-detecting charset.
throws
IllegalStateException if insufficient bytes have been read to determine the charset.
since
Android 1.0

        throw new UnsupportedOperationException();
    
public final java.nio.charset.CoderResult flush(java.nio.CharBuffer out)
Flushes this decoder. This method will call {@link #implFlush(CharBuffer) implFlush}. Some decoders may need to write some characters to the output buffer when they have read all input bytes; subclasses can override {@link #implFlush(CharBuffer) implFlush} to perform the writing operation.

The number of characters written won't be larger than {@link CharBuffer#remaining() out.remaining()}. If a decoder wants to write more characters than the output buffer's remaining space allows, then CoderResult.OVERFLOW will be returned, and this method must be called again with a character buffer that has more remaining space. Otherwise this method will return CoderResult.UNDERFLOW, which means one decoding process has been completed successfully.

During the flush, the output buffer's position will be changed accordingly, while its mark and limit will be intact.

param
out the given output buffer.
return
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
throws
IllegalStateException if this decoder has not read all input bytes of the current decoding process, that is, if this method is invoked neither after {@link #decode(ByteBuffer) decode(ByteBuffer)} nor after {@link #decode(ByteBuffer, CharBuffer, boolean) decode(ByteBuffer, CharBuffer, boolean)} with true as the value of the last parameter.
since
Android 1.0

        if (status != END && status != INIT) {
            throw new IllegalStateException();
        }
        CoderResult result = implFlush(out);
        if (result == CoderResult.UNDERFLOW) {
            status = FLUSH;
        }
        return result;
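
The ordering requirement can be observed with a small experiment such as the one below. It is a sketch under stated assumptions: the class name FlushOrderSketch and both method names are invented, and UTF-8 is chosen arbitrarily. It only covers the two cases the documentation above is explicit about, flushing in the middle of an operation and flushing after the final decode call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class; not part of java.nio.charset.
final class FlushOrderSketch {

    // Flushing while endOfInput has only ever been false should be rejected.
    static boolean flushRejectedBeforeEnd() {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        CharBuffer out = CharBuffer.allocate(8);
        decoder.decode(ByteBuffer.wrap(new byte[] {'a'}), out, false);
        try {
            decoder.flush(out);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Flushing after a decode call with endOfInput set to true is allowed.
    static boolean flushAcceptedAfterEnd() {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        CharBuffer out = CharBuffer.allocate(8);
        decoder.decode(ByteBuffer.wrap(new byte[] {'a'}), out, true);
        return decoder.flush(out) == CoderResult.UNDERFLOW;
    }
}
```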
    
protected java.nio.charset.CoderResult implFlush(java.nio.CharBuffer out)
Flushes this decoder. The default implementation does nothing and always returns CoderResult.UNDERFLOW; this method can be overridden if needed.

param
out the output buffer.
return
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
since
Android 1.0

        return CoderResult.UNDERFLOW;
    
protected void implOnMalformedInput(java.nio.charset.CodingErrorAction newAction)
Notifies that this decoder's CodingErrorAction specified for malformed input error has been changed. The default implementation does nothing; this method can be overridden if needed.

param
newAction the new action.
since
Android 1.0

        // default implementation is empty
    
protected void implOnUnmappableCharacter(java.nio.charset.CodingErrorAction newAction)
Notifies that this decoder's CodingErrorAction specified for unmappable character error has been changed. The default implementation does nothing; this method can be overridden if needed.

param
newAction the new action.
since
Android 1.0

        // default implementation is empty
    
protected void implReplaceWith(java.lang.String newReplacement)
Notifies that this decoder's replacement has been changed. The default implementation does nothing; this method can be overridden if needed.

param
newReplacement the new replacement string.
since
Android 1.0

        // default implementation is empty
    
protected void implReset()
Resets this decoder's charset-related state. The default implementation does nothing; this method can be overridden if needed.

since
Android 1.0

        // default implementation is empty
    
public boolean isAutoDetecting()
Indicates whether this decoder implements an auto-detecting charset.

return
true if this decoder implements an auto-detecting charset.
since
Android 1.0

        return false;
    
public boolean isCharsetDetected()
Indicates whether this decoder has detected a charset; this method is optional.

If this decoder implements an auto-detecting charset, then this method may start to return true during a decoding operation to indicate that a charset has been detected in the input bytes and that the charset can be retrieved by invoking the {@link #detectedCharset() detectedCharset} method.

Note that a decoder that implements an auto-detecting charset may still succeed in decoding a portion of the given input even when it is unable to detect the charset. For this reason users should be aware that a false return value does not indicate that no decoding took place.

The default implementation always throws an UnsupportedOperationException; it should be overridden by a subclass if needed.

return
true if this decoder has detected a charset.
throws
UnsupportedOperationException if this decoder doesn't implement an auto-detecting charset.
since
Android 1.0

        throw new UnsupportedOperationException();
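
Because the optional detection methods throw UnsupportedOperationException by default, callers would normally check isAutoDetecting() first. A minimal sketch of such a guard follows; the class name DetectionSketch and the method detectedOrNull are assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class; not part of java.nio.charset.
final class DetectionSketch {

    // Returns the detected charset, or null if none is available (yet).
    static Charset detectedOrNull(CharsetDecoder decoder) {
        if (!decoder.isAutoDetecting()) {
            return null;   // the optional methods would throw here
        }
        if (!decoder.isCharsetDetected()) {
            return null;   // not enough input seen so far
        }
        return decoder.detectedCharset();
    }
}
```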
    
public java.nio.charset.CodingErrorAction malformedInputAction()
Gets this decoder's CodingErrorAction for malformed input encountered during the decoding process.

return
this decoder's CodingErrorAction for malformed input encountered during the decoding process.
since
Android 1.0

        return malformAction;
    
public final float maxCharsPerByte()
Gets the maximum number of characters which can be created by this decoder for one input byte. The value is always positive.

return
the maximum number of characters which can be created by this decoder for one input byte; always positive.
since
Android 1.0

        return maxChars;
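
One use of this value is sizing an output buffer for the worst case, so that a single decode call should not run out of space. The sketch below shows the arithmetic; the class name BufferSizingSketch and the method worstCaseBuffer are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class; not part of java.nio.charset.
final class BufferSizingSketch {

    // Allocates room for the most characters the remaining input can produce.
    static CharBuffer worstCaseBuffer(CharsetDecoder decoder, ByteBuffer in) {
        int chars = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte());
        return CharBuffer.allocate(chars);
    }
}
```

The decode(ByteBuffer) method listed earlier takes the opposite trade-off: it starts from the averageCharsPerByte() estimate and grows the buffer on overflow, which saves memory at the cost of occasional reallocation.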
    
public final java.nio.charset.CharsetDecoder onMalformedInput(java.nio.charset.CodingErrorAction newAction)
Sets this decoder's action on malformed input errors. This method will call the {@link #implOnMalformedInput(CodingErrorAction) implOnMalformedInput} method with the given new action as argument.

param
newAction the new action on malformed input error.
return
this decoder.
throws
IllegalArgumentException if {@code newAction} is {@code null}.
since
Android 1.0

        if (null == newAction) {
            throw new IllegalArgumentException();
        }
        malformAction = newAction;
        implOnMalformedInput(newAction);
        return this;
    
public final java.nio.charset.CharsetDecoder onUnmappableCharacter(java.nio.charset.CodingErrorAction newAction)
Sets this decoder's action on unmappable character errors. This method will call the {@link #implOnUnmappableCharacter(CodingErrorAction) implOnUnmappableCharacter} method with the given new action as argument.

param
newAction the new action on unmappable character error.
return
this decoder.
throws
IllegalArgumentException if {@code newAction} is {@code null}.
since
Android 1.0

        if (null == newAction) {
            throw new IllegalArgumentException();
        }
        unmapAction = newAction;
        implOnUnmappableCharacter(newAction);
        return this;
    
public final java.nio.charset.CharsetDecoder replaceWith(java.lang.String newReplacement)
Sets the new replacement string. This method first checks the given replacement's validity, then changes the replacement value, and at last calls the {@link #implReplaceWith(String) implReplaceWith} method with the given new replacement as argument.

param
newReplacement the replacement string, cannot be null or empty. Its length cannot be larger than {@link #maxCharsPerByte()}.
return
this decoder.
throws
IllegalArgumentException if the given replacement cannot satisfy the requirement mentioned above.
since
Android 1.0

        if (null == newReplacement || newReplacement.length() == 0) {
            // niochar.06=Replacement string cannot be null or empty.
            throw new IllegalArgumentException(Messages.getString("niochar.06")); //$NON-NLS-1$
        }
        if (newReplacement.length() > maxChars) {
            // niochar.07=Replacement string's length cannot be larger than max
            // characters per byte.
            throw new IllegalArgumentException(Messages.getString("niochar.07")); //$NON-NLS-1$
        }
        replace = newReplacement;
        implReplaceWith(newReplacement);
        return this;
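
The validity checks can be probed from the caller's side as in the sketch below. The class name ReplacementSketch and both helper methods are assumptions made for this example; the over-long string is built from maxCharsPerByte() so that the sketch does not depend on a particular charset's limit.

```java
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class; not part of java.nio.charset.
final class ReplacementSketch {

    // Returns true if the decoder accepted the replacement string.
    static boolean accepts(CharsetDecoder decoder, String replacement) {
        try {
            decoder.replaceWith(replacement);
            return true;
        } catch (IllegalArgumentException rejected) {
            return false;
        }
    }

    // Builds a string one character longer than the decoder allows.
    static String tooLongFor(CharsetDecoder decoder) {
        int length = (int) decoder.maxCharsPerByte() + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('?');
        }
        return sb.toString();
    }
}
```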
    
public final java.lang.String replacement()
Gets the replacement string, which is never null or empty.

return
the replacement string, cannot be null or empty.
since
Android 1.0

        return replace;
    
public final java.nio.charset.CharsetDecoder reset()
Resets this decoder. This method resets the internal status and then calls implReset() to reset any status related to the specific charset.

return
this decoder.
since
Android 1.0

        status = INIT;
        implReset();
        return this;
    
public java.nio.charset.CodingErrorAction unmappableCharacterAction()
Gets this decoder's CodingErrorAction for unmappable character errors encountered during the decoding process.

return
this decoder's CodingErrorAction for unmappable character errors encountered during the decoding process.
since
Android 1.0

        return unmapAction;