
CharsetDecoder        java.nio.charset

Java 1.4

A CharsetDecoder is a "decoding engine" that converts a sequence of bytes into a sequence of characters based on the encoding of some charset. Obtain a CharsetDecoder from the Charset that represents the charset to be decoded. If you have a complete sequence of bytes to be decoded in a ByteBuffer, you can pass that buffer to the one-argument version of decode( ). This convenience method decodes the bytes and stores the resulting characters in a newly allocated CharBuffer, resetting and flushing the decoder as necessary. It throws a CharacterCodingException if there are problems with the bytes to be decoded.
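As a minimal sketch of the one-argument form, the following helper decodes a complete byte array in a single call. The class name OneShotDecode and the method name decodeAll are made up for this illustration; only the java.nio calls come from the API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class, named for this example only.
class OneShotDecode {
    // Decodes all of the bytes at once with the one-argument decode( ).
    static String decodeAll(byte[] bytes, String charsetName)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
        // decode(ByteBuffer) resets, decodes, and flushes in one step, and
        // returns a newly allocated CharBuffer ready for reading.
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }
}
```

A call such as OneShotDecode.decodeAll(bytes, "UTF-8") either returns the decoded text or throws a CharacterCodingException, because the default error action is REPORT.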

Typically, however, the three-argument version of decode( ) is used in a multistep decoding process:

  1. Call the reset( ) method, unless this is the first time the CharsetDecoder has been used.

  2. Call the three-argument version of decode( ) one or more times. The third argument should be true on, and only on, the last invocation of the method. The first argument to decode( ) is a ByteBuffer that contains bytes to be decoded. The second argument is a CharBuffer into which the resulting characters are stored. The return value of the method is a CoderResult object that specifies the state of the ongoing decoding operation. The possible CoderResult return values are detailed below. In a typical case, however, decode( ) returns after it has decoded all of the bytes in the input buffer. In this case, you would then typically fill the input buffer with more bytes to be decoded, and read characters from the output buffer, calling its compact( ) method to make room for more. If an unexpected problem arises in the CharsetDecoder implementation, decode( ) throws a CoderMalfunctionError.

  3. Pass the output CharBuffer to the flush( ) method to allow any remaining characters to be output.
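The three steps above can be sketched as a loop. This is an illustration under stated assumptions, not a definitive implementation: the class ChunkedDecode, the method decodeInChunks, and the helper drain are hypothetical names, the input is a byte array fed to the decoder in fixed-size chunks to imitate data arriving piecemeal, and the eight spare bytes in the input buffer are assumed to be enough to hold an incomplete multibyte character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class, named for this example only.
class ChunkedDecode {
    // Decodes data as if it arrived chunkSize bytes at a time (chunkSize >= 1).
    static String decodeInChunks(byte[] data, String charsetName, int chunkSize) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
        // Assumption: 8 spare bytes are enough to carry over the bytes of an
        // incomplete character between chunks.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 8);
        CharBuffer out = CharBuffer.allocate(16); // deliberately small
        StringBuilder result = new StringBuilder();

        decoder.reset(); // Step 1: optional on a new decoder, required on reuse
        int offset = 0;
        boolean endOfInput = false;
        while (!endOfInput) {
            // Refill the input buffer with the next chunk.
            int n = Math.min(Math.min(chunkSize, in.remaining()), data.length - offset);
            in.put(data, offset, n);
            offset += n;
            endOfInput = (offset == data.length);
            in.flip();

            // Step 2: decode; the third argument is true only for the last chunk.
            while (true) {
                CoderResult r = decoder.decode(in, out, endOfInput);
                if (r.isOverflow()) {
                    drain(out, result);   // make room and decode again
                } else if (r.isUnderflow()) {
                    break;                // this chunk is used up
                } else {
                    throw new IllegalArgumentException("Decoding failed: " + r);
                }
            }
            in.compact(); // keep any undecoded bytes for the next round
        }

        // Step 3: flush any characters the decoder is still holding.
        while (decoder.flush(out).isOverflow()) {
            drain(out, result);
        }
        drain(out, result);
        return result.toString();
    }

    // Moves everything in the output buffer into the sink and empties the buffer.
    private static void drain(CharBuffer out, StringBuilder sink) {
        out.flip();
        sink.append(out.toString());
        out.clear(); // everything was consumed, so clear( ) has the same effect as compact( )
    }
}
```

Calling compact( ) on the input buffer is what lets a character whose bytes are split across two chunks decode correctly: the leftover bytes stay at the front of the buffer and the next chunk is appended after them.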

The decode( ) method returns a CoderResult that indicates the state of the decoding operation. If the return value is CoderResult.UNDERFLOW, then decode( ) returned because all bytes from the input buffer have been read, and more input is required. If the return value is CoderResult.OVERFLOW, then decode( ) returned because the output CharBuffer is full, and no more characters can be decoded into it. Otherwise, the return value is a CoderResult whose isError( ) method returns true. There are two basic types of decoding errors. If isMalformed( ) returns true, then the input included bytes that are not legal for the charset. These bytes start at the position of the input buffer and continue for length( ) bytes. Otherwise, if isUnmappable( ) returns true, then the input bytes include a character for which there is no representation in Unicode. The relevant bytes start at the position of the input buffer and continue for length( ) bytes.
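The following sketch shows one way to inspect a CoderResult. The class InspectResult, the method describe, and the wording of the returned messages are invented for this example, and UTF-8 is used only because malformed input is easy to construct for it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class, named for this example only.
class InspectResult {
    // Decodes bytes as UTF-8 and describes the CoderResult of the final call.
    static String describe(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length);

        CoderResult result = decoder.decode(in, out, true);
        if (result.isUnderflow()) {
            return "underflow: all input consumed";
        }
        if (result.isOverflow()) {
            return "overflow: output buffer full";
        }
        // Only error results remain: isError( ) is true here.
        String kind = result.isMalformed() ? "malformed" : "unmappable";
        // The offending bytes start at the input buffer's current position.
        return kind + " input at byte " + in.position()
                + ", length " + result.length();
    }
}
```

For the bytes 'a', 'b', 0xFF, 'c', the decoder stops with the input position at index 2, because 0xFF can never appear in well-formed UTF-8.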

By default, a CharsetDecoder reports all malformed input and unmappable character errors by returning a CoderResult object as described above. This behavior can be altered, however, by passing a CodingErrorAction to onMalformedInput( ) and onUnmappableCharacter( ). (Query the current action for these types of errors with malformedInputAction( ) and unmappableCharacterAction( ).) CodingErrorAction defines three constants that represent the three possible actions. The default action is REPORT. The action IGNORE tells the CharsetDecoder to ignore (i.e., skip) malformed input and unmappable characters. The REPLACE action tells the CharsetDecoder to replace malformed input and unmappable characters with the replacement string. This replacement string can be set with replaceWith( ), and can be queried with replacement( ).
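A short sketch of configuring the error actions follows. The class ErrorActions and the method decodeUtf8 are hypothetical names, and the "?" replacement string is an arbitrary choice for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, named for this example only.
class ErrorActions {
    // Decodes bytes as UTF-8, applying the same action to both error types.
    static String decodeUtf8(byte[] bytes, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .replaceWith("?"); // consulted only when the action is REPLACE
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

Given the bytes 'a', 0xFF, 'b', this method returns "a?b" with CodingErrorAction.REPLACE and "ab" with CodingErrorAction.IGNORE, and it throws a CharacterCodingException with CodingErrorAction.REPORT. Note that replaceWith( ) rejects a replacement string longer than maxCharsPerByte( ) allows, so a one-character string is the safe choice here.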

averageCharsPerByte( ) and maxCharsPerByte( ) return the average and maximum number of characters that are produced by this decoder per decoded byte. These values can be used to help you choose the size of the CharBuffer to allocate for decoding.
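One possible use of maxCharsPerByte( ) is to allocate an output buffer large enough for the worst case, so that decode( ) cannot overflow. The sketch below assumes the whole input is already in memory; WorstCaseBuffer, allocateFor, and decodeWhole are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class, named for this example only.
class WorstCaseBuffer {
    // Allocates a CharBuffer that can hold the most chars byteCount bytes could yield.
    static CharBuffer allocateFor(CharsetDecoder decoder, int byteCount) {
        int capacity = (int) Math.ceil(byteCount * (double) decoder.maxCharsPerByte());
        return CharBuffer.allocate(capacity);
    }

    // Decodes a complete byte array into a worst-case-sized buffer.
    static String decodeWhole(byte[] bytes, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
        CharBuffer out = allocateFor(decoder, bytes.length);
        CoderResult r = decoder.decode(ByteBuffer.wrap(bytes), out, true);
        if (r.isUnderflow()) {
            r = decoder.flush(out);
        }
        if (!r.isUnderflow()) {
            // An error result; overflow is not expected with worst-case sizing.
            throw new IllegalArgumentException("Decoding failed: " + r);
        }
        out.flip();
        return out.toString();
    }
}
```

Sizing by averageCharsPerByte( ) instead would use less memory in the common case, at the cost of having to handle CoderResult.OVERFLOW.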

CharsetDecoder is not a thread-safe class. Only one thread should use an instance at a time.

CharsetDecoder is an abstract class. Implementors defining new charsets will need to subclass CharsetDecoder and define the abstract decodeLoop( ) method, which is invoked by decode( ).
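To illustrate what a decodeLoop( ) implementation can look like, here is a minimal sketch of a decode-only charset. The charset itself, its name X-BYTE-AS-CHAR, and the class ByteAsCharCharset are invented for this example: each byte is decoded to the Unicode character with the same numeric value (0 to 255). A real charset would also supply an encoder and would normally be registered through a CharsetProvider.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical charset, invented for illustration only.
class ByteAsCharCharset extends Charset {
    ByteAsCharCharset() {
        super("X-BYTE-AS-CHAR", null); // made-up name, no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof ByteAsCharCharset;
    }

    @Override
    public CharsetDecoder newDecoder() {
        // Exactly one char per byte, both on average and at most.
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW; // caller must make room
                    }
                    out.put((char) (in.get() & 0xFF));
                }
                return CoderResult.UNDERFLOW; // all input consumed
            }
        };
    }

    @Override
    public boolean canEncode() {
        return false; // this sketch only decodes
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("decode-only sketch");
    }
}
```

Because decodeLoop( ) reports only UNDERFLOW and OVERFLOW, the final decode( ) methods inherited from CharsetDecoder handle buffer growth, resetting, and flushing without further code in the subclass.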

public abstract class CharsetDecoder {
// Protected Constructors
     protected CharsetDecoder(Charset cs, 
     float averageCharsPerByte, float maxCharsPerByte);  
// Public Instance Methods
     public final float averageCharsPerByte( );  
     public final Charset charset( );  
     public final java.nio.CharBuffer decode(java.nio.ByteBuffer in) 
        throws CharacterCodingException;  
     public final CoderResult decode(java.nio.ByteBuffer in, 
        java.nio.CharBuffer out, boolean endOfInput);  
     public Charset detectedCharset( );  
     public final CoderResult flush(java.nio.CharBuffer out);  
     public boolean isAutoDetecting( );           constant
     public boolean isCharsetDetected( );  
     public CodingErrorAction malformedInputAction( );  
     public final float maxCharsPerByte( );  
     public final CharsetDecoder onMalformedInput(CodingErrorAction newAction);
     public final CharsetDecoder onUnmappableCharacter(CodingErrorAction 
        newAction);  
     public final String replacement( );  
     public final CharsetDecoder replaceWith(String newReplacement);  
     public final CharsetDecoder reset( );  
     public CodingErrorAction unmappableCharacterAction( );  
// Protected Instance Methods
     protected abstract CoderResult decodeLoop(java.nio.ByteBuffer in, 
        java.nio.CharBuffer out);  
     protected CoderResult implFlush(java.nio.CharBuffer out);  
     protected void implOnMalformedInput(CodingErrorAction 
        newAction);     empty
     protected void implOnUnmappableCharacter(CodingErrorAction 
        newAction);     empty
     protected void implReplaceWith(String 
        newReplacement);             empty
     protected void implReset( );                                    empty
}

Passed To

java.io.InputStreamReader.InputStreamReader( ), java.nio.channels.Channels.newReader( )

Returned By

Charset.newDecoder( )
