Charset

java.nio.charset

Java 1.4

comparable

A Charset represents a character set or encoding. Each Charset has a canonical name, returned by name( ), and a set of aliases, returned by aliases( ). You can look up a Charset by name or alias with the static Charset.forName( ) method, which throws an UnsupportedCharsetException if the named charset is not installed on the system. In Java 5.0 and later, the static defaultCharset( ) method returns the default Charset used by the Java VM. Use the static isSupported( ) method to check whether a charset specified by name or alias is supported, and availableCharsets( ) to obtain the complete set of installed charsets as a sorted map from canonical names to Charset objects. Charset names are not case-sensitive, so you can use any capitalization for the names you pass to isSupported( ) and forName( ).

Note that a number of classes and methods in the Java platform specify charsets by name rather than by Charset object. See, for example, java.io.InputStreamReader, java.io.OutputStreamWriter, String.getBytes( ), and java.nio.channels.Channels.newWriter( ). When working with classes and methods such as these, you do not need a Charset object.
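The lookup methods described above can be exercised with a short program like the following. It is a sketch only: the class name CharsetLookup, the helper lookupOrNull( ), and the name "x-no-such-charset" are hypothetical. Besides UnsupportedCharsetException, forName( ) also throws IllegalCharsetNameException when the name contains characters that are not legal in a charset name, so the helper catches both.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Map;
import java.util.SortedMap;

public class CharsetLookup {
    // Hypothetical helper: returns the Charset for a name or alias, or null if unavailable.
    static Charset lookupOrNull(String name) {
        try {
            return Charset.forName(name);
        } catch (UnsupportedCharsetException e) {
            return null; // legal name, but no such charset is installed
        } catch (IllegalCharsetNameException e) {
            return null; // the name itself is not a legal charset name
        }
    }

    public static void main(String[] args) {
        // Lookup ignores case: "utf-8" finds the charset whose canonical name is "UTF-8".
        Charset utf8 = Charset.forName("utf-8");
        System.out.println(utf8.name() + " aliases: " + utf8.aliases());

        // isSupported() tests a name without requiring an exception handler.
        System.out.println("ISO-8859-1 supported? " + Charset.isSupported("ISO-8859-1"));

        // The VM's default charset (Java 5.0 and later); its value is platform-dependent.
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // availableCharsets() maps canonical names to Charset objects, sorted by name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        for (Map.Entry<String, Charset> entry : all.entrySet()) {
            System.out.println(entry.getKey());
        }

        // An unknown (but legal) name yields null from the helper.
        System.out.println(lookupOrNull("x-no-such-charset"));
    }
}
```

The list printed by availableCharsets( ) and the default charset vary from one installation to another.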

All implementations of Java are required to support at least the following six charsets:

Canonical name	Description
US-ASCII	Seven-bit ASCII.
ISO-8859-1	An 8-bit superset of ASCII that includes the characters used in most Western European languages. Also known as ISO-LATIN-1.
UTF-8	An 8-bit encoding of Unicode characters that is compatible with US-ASCII.
UTF-16BE	A 16-bit encoding of Unicode characters, using big-endian byte order.
UTF-16LE	A 16-bit encoding of Unicode characters, using little-endian byte order.
UTF-16	A 16-bit encoding of Unicode characters, with byte order specified by a byte order mark character. Assumes big-endian when decoding if there is no byte order mark. Encodes using big-endian byte order and outputs an appropriate byte order mark.
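To see how the byte-order rules in the table play out, the following sketch encodes the same two-character string with each required charset and prints the bytes in hex. The class and helper names are hypothetical. Note that the UTF-16 charset prefixes its big-endian output with the byte order mark FE FF, while UTF-16BE and UTF-16LE write no mark.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class RequiredCharsets {
    // The six charsets every Java implementation must support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Hypothetical helper: encodes text with the named charset and returns the bytes.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer bb = Charset.forName(charsetName).encode(text);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println(name + ": " + hex(encode("Hi", name)));
        }
    }
}
```

For "Hi", the three 8-bit charsets each produce 48 69, UTF-16BE produces 00 48 00 69, UTF-16LE produces 48 00 69 00, and UTF-16 produces FE FF 00 48 00 69.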

Once you have obtained a Charset with forName( ) or availableCharsets( ), you can use the encode( ) method to encode a String or CharBuffer of text into a ByteBuffer, or you can use the decode( ) method to convert the bytes in a ByteBuffer into characters in a CharBuffer. These convenience methods create a new CharsetEncoder or CharsetDecoder, specify that malformed input or unmappable characters or bytes should be replaced with the default replacement string or bytes, and then invoke the encode( ) or decode( ) method of the encoder or decoder. For full control over the encoding and decoding process, you may prefer to obtain your own CharsetEncoder or CharsetDecoder object with newEncoder( ) or newDecoder( ). See CharsetDecoder for details.
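The difference between the convenience methods and an explicitly obtained encoder can be sketched as follows (class and method names are hypothetical). Charset.encode( ) replaces the unmappable character é with the US-ASCII encoder's replacement byte, '?'. A CharsetEncoder returned by newEncoder( ) reports errors by default, so its encode( ) method throws a CharacterCodingException instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodeDecode {
    static final Charset ASCII = Charset.forName("US-ASCII");

    // Convenience path: unmappable characters are silently replaced.
    static String lenientRoundTrip(String text) {
        ByteBuffer bytes = ASCII.encode(text);  // non-ASCII chars become '?'
        CharBuffer chars = ASCII.decode(bytes);
        return chars.toString();
    }

    // Explicit path: a fresh encoder's default action is to report errors.
    static boolean strictlyEncodable(String text) {
        CharsetEncoder encoder = ASCII.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. UnmappableCharacterException
        }
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";                   // "café"
        System.out.println(lenientRoundTrip(cafe));  // replacement applied
        System.out.println(strictlyEncodable("cafe"));
        System.out.println(strictlyEncodable(cafe));
    }
}
```

The program prints caf?, true, and false.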

Instead of using a Charset, CharsetEncoder, or CharsetDecoder directly, you may also pass an encoder or decoder to the static methods of java.nio.channels.Channels to obtain a java.io.Reader or java.io.Writer that you can use to read or write characters from or to a byte-oriented Channel.
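Here is a minimal sketch of that approach, assuming in-memory byte streams wrapped as channels with Channels.newChannel( ). The class and helper names are hypothetical. Passing -1 as the buffer-size argument of Channels.newWriter( ) and Channels.newReader( ) requests an implementation-dependent default. Because the helpers only touch in-memory streams, they rethrow any IOException as an unchecked exception to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;

public class ChannelText {
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Writes text through a Writer layered over a byte channel; returns the bytes produced.
    static byte[] writeText(String text) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            WritableByteChannel ch = Channels.newChannel(sink);
            Writer w = Channels.newWriter(ch, UTF8.newEncoder(), -1);
            w.write(text);
            w.close(); // flushes the encoder and closes the channel
            return sink.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // unexpected for in-memory streams
        }
    }

    // Reads all characters from a Reader layered over a byte channel.
    static String readText(byte[] bytes) {
        try {
            ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(bytes));
            Reader r = Channels.newReader(ch, UTF8.newDecoder(), -1);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            r.close();
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e); // unexpected for in-memory streams
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeText("na\u00efve");          // "naïve"
        System.out.println(bytes.length + " bytes");     // ï takes two bytes in UTF-8
        System.out.println(readText(bytes).equals("na\u00efve"));
    }
}
```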

Note that not all Charset objects support encoding ("auto-detect" charsets can determine the source charset when decoding, but have no way to encode). Use canEncode( ) to determine whether a given Charset can encode.
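The following sketch lists the installed charsets whose canEncode( ) method returns false. The class and method names are hypothetical. Which decode-only charsets appear depends on the JDK; many JDKs include auto-detect charsets here. Checking canEncode( ) before calling newEncoder( ) matters because newEncoder( ) throws UnsupportedOperationException for a charset that cannot encode.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class DecodeOnlyCharsets {
    // Returns the canonical names of installed charsets that cannot encode.
    static List<String> decodeOnly() {
        List<String> names = new ArrayList<String>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!cs.canEncode()) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JDK and installed charset providers.
        System.out.println(decodeOnly());

        // Guard newEncoder() with canEncode() to avoid UnsupportedOperationException.
        Charset cs = Charset.forName("UTF-8");
        if (cs.canEncode()) {
            System.out.println("Encoder for " + cs.newEncoder().charset().name());
        }
    }
}
```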

Charset also defines, implements, or overrides various other methods. displayName( ) returns a localized name for the charset, or the canonical name if there is no localization. toString( ) returns an implementation-dependent textual representation of the charset. The equals( ) method compares two charsets by comparing their canonical names. Charset implements Comparable, and its compareTo( ) method orders charsets by canonical name, without regard to case. contains( ) returns true if the specified charset is "contained in" this charset, that is, if every character that can be represented in the specified charset can also be represented in this charset, although those representations need not be the same. This test is approximate: a false result does not guarantee that the specified charset is not contained. isRegistered( ) returns true if the charset is registered with the IANA charset registry (see http://www.iana.org/assignments/character-sets).
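A brief sketch of these methods follows. The class name and the sameCharset( ) helper are hypothetical, and it assumes that "latin1" is an alias of ISO-8859-1, as it is in common JDKs (alias sets are implementation-dependent). Because contains( ) only approximates containment, the only result guaranteed for every charset is that it contains itself; the UTF-8/US-ASCII result shown is what common JDKs report.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetInfo {
    // Hypothetical helper: do two names (or aliases) resolve to equal Charsets?
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset ascii = Charset.forName("US-ASCII");
        Charset utf8 = Charset.forName("UTF-8");

        // Localized name, falling back to the canonical name.
        System.out.println(latin1.displayName(Locale.FRENCH));

        // equals() compares canonical names, so an alias lookup yields an equal Charset.
        System.out.println(sameCharset("ISO-8859-1", "latin1"));

        // compareTo() orders by canonical name: "ISO-8859-1" sorts before "US-ASCII".
        System.out.println(latin1.compareTo(ascii) < 0);

        // Approximate containment; every charset contains itself.
        System.out.println(utf8.contains(ascii));
        System.out.println(utf8.contains(utf8));

        // US-ASCII is an IANA-registered charset.
        System.out.println(ascii.isRegistered());
    }
}
```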

Figure 13-48. java.nio.charset.Charset

public abstract class Charset implements Comparable<Charset> {
// Protected Constructors
     protected Charset(String canonicalName, String[ ] aliases);  
// Public Class Methods
     public static java.util.SortedMap<String,Charset> availableCharsets( );  
5.0  public static Charset defaultCharset( );  
     public static Charset forName(String charsetName);  
     public static boolean isSupported(String charsetName);  
// Public Instance Methods
     public final java.util.Set<String> aliases( );  
     public boolean canEncode( );        constant
     public abstract boolean contains(Charset cs);  
     public final java.nio.CharBuffer decode(java.nio.ByteBuffer bb);  
     public String displayName( );  
     public String displayName(java.util.Locale locale);  
     public final java.nio.ByteBuffer encode(java.nio.CharBuffer cb);  
     public final java.nio.ByteBuffer encode(String str);  
     public final boolean isRegistered( );  
     public final String name( );  
     public abstract CharsetDecoder newDecoder( );  
     public abstract CharsetEncoder newEncoder( );  
// Methods Implementing Comparable
5.0  public final int compareTo(Charset that);  
// Public Methods Overriding Object
     public final boolean equals(Object ob);  
     public final int hashCode( );  
     public final String toString( );  
}

Passed To

java.io.InputStreamReader.InputStreamReader( ), java.io.OutputStreamWriter.OutputStreamWriter( ), CharsetDecoder.CharsetDecoder( ), CharsetEncoder.CharsetEncoder( )

Returned By

CharsetDecoder.{charset( ), detectedCharset( )}, CharsetEncoder.charset( ), java.nio.charset.spi.CharsetProvider.charsetForName( )