Now it’s time to push the envelope a little and attempt
something that has only recently become possible. Let’s write a
servlet that includes
several languages on the same page. In a
sense, we have already written such a servlet. Our last example,
HelloJapan, included both English and Japanese
text. It should be observed, however, that this is a special case.
Adding English text to a page is almost always possible, due to the
convenient fact that nearly all charsets
include the 128 U.S.-ASCII characters. In the more general case, when
the text on a page contains a mix of languages and none of the
previously mentioned charsets contains all the necessary characters,
we require an alternate technique.
The best way to generate a page containing multiple languages is to output 16-bit Unicode characters to the client. There are two common ways to do this: UCS-2 and UTF-8. UCS-2 (Universal Character Set, 2-byte form) sends Unicode characters in what could be called their natural format, two bytes per character. All characters, including US-ASCII characters, require two bytes. UTF-8 (UCS Transformation Format, 8-bit form) is a variable-length encoding. With UTF-8, a 16-bit Unicode character is transformed into a 1-, 2-, or 3-byte representation. For pages made up largely of US-ASCII text, UTF-8 tends to be more efficient than UCS-2 because it can encode a character from the US-ASCII charset using just 1 byte. For this reason, the use of UTF-8 on the Web far exceeds that of UCS-2. For more information ...
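To make the size difference concrete, here is a minimal sketch that counts the bytes each encoding produces for a few strings. The class name EncodingSizes and its two helper methods are hypothetical, introduced only for illustration. The sketch also assumes that Java's UTF-16BE charset can stand in for UCS-2; the two produce the same bytes for the 16-bit characters discussed here, and UTF-16BE writes no byte-order mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodingSizes {

    // Number of bytes needed to send the string as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to send the string as UCS-2.
    // Assumption: UTF-16BE is used as a stand-in, which matches UCS-2
    // for 16-bit characters and adds no byte-order mark.
    public static int ucs2Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "Hello";               // 5 US-ASCII characters
        String french = "\u00e9";               // e with acute accent
        String japanese = "\u65e5\u672c\u8a9e"; // 3 kanji ("Japanese language")

        System.out.println("english:  UTF-8=" + utf8Length(english)
                + " UCS-2=" + ucs2Length(english));   // UTF-8=5 UCS-2=10
        System.out.println("french:   UTF-8=" + utf8Length(french)
                + " UCS-2=" + ucs2Length(french));    // UTF-8=2 UCS-2=2
        System.out.println("japanese: UTF-8=" + utf8Length(japanese)
                + " UCS-2=" + ucs2Length(japanese));  // UTF-8=9 UCS-2=6
    }
}
```

The English string shrinks by half under UTF-8, while the kanji each take 3 bytes instead of 2. UTF-8 wins on pages that are mostly US-ASCII, which typically includes the HTML markup itself.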