O'Reilly logo

Java Servlet Programming by Jason Hunter

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Multiple Languages

Now it’s time to push the envelope a little and attempt something that has only recently become possible. Let’s write a servlet that includes several languages on the same page. In a sense, we have already written such a servlet. Our last example, HelloJapan, included both English and Japanese text. It should be observed, however, that this is a special case. Adding English text to a page is almost always possible, due to the convenient fact that nearly all charsets include the 128 U.S.-ASCII characters. In the more general case, when the text on a page contains a mix of languages and none of the previously mentioned charsets contains all the necessary characters, we require an alternate technique.

UCS-2 and UTF-8

The best way to generate a page containing multiple languages is to output 16-bit Unicode characters to the client. There are two common ways to do this: UCS-2 and UTF-8. UCS-2 (Universal Character Set, 2-byte form) sends Unicode characters in what could be called their natural format, two bytes per character. All characters, including US-ASCII characters, require two bytes. UTF-8 (UCS Transformation Format, 8-bit form) is a variable-length encoding. With UTF-8, a Unicode character is transformed into a 1-, 2-, or 3-byte representation. In general, UTF-8 tends to be more efficient than UCS-2 because it can encode a character from the US-ASCII charset using just 1 byte. For this reason, the use of UTF-8 on the Web far exceeds UCS-2. For more information ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required