JavaServer Faces by Hans Bergsten

Dealing with Non-Western Languages

Supporting locales with non-Western languages adds another dimension to the subject of localization: character encoding. As you probably know, the characters displayed on your screen are really represented by sequences of bits. To know which character to display for a sequence of bits, applications (e.g., a browser) consult a mapping between the bit sequences and the characters they represent. ASCII is an early standard mapping; it maps 7-bit values (the numbers 0 through 127) to the letters of the English alphabet, the digits 0 through 9, punctuation characters, and some control characters. That was all that was really needed in the early days of computing, because most computers were kept busy crunching numbers.
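To make the idea of a character-to-number mapping concrete, here is a small sketch using the standard `java.nio.charset.StandardCharsets.US_ASCII` charset. The class and method names (`AsciiMapping`, `asciiCodes`) are made up for this illustration. It prints the numeric value ASCII assigns to each character and shows that the highest ASCII value, 127, fits in 7 bits.

```java
import java.nio.charset.StandardCharsets;

public class AsciiMapping {

    // Returns the numeric code each character maps to in US-ASCII.
    // String.getBytes(Charset) replaces characters outside the
    // 0-127 range with the charset's replacement byte, '?' (63).
    static int[] asciiCodes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        int[] codes = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            codes[i] = bytes[i]; // ASCII bytes are 0..127, so no sign issues
        }
        return codes;
    }

    public static void main(String[] args) {
        // Prints 65, 122, 48, 33 ('A', 'z', '0', '!'), one per line
        for (int code : asciiCodes("Az0!")) {
            System.out.println(code);
        }
        // The largest ASCII value needs only 7 bits: prints 1111111
        System.out.println(Integer.toBinaryString(127));
    }
}
```

Trying to encode a character such as `"\u00e9"` (e with an acute accent) this way yields `63`, because ASCII simply has no mapping for it.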

But as computers were given new tasks, often dealing with human-readable text, 7 bits didn't cut it. Adding one bit made it possible to represent all letters used in the Western European languages, but it was not enough to represent all characters used around the world. This problem was partly solved by defining a number of standards for using eight bits to represent different character subsets. Each of the ISO-8859 standards defines what is called a charset: a mapping between 8-bit values (bytes) and characters. For instance, ISO-8859-1, also known as Latin-1, defines the subset used for Western European languages such as English, French, Italian, Spanish, German, and Swedish. ISO-8859-1 is the default ...
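Because each ISO-8859 charset maps the same 256 byte values to a different set of characters, a byte is meaningless until you know which charset it was written in. The sketch below illustrates this with the JDK's `ISO-8859-1` (Latin-1) and `ISO-8859-5` (Cyrillic) charsets; the class and helper names are hypothetical, chosen only for the example. It decodes the byte `0xE9` under both charsets and shows that Latin-1 cannot represent a Japanese character at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Iso8859Charsets {

    // Decodes a single byte value using the given charset.
    static String decodeByte(int value, Charset charset) {
        byte[] bytes = { (byte) value };
        return new String(bytes, charset);
    }

    // Encodes text with the given charset; String.getBytes(Charset)
    // replaces characters the charset cannot represent with '?'.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cyrillic = Charset.forName("ISO-8859-5");

        // Same byte, different characters depending on the charset:
        System.out.println(decodeByte(0xE9, latin1));   // e with acute accent (U+00E9)
        System.out.println(decodeByte(0xE9, cyrillic)); // Cyrillic small shcha (U+0449)

        // Latin-1 has no byte for this Japanese character (U+65E5),
        // so it is encoded as a single '?' byte.
        byte[] encoded = encode("\u65e5", latin1);
        System.out.println(encoded.length + " byte(s): " + (char) encoded[0]);
    }
}
```

The last line prints `1 byte(s): ?`, which is the kind of information loss that a one-byte charset cannot avoid for text outside its subset. Exactly how the first two lines appear on screen depends on the console's own encoding.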