International Character Sets and Unicode

The canonical reference for Unicode is The Unicode Standard, Version 2.0 (Addison-Wesley, 1996, ISBN 0-201-48345-9). This book features a detailed analysis of the Unicode standard as well as discussion of the difficulties of defining character sets for all the world’s different languages. It also includes tables of almost all the characters defined in Unicode, including about 20,000 Han ideographs. The size of the book and its many interesting tables of scripts from around the world make it a good choice for a techie coffee-table book that can even amuse your liberal arts friends. Updates, corrections, and errata to that volume are available on the Web at http://www.unicode.org/.

There’s no single source of information for all the different non-Unicode character sets Java readers and writers can translate. However, most of the Windows character sets are enumerated in Developing International Software for Windows 95 and NT, by Nadine Kano (Microsoft Press, 1995, ISBN 1-55615-840-8). Kano ignores non-Windows platforms, and she does occasionally sound too much like a Microsoft press release. Nonetheless, this book contains a lot of useful details about how various localized versions of Windows operate. This book is also available on the MSDN Online Library web site at http://premium.microsoft.com/msdn/library/. Registration is required, but otherwise it’s free. Assuming Microsoft hasn’t added an actually navigable interface to ...
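To see which character sets a particular Java installation can translate, you can simply ask the runtime. The following is a minimal sketch, not a definitive recipe: it assumes a Java version that includes the java.nio.charset package, and the class name CharsetSketch and the method decode() are made up for illustration. It uses ISO-8859-1 because every Java platform is required to support that encoding; the Windows code pages Kano describes may or may not be present on a given installation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetSketch {

    // Hypothetical helper: decode raw bytes into a String by reading them
    // through an InputStreamReader that uses the named character set.
    static String decode(byte[] data, String charsetName) {
        // Charset.forName() throws an unchecked exception if the name
        // is illegal or the character set is not supported.
        Charset charset = Charset.forName(charsetName);
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException ex) {
            // Not expected when reading from an in-memory byte array.
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // List the canonical name of every character set this VM supports.
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }

        // The byte 0xE9 is the letter e with an acute accent in ISO-8859-1.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        String text = decode(latin1, "ISO-8859-1");
        System.out.println(text + " (" + text.length() + " chars)");
    }
}
```

The list printed by main() varies from one VM and platform to the next, which is one reason no single printed reference can cover everything.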
