Chapter 15. Unicode

If you do not yet know what Unicode is, you will soon--even if you skip reading this chapter--because working with Unicode is becoming a necessity. (Some people think of it as a necessary evil, but it's really more of a necessary good. In either case, it's a necessary pain.)

Historically, people made up character sets to reflect what they needed to do in the context of their own culture. Since people of all cultures are naturally lazy, they've tended to include only the symbols they needed, excluding the ones they didn't need. That worked fine as long as we were only communicating with other people of our own culture, but now that we're starting to use the Internet for cross-cultural communication, we're running into problems with the exclusive approach. It's hard enough to figure out how to type accented characters on an American keyboard. How in the world (literally) can one write a multilingual web page?

Unicode is the answer, or at least part of the answer (see also XML). Unicode is an inclusive rather than an exclusive character set. While people can and do haggle over the various details of Unicode (and there are plenty of details to haggle over), the overall intent is to make everyone sufficiently happy[1] with Unicode so that they'll willingly use Unicode as the international medium of exchange for textual data. Nobody is forcing you to use Unicode, just as nobody is forcing you to read this chapter (we hope). People will always be allowed to use their ...

Get Programming Perl, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.