Chapter 5. Using Unicode

Unicode is found everywhere that text occurs. It would be pointless to describe here all the software that processes text in one or another of Unicode's forms. Let us consider instead the chain of data transmission: the author enters his data, which passes through his CPU and through the network to reach the CPU of his reader/interlocutor. This computer displays the information or prints it out in a way that enables the person who received the information to read it.

Let us take these steps one by one. First, there is data entry: how do we go about entering Unicode data? Data can also be converted from other encodings: how do we convert data to Unicode? Next, once the data is in the computer, we must display it. For that purpose we use fonts that must themselves be Unicode-compatible. (We shall discuss fonts in the entire second half of this book, Chapters 6 through 14 and Appendices A through F.) Once the data has been revised and corrected, it is transmitted over the network. We have already mentioned MIME and the various encoding forms of Unicode in Chapter 2. At this level, it matters not to HTTP, TCP/IP, and other protocols that the data is encoded in Unicode rather than in some other encoding. Finally, the data reaches the recipient, where it must be displayed, and so adequate fonts must be available. The recipient of the message replies, and the entire process begins all over again, in the opposite direction.
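To make the conversion and transmission steps concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the incoming data is encoded in ISO 8859-1 and that UTF-8 is the encoding form chosen for transmission; the sample bytes and variable names are ours, not part of any particular tool.

```python
# Bytes as they might arrive from a legacy source.
# Assumption: they are encoded in ISO 8859-1 (0xE9 is "é").
legacy_bytes = b"caf\xe9"

# Conversion to Unicode: decoding yields a string of Unicode characters.
text = legacy_bytes.decode("iso-8859-1")

# For transmission we choose one of Unicode's encoding forms.
utf8_bytes = text.encode("utf-8")       # "é" becomes two bytes
utf16_bytes = text.encode("utf-16-be")  # every character here takes two bytes

# The recipient decodes the bytes and recovers the same characters.
received = utf8_bytes.decode("utf-8")
```

The network protocols in between see only the bytes; it is the sender and the recipient who must agree on which encoding those bytes represent.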

What we shall examine in this chapter are ...
