Chapter 6. Unicode Storage and Serialization Formats

As we saw earlier, the Unicode standard comprises a single, unified coded character set containing characters from most of the world's writing systems. The past few chapters (indeed, most of this book) have focused on the coded character set. But Unicode also comprises three encoding forms (UTF-8, UTF-16, and UTF-32) and seven encoding schemes. In addition, a number of other encoding forms and schemes that aren't actually part of the Unicode standard are nevertheless frequently used with it.
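To make the difference between an encoding form and an encoding scheme concrete, here is a small illustrative sketch that is not taken from the text. It relies only on Python's standard codecs. It shows the code units that one supplementary-plane character, U+1D11E MUSICAL SYMBOL G CLEF, occupies in each of the three encoding forms. It then shows how the byte-order-specific schemes serialize those code units as bytes. The helper `code_units` is a hypothetical name introduced here for illustration.

```python
# Illustrative sketch (not from the text): one character in each of the
# three Unicode encoding forms, and in the byte-order-specific encoding schemes.
ch = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP


def code_units(s: str, codec: str, width: int) -> list:
    """Hypothetical helper: encode s with a big-endian codec and split the
    resulting bytes into code units of `width` bytes each."""
    data = s.encode(codec)
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]


# Encoding forms: sequences of 8-, 16-, and 32-bit code units.
utf8 = code_units(ch, "utf-8", 1)        # four 8-bit code units
utf16 = code_units(ch, "utf-16-be", 2)   # two 16-bit code units (a surrogate pair)
utf32 = code_units(ch, "utf-32-be", 4)   # a single 32-bit code unit

for name, units, digits in (("UTF-8", utf8, 2), ("UTF-16", utf16, 4), ("UTF-32", utf32, 8)):
    print(f"{name:7}", " ".join(f"{u:0{digits}X}" for u in units))

# Encoding schemes: the same code units serialized as bytes in a fixed byte order.
for scheme in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{scheme:10}", ch.encode(scheme).hex(" ").upper())
```

For this character, UTF-8 gives F0 9D 84 9E, UTF-16 gives the surrogate pair D834 DD1E, and UTF-32 gives 0001D11E. The UTF-16 and UTF-32 schemes differ from one another only in the order in which each code unit's bytes are written.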

The encoding forms and schemes are where the rubber meets the road with Unicode. The coded character set takes each character and places it in a three-dimensional encoding space consisting of seventeen 256 × 256 planes. The position of a character ...
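As a rough illustration of that three-dimensional encoding space, the following sketch (ours, not the book's) splits a code point into its plane, row, and cell coordinates. Each plane is a 256 × 256 grid, so the low 8 bits of a code point select the cell, the next 8 bits select the row, and the remaining high bits select one of the seventeen planes. The names `plane_row_cell` and `CODESPACE_SIZE` are hypothetical, and the terms "row" and "cell" are borrowed from ISO/IEC 10646 for this example.

```python
from __future__ import annotations

# Illustrative sketch: the codespace as 17 planes of 256 x 256 positions.
PLANES = 17
CODESPACE_SIZE = PLANES * 256 * 256  # 1,114,112 code points, U+0000..U+10FFFF


def plane_row_cell(cp: int) -> tuple[int, int, int]:
    """Hypothetical helper: split a code point into (plane, row, cell)."""
    if not 0 <= cp < CODESPACE_SIZE:
        raise ValueError(f"U+{cp:X} is outside the Unicode codespace")
    # Bits 16-20 select the plane, bits 8-15 the row, bits 0-7 the cell.
    return cp >> 16, (cp >> 8) & 0xFF, cp & 0xFF


for ch in ("A", "\u20AC", "\U0001D11E"):
    plane, row, cell = plane_row_cell(ord(ch))
    print(f"U+{ord(ch):04X}: plane {plane}, row {row:#04x}, cell {cell:#04x}")
```

U+20AC EURO SIGN lands in plane 0, row 0x20, cell 0xAC, while U+1D11E lands in plane 1, row 0xD1, cell 0x1E. The encoding forms and schemes are what turn such abstract coordinates into concrete code units and bytes.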
