A Historical Note

Much of the terminology surrounding the representation of Unicode in bits is relatively new, going back only a few years in Unicode's history. Indeed, some of the concepts we discuss here were originally called by different names.

When it was first designed, Unicode was a fixed-length, 16-bit standard. The abstract encoding space was 16 bits wide (a single 256 × 256 plane), and one character encoding form existed—a straightforward mapping of 16-bit code points to 16-bit unsigned integers in memory. The single official encoding scheme prefixed a special sentinel value to the front of the Unicode file to allow systems to auto-detect the underlying byte order of the system that created the file.
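As a rough illustration of the byte-order sentinel described above, here is a minimal Python sketch. It assumes the sentinel is the code point U+FEFF (the value known today as the byte order mark) and uses Python's `utf-16-be` and `utf-16-le` codecs as stand-ins for the original 16-bit form, which they match for characters in the single 16-bit plane. The function names `encode_16bit` and `detect_byte_order` are hypothetical, chosen for this example only.

```python
import codecs
from typing import Optional


def encode_16bit(text: str, byte_order: str) -> bytes:
    """Serialize text as 16-bit units, prefixed with the U+FEFF sentinel.

    byte_order is "big" or "little" (an assumption of this sketch).
    """
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    return ("\ufeff" + text).encode(codec)


def detect_byte_order(data: bytes) -> Optional[str]:
    """Return "big", "little", or None when no sentinel is present."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return None
```

A reader that sees the bytes FE FF at the start of a file can conclude the 16-bit units were written most significant byte first; FF FE implies the opposite order.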

Early in the life of the standard, ...
