Character Encoding Terminology

Before we move on to talk about more character encoding standards, we need to take a break and define some terms. This effort will make the discussion in the next section easier and help crystallize some of the fuzzy terminology in the sections we just finished.

It's useful to think of the mapping of a sequence of written characters to a sequence of bits in computer memory, on a storage device, or on a communication link as taking place in a series of stages, or levels, rather than all at once. The Internet Architecture Board (IAB) proposed a three-level encoding model. The Unicode standard, in Unicode Technical Report (UTR) #17, proposes a five-level model, explicitly discussing one level that was merely implied ...
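To make the idea of staged mapping concrete, here is a minimal Python sketch that carries a short string from characters to bytes one step at a time. The function name encode_in_stages, the sample string, and the choice of UTF-16 and Base64 are assumptions made for illustration; the stage labels in the comments are informal and are not the definitions given in this chapter.

```python
import base64


def encode_in_stages(text):
    """Illustrative only: map text to bytes in separate, visible stages."""
    # Stage A: characters -> code points (the numbers a character set assigns).
    code_points = [ord(ch) for ch in text]

    # Stage B: code points -> fixed-width code units (here 16-bit UTF-16 units).
    raw = text.encode("utf-16-be")
    code_units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

    # Stage C: code units -> bytes, which requires choosing a byte order.
    big_endian = text.encode("utf-16-be")
    little_endian = text.encode("utf-16-le")

    # Stage D: an optional extra transformation for transport (here Base64).
    transfer = base64.b64encode(big_endian).decode("ascii")

    return {
        "code_points": code_points,
        "code_units": code_units,
        "big_endian": big_endian,
        "little_endian": little_endian,
        "transfer": transfer,
    }


# U+00E9 (e with acute accent) followed by U+1D11E (musical G clef).
stages = encode_in_stages("\u00e9\U0001d11e")
for name, value in stages.items():
    print(name, value)
```

The sample string holds two characters but produces three 16-bit code units, because U+1D11E lies outside the 16-bit range and UTF-16 represents it as a pair of units. That mismatch is one reason it helps to keep the stages separate rather than treating "character to bits" as a single step.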
