Unicode Normalization Forms

An encoding that provides so many alternative ways of representing the same character can give rise to text that is much harder to process than it needs to be. Comparing strings for equality is a particular challenge when significantly different sequences of bits are supposed to be treated as equal. One way to deal with this problem is to require that all text be normalized, that is, represented in one uniform manner. Another is to normalize text at some well-defined point in processing, so that operations such as comparing for equality become simpler.
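To make the comparison problem concrete, here is a minimal Python sketch using the standard library's unicodedata module. The sample strings and the helper name equal_after_normalizing are assumptions chosen for illustration; they do not come from the book. The sketch shows two different code point sequences for the same accented letter comparing as unequal until both are brought to the same normalization form.

```python
import unicodedata

# Two representations of the same character (illustrative sample data):
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def equal_after_normalizing(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both.

    `form` is passed straight to unicodedata.normalize, which accepts
    "NFC", "NFD", "NFKC", and "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A raw comparison looks only at the code points, so the strings differ.
print(composed == decomposed)                         # False

# Once both sides use the same form, they compare as equal.
print(equal_after_normalizing(composed, decomposed))  # True
```

Which form to normalize to matters less here than normalizing both sides to the same one. Doing it once, at a well-defined point such as input, avoids repeating the work on every comparison.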

Of course, by defining something as the “canonical representation” of a particular idea, you essentially nominate it as the form to which you normalize. In this way, Unicode 1.x and 2.x could ...

