Canonical Decompositions

Combining character sequences are great for conserving encoding space (the number of code point values that have to be assigned) and for representing combinations of marks nobody thought of in advance, but they have some big disadvantages. Text that uses them takes up more storage, and it's more difficult to process, requiring more sophisticated display technology, among other things.

For these reasons, Unicode also contains a large number of “precomposed characters,” code point values representing the combination of a base character and one or more non-spacing marks. Many character encoding standards, including the Latin-1 encoding used in Western Europe, use precomposed characters instead of combining character sequences. Users of these encodings know that they need only a single ...
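To make the trade-off concrete, here is a minimal sketch using Python's standard `unicodedata` module. It compares the precomposed character U+00E9 (é) with the equivalent combining character sequence, U+0065 followed by U+0301. The choice of é as the sample letter and the variable names are assumptions made for illustration; they don't come from the text.

```python
import unicodedata

# Assumed sample letter: e with acute accent, in both representations.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # base letter e + COMBINING ACUTE ACCENT, two code points

# The combining sequence uses more code points and more bytes in storage.
code_points = (len(precomposed), len(combining))
utf8_bytes = (len(precomposed.encode("utf-8")), len(combining.encode("utf-8")))
utf16_bytes = (len(precomposed.encode("utf-16-be")), len(combining.encode("utf-16-be")))

# The character database records how the precomposed character breaks down.
mapping = unicodedata.decomposition(precomposed)

# Normalization converts between the two forms.
decomposed = unicodedata.normalize("NFD", precomposed)   # precomposed -> sequence
recomposed = unicodedata.normalize("NFC", combining)     # sequence -> precomposed

print(code_points)                 # (1, 2)
print(utf8_bytes)                  # (2, 3)
print(utf16_bytes)                 # (2, 4)
print(mapping)                     # 0065 0301
print(precomposed == combining)    # False: a naive comparison sees different strings
print(decomposed == combining)     # True
print(recomposed == precomposed)   # True
```

The naive comparison on the second-to-last pair is the practical catch: the two strings look identical on screen, but they only compare equal after both have been normalized to the same form.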
