4.1. Decompositions and Normalizations

4.1.1. Combining Characters

We have already discussed the block of combining characters, as well as the category of "marks" and, in particular, the nonspacing marks. But how do these characters work?

The glyph of a combining character interacts with the glyph of a base character. This interaction may take a variety of forms: an acute accent goes over a letter, the cedilla goes underneath, the Hebrew dagesh goes inside the letter, etc.

Some of these diacritical marks are independent of each other: placing a cedilla underneath a letter in no way prevents a circumflex accent from being added as well. Other marks are placed in the same location and thus must appear in a specific order. For example, Vietnamese has letters such as 'ỗ' that carry a circumflex accent and a tilde, in that order; it would be incorrect to place them the other way around.
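The difference between independent and competing marks can be observed in practice. The following is a minimal sketch, assuming Python's standard unicodedata module and taking "equal after NFD normalization" as the test for whether two sequences represent the same text; the helper name same_text and the base letters e and o are arbitrary choices made for illustration.

```python
import unicodedata

CEDILLA = "\u0327"     # COMBINING CEDILLA (drawn below the base letter)
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT (drawn above)
TILDE = "\u0303"       # COMBINING TILDE (drawn above)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are equal after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Cedilla and circumflex occupy different positions: their order is irrelevant.
independent = same_text("e" + CEDILLA + CIRCUMFLEX, "e" + CIRCUMFLEX + CEDILLA)

# Circumflex and tilde occupy the same position: their order is significant.
competing = same_text("o" + CIRCUMFLEX + TILDE, "o" + TILDE + CIRCUMFLEX)

print(independent, competing)  # True False
```

Swapping the cedilla and the circumflex yields equivalent text, whereas swapping the circumflex and the tilde yields two distinct strings.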

All of that suggests two things: first, diacritical marks can be classified in "orthogonal" categories; second, the order of application within a single category is important. Unicode has formalized this approach by defining combining classes.
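The combining class of a character can be looked up directly. The sketch below assumes Python's standard unicodedata module, whose combining() function returns the canonical combining class as an integer (0 for characters that are not combining); the selection of marks is ours, chosen to match the examples above.

```python
import unicodedata

# A few of the marks mentioned in the text, with their code points.
marks = {
    "acute accent": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "cedilla": "\u0327",
    "Hebrew dagesh": "\u05bc",
}

# Map each mark to its canonical combining class.
classes = {name: unicodedata.combining(ch) for name, ch in marks.items()}

for name, ccc in classes.items():
    print(f"{name}: {ccc}")
```

The acute accent, the circumflex, and the tilde all fall into class 230 (marks placed above), the cedilla into class 202 (marks attached below), and the dagesh into class 21, a class reserved for that one Hebrew sign. Marks that share a class compete for the same position, so their relative order matters; marks in different classes do not.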

There are 352 combining characters in Unicode, and they are distributed among 53 combining classes. Among these classes are, first of all, those for signs that are specific to a single writing system (an Arabic vowel over a Thai consonant would have little chance of being ...
