Fonts & Encodings by Yannis Haralambous

4.1. Decompositions and Normalizations

4.1.1. Combining Characters

We have already discussed the block of combining characters, as well as the category of "marks" and, in particular, the nonspacing marks. But how do these characters work?

The glyph of a combining character interacts with the glyph of a base character. This interaction may take a variety of forms: an acute accent goes over a letter, the cedilla goes underneath, the Hebrew dagesh goes inside the letter, etc.
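As a rough illustration of how a combining character attaches to a base character, the following sketch uses Python's standard `unicodedata` module. The three marks are the ones named above; the variable names and the choice of "e" as the base letter are only for illustration.

```python
import unicodedata

# The three marks mentioned in the text, as Unicode code points.
marks = {
    "acute accent": "\u0301",  # drawn over the base letter
    "cedilla": "\u0327",       # attached underneath
    "dagesh": "\u05BC",        # placed inside a Hebrew letter
}

for label, mark in marks.items():
    # Each of these has general category "Mn" (Mark, nonspacing).
    print(label, unicodedata.name(mark), unicodedata.category(mark))

# A base character followed by a combining character is two code points
# that a renderer draws as a single accented glyph.
sequence = "e" + marks["acute accent"]

# NFC normalization replaces the pair with the precomposed character
# when one exists in Unicode.
composed = unicodedata.normalize("NFC", sequence)
print(len(sequence), len(composed))
print(unicodedata.name(composed))
```

The combining character always follows its base character in the text stream; how the two glyphs are actually positioned is left to the font and the rendering engine.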

Some of these diacritical marks are independent of each other: placing a cedilla underneath a letter in no way prevents a circumflex accent from being added as well. Other marks are placed in the same location and thus must appear in a specific order. For example, the Vietnamese language has vowels, such as 'ỗ', that carry a circumflex accent and a tilde, in that order; it would be incorrect to place them the other way around.

All of that suggests two things: first, diacritical marks can be classified in "orthogonal" categories; second, the order of application within a single category is important. Unicode has formalized this approach by defining combining classes.
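The two observations can be checked with a small sketch, assuming Python's `unicodedata` module as the source of combining-class values. `unicodedata.combining` returns a character's canonical combining class, and NFD normalization applies Unicode's canonical ordering. The helper name `classes` and the base letter "o" are illustrative choices, not part of the original text.

```python
import unicodedata

CEDILLA = "\u0327"     # combining class 202 (attached below)
CIRCUMFLEX = "\u0302"  # combining class 230 (above)
TILDE = "\u0303"       # combining class 230 (above)

def classes(text):
    """Return the canonical combining class of every character in text."""
    return [unicodedata.combining(ch) for ch in text]

print(classes("o" + CIRCUMFLEX + CEDILLA))

# Marks in different classes are independent: canonical ordering sorts
# them by class, so both input orders normalize to the same sequence.
below_first = unicodedata.normalize("NFD", "o" + CEDILLA + CIRCUMFLEX)
above_first = unicodedata.normalize("NFD", "o" + CIRCUMFLEX + CEDILLA)
print(below_first == above_first)

# Marks in the same class are never reordered, so their order carries
# meaning: circumflex-then-tilde differs from tilde-then-circumflex.
vietnamese = unicodedata.normalize("NFD", "o" + CIRCUMFLEX + TILDE)
swapped = unicodedata.normalize("NFD", "o" + TILDE + CIRCUMFLEX)
print(vietnamese == swapped)
```

Only the first of the two same-class sequences composes under NFC to the single Vietnamese letter U+1ED7; the swapped sequence remains a different string.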

There are 352 combining characters in Unicode, and they are distributed among 53 combining classes. Among these classes are, first of all, those for signs that are specific to a single writing system (an Arabic vowel over a Thai consonant would have little chance of being ...
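Counts of this kind can be reproduced approximately with the sketch below. It assumes that a "combining character" means a character with a nonzero canonical combining class, which may not match the definition used above, and it relies on the Unicode version bundled with the Python interpreter, so the totals it prints will likely differ from the figures quoted. The function name `combining_class_census` is hypothetical.

```python
import sys
import unicodedata

def combining_class_census():
    """Map each nonzero combining class to the number of characters in it."""
    census = {}
    for code_point in range(sys.maxunicode + 1):
        ccc = unicodedata.combining(chr(code_point))
        if ccc != 0:  # class 0 means "not reordered"; skip those characters
            census[ccc] = census.get(ccc, 0) + 1
    return census

census = combining_class_census()
print("Unicode version:", unicodedata.unidata_version)
print("characters with a nonzero combining class:", sum(census.values()))
print("distinct nonzero combining classes:", len(census))
```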
