General Category

After its code point value and its name, the most important property of a Unicode character is its general category. There are seven primary categories: letter, number, punctuation, symbol, mark, separator, and miscellaneous (the standard's own name for this last group is “Other”). Each is further subdivided into more specific categories.
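In the Unicode Character Database, each general category is written as a two-letter code, and the first letter identifies the primary category. A minimal sketch using Python's standard `unicodedata` module shows this. The `PRIMARY` table and the `primary_category` helper are hypothetical names for illustration, and mapping the code letter “C” to “miscellaneous” is an assumption based on the terminology used here.

```python
import unicodedata

# Hypothetical lookup table: first letter of a general category code ->
# the primary category names used in the text. Unicode itself calls "C"
# "Other"; it is mapped to "miscellaneous" here as an assumption.
PRIMARY = {
    "L": "letter",
    "N": "number",
    "P": "punctuation",
    "S": "symbol",
    "M": "mark",
    "Z": "separator",
    "C": "miscellaneous",
}


def primary_category(ch: str) -> str:
    """Return the primary category for a single character."""
    # unicodedata.category returns a two-letter code such as "Lu" or "Mn".
    return PRIMARY[unicodedata.category(ch)[0]]


# One sample character per primary category.
for ch in ["A", "7", "!", "+", "\u0301", " ", "\n"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {primary_category(ch)}")
```

Here “A” reports `Lu` (uppercase letter), the combining acute accent U+0301 reports `Mn` (nonspacing mark), and the newline reports `Cc` (control), which falls under the miscellaneous group.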

Letters

The Unicode standard uses the term “letter” rather loosely when assigning characters to this general category. Whatever serves as the basic unit of a particular writing system, whether it represents a phoneme, a syllable, or a whole word or idea, is assigned to the “letter” category. The major exception is marks that combine typographically with other characters, which are categorized as “marks” rather than “letters.” They ...
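A short sketch, again assuming Python's standard `unicodedata` module, makes the point concrete: a Latin letter (a phoneme), a hiragana character (a syllable), and a CJK ideograph (a word or idea) all land in a “letter” category, while characters that combine with a preceding character land in a “mark” category. The `letter_or_mark` helper and the choice of sample characters are illustrative assumptions.

```python
import unicodedata

samples = [
    "a",       # LATIN SMALL LETTER A: represents a phoneme
    "\u304B",  # HIRAGANA LETTER KA: represents a syllable
    "\u4E2D",  # CJK UNIFIED IDEOGRAPH-4E2D: represents a word or idea
    "\u0301",  # COMBINING ACUTE ACCENT: combines with the preceding letter
    "\u093F",  # DEVANAGARI VOWEL SIGN I: combines with a preceding consonant
]


def letter_or_mark(ch: str) -> str:
    """Hypothetical helper: classify a character as letter, mark, or other."""
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "letter"
    if cat.startswith("M"):
        return "mark"
    return "other"


for ch in samples:
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)} ({letter_or_mark(ch)})")
```

The first three samples report `Ll`, `Lo`, and `Lo`; the last two report `Mn` (nonspacing mark) and `Mc` (spacing combining mark), so both count as marks even though only one of them takes up horizontal space of its own.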
