2.1. Philosophical issues: characters and glyphs

Unicode is an encoding of characters, and it is the first encoding that really takes the trouble of defining what a character is.

Let's be frank: computer specialists are not in the habit of worrying about philosophical issues ("who am I?", "what comes after death?", "what is a character?"). But the last of these questions arose quite naturally in Unicode once the Asian languages were addressed. Unicode purports to be an encoding based on principles, and one of these principles is that it contains characters exclusively. This principle forces us to give serious consideration to the question of what constitutes a character and what does not.

We can compare the relationship between characters and glyphs to the relationship between signified and signifier in linguistics: the character plays the role of the signified, the glyph that of the signifier. After all, Ferdinand de Saussure, widely regarded as the founder of modern linguistics, said himself: "Whether I write in black or white, in incised characters or in relief, with a pen or a chisel—none of that is of any importance for the meaning" [310, p. 118]. What he called "meaning" corresponds very well to what we intend to call "character", namely, the meaning that the author of the document wished to impart by means of the glyph that he used.

But things are a bit more complicated than that: there are characters with no glyphs, glyphs that can correspond to a number of different characters according to context, glyphs that correspond to multiple characters at the same time (with weightings ...
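Two of the cases just listed can be observed with nothing more than Python's standard unicodedata module. The sketch below is only an illustration, and the code points are our own choice rather than examples taken from the text: U+200B ZERO WIDTH SPACE stands in for a character that has no visible glyph of its own, and the capital A of the Latin, Greek, and Cyrillic scripts stands in for a glyph shape that many fonts share among several distinct characters.

```python
import unicodedata

# A character with no visible glyph of its own (our illustrative choice).
zwsp = "\u200b"
print(f"U+{ord(zwsp):04X}", unicodedata.name(zwsp), unicodedata.category(zwsp))

# Three distinct characters that many fonts draw with the same shape.
# Whether the shapes really coincide depends on the font, not on Unicode.
lookalikes = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X}", name)

# The encoding treats them as different characters, whatever they look like.
all_distinct = len(set(lookalikes)) == len(lookalikes)
print("distinct characters:", all_distinct)
```

The names and general categories come from the Unicode Character Database bundled with the interpreter; nothing in this output says anything about glyphs, which are the business of the font.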
