Fonts & Encodings by Yannis Haralambous

3.1. Basic properties

3.1.1. Name

The name of a character is what we have called its description. The official list of the English names of characters, ordered by their positions in the encoding, appears in the following file:

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

This file contains a large amount of data in a format that is hard for humans to read but easy for computers: fifteen text fields separated by semicolons. Here are a few lines from this file:

   0020;SPACE;Zs;0;WS;;;;;N;;;;;
   0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
   0022;QUOTATION MARK;Po;0;ON;;;;;N;;;;;
   0023;NUMBER SIGN;Po;0;ET;;;;;N;;;;;
   0024;DOLLAR SIGN;Sc;0;ET;;;;;N;;;;;

The first two fields, numbered 0 and 1 (counting begins at 0), are the character's position, also called its "code point" and written in hexadecimal, and its name (which we called its "description" in the previous chapter). We shall see the other fields later.
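To make the field layout concrete, here is a minimal Python sketch that splits records in this format and keeps only fields 0 and 1. It works on the five sample lines above rather than on the downloaded file. The helper names `parse_line` and `parse_names` are illustrative and are not part of any library.

```python
# The five sample records shown above, in UnicodeData.txt format.
SAMPLE = """\
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0022;QUOTATION MARK;Po;0;ON;;;;;N;;;;;
0023;NUMBER SIGN;Po;0;ET;;;;;N;;;;;
0024;DOLLAR SIGN;Sc;0;ET;;;;;N;;;;;
"""

def parse_line(line):
    """Split one record into its 15 semicolon-separated fields and
    return (code point as an int, character name)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    code_point = int(fields[0], 16)  # field 0: code point in hexadecimal
    name = fields[1]                 # field 1: character name
    return code_point, name

def parse_names(text):
    """Build a {code point: name} dictionary, skipping blank lines."""
    return dict(parse_line(line) for line in text.splitlines() if line.strip())

names = parse_names(SAMPLE)
for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```

The same function can be applied to each line of a local copy of UnicodeData.txt. The real file also contains placeholder names such as `<control>`, along with pairs of `First`/`Last` entries that stand for large ranges of characters (for instance the CJK ideographs and Hangul syllables). This sketch does not expand those ranges.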

Character names are not there solely for the benefit of humans; some programming languages also understand them. In Perl, for example, to obtain the Cherokee letter A (Ꭰ, whose shape resembles a Latin 'D'), we can write \N{CHEROKEE LETTER A}, which is strictly equivalent to \x{13a0}, a reference to the character's code point.
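Python offers the same facility, and it provides a convenient way to check the equivalence. In this sketch the string escape `\N{...}` designates a character by its name and `\u13a0` designates it by its code point. The standard `unicodedata` module converts between the two in both directions.

```python
import unicodedata

# \N{...} designates a character by its Unicode name, as in Perl.
by_name = "\N{CHEROKEE LETTER A}"
# \u13a0 designates the same character by its code point.
by_code_point = "\u13a0"

print(by_name == by_code_point)                             # True
print(hex(ord(by_name)))                                    # 0x13a0
print(unicodedata.name(by_code_point))                      # CHEROKEE LETTER A
print(unicodedata.lookup("CHEROKEE LETTER A") == by_name)   # True
```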

3.1.2. Block and script

These properties refer to the distribution of the full set of characters according to the script to which they belong or to their functional similarity. Thus we have a block of Armenian characters (Armenian), but also a block of pictograms ...
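A block is simply a named, contiguous range of code points, so finding the block of a character amounts to a range lookup. Python's `unicodedata` module does not expose block or script properties, so this minimal sketch does the lookup by hand. It assumes the `start..end; Block Name` line layout of the Unicode Character Database file Blocks.txt. The three-line excerpt and the helper names `parse_blocks` and `block_of` are illustrative assumptions, not a complete or official implementation.

```python
import bisect

# Illustrative excerpt in the "start..end; Block Name" layout of Blocks.txt
# (only three blocks shown; the real file lists all of them).
BLOCKS_SAMPLE = """\
0000..007F; Basic Latin
0530..058F; Armenian
13A0..13FF; Cherokee
"""

def parse_blocks(text):
    """Return a list of (start, end, name) tuples sorted by start."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        span, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(x, 16) for x in span.split(".."))
        blocks.append((start, end, name))
    blocks.sort()
    return blocks

def block_of(cp, blocks, starts):
    """Return the name of the block containing code point cp, or None."""
    i = bisect.bisect_right(starts, cp) - 1    # last block starting at or before cp
    if i >= 0 and blocks[i][0] <= cp <= blocks[i][1]:
        return blocks[i][2]
    return None

blocks = parse_blocks(BLOCKS_SAMPLE)
starts = [b[0] for b in blocks]
for ch in ("A", "\u0531", "\u13a0", "\u0400"):
    print(f"U+{ord(ch):04X}: {block_of(ord(ch), blocks, starts)}")
```

Because blocks do not overlap and are sorted by their starting code point, a binary search over the start values finds the only candidate block. U+0400 prints `None` here only because the Cyrillic block is missing from the three-line excerpt.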