ISO Character Sets

Unicode has only recently become commonplace. Previously, the space and processing costs associated with Unicode files led vendors to prefer smaller, single-byte character sets that could handle English and a few other languages of interest, but not the full panoply of human language. The International Organization for Standardization (ISO) has standardized 15 of these character sets as ISO standard 8859. In all of these single-byte character sets, characters 0 through 127 are identical to the ASCII character set, characters 128 through 159 are the C1 controls, and characters 160 through 255 are the additional characters needed for scripts and languages such as Greek, Cyrillic, and Turkish.
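The three-range layout described above can be checked with a short Python sketch. It assumes the standard library codec names `iso8859_1`, `iso8859_2`, `iso8859_5`, `iso8859_7`, and `iso8859_9`; the helper names `decode_byte` and `shared_ranges_agree` are made up for this illustration. It is a minimal demonstration, not part of the standard itself.

```python
# Python standard-library codec names for a few parts of ISO 8859.
ENCODINGS = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7", "iso8859_9"]


def decode_byte(value, encoding):
    """Decode one byte value (0-255) with the given single-byte encoding."""
    return bytes([value]).decode(encoding)


def shared_ranges_agree(encodings):
    """Return True if bytes 0-159 decode to the same code points everywhere.

    Bytes 0-127 are ASCII and bytes 128-159 are the C1 controls, so in each
    encoding byte N in this range should decode to Unicode code point N.
    """
    for value in range(160):
        decoded = {decode_byte(value, enc) for enc in encodings}
        if decoded != {chr(value)}:
            return False
    return True


print("bytes 0-159 identical:", shared_ranges_agree(ENCODINGS))

# The same byte in the 160-255 range means something different in each set.
for enc in ENCODINGS:
    ch = decode_byte(0xF0, enc)
    print(f"{enc}: byte 0xF0 -> U+{ord(ch):04X}")
```

Only bytes 0 through 159 are compared across all five encodings, because some parts of ISO 8859 leave a few positions in the upper range unassigned, and decoding those raises an error in Python.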

ISO-8859-1 (Latin-1)

ASCII plus the accented letters and other characters needed for most Latin-alphabet Western European languages, including Danish, Dutch, Finnish, French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish.

ISO-8859-2 (Latin-2)

ASCII plus the accented letters and other characters needed to write most Latin-alphabet Central and Eastern European languages, including Czech, German, Hungarian, Polish, Romanian, Croatian, Slovak, Slovenian, and Sorbian.

ISO-8859-3 (Latin-3)

ASCII plus the accented letters and other characters needed to write Esperanto, Maltese, and Turkish.

ISO-8859-4 (Latin-4)

ASCII plus the accented letters and other characters needed to write most languages of the Baltic and Nordic regions, including Estonian, Latvian, Lithuanian, Greenlandic, and Sami (Lappish). Now ...
