Using Unicode Text in SVG Images

In 1998, RFC 2277 declared that the Internet was an international phenomenon, and that all new Internet standard protocols, languages, and formats should use Unicode (also referred to as ISO 10646) character set encodings. Sounds great; how do we do that in an SVG image?

Unicode is a standard set of character codes for representing multilingual text. In the early days of computing, vendors invented their own character encodings; it wasn’t until the 1960s that the American standards body now known as ANSI published the US-ASCII specification (first issued in 1963 and revised into its familiar form in 1968), which put forth an encoding table that represented the unaccented Latin letters, the digits, and common punctuation in a standard 7-bit mapping. In the ’80s, an attempt was made to support more languages with the ISO 8859 family of standards, whose separate 8-bit tables covered Latin, Cyrillic, Arabic, Greek, and Hebrew characters; ISO-8859-1, the best known of these, covers only Western European Latin text. Unicode is the modern synthesis of the standard encodings that came before it, with the goal of adding support for all the world’s languages.

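The early versions of Unicode proposed to represent every character with a single 16-bit value, which allows 65,536 distinct codes. The current version of Unicode has expanded the code space to 1,114,112 code points (U+0000 through U+10FFFF), leaving room for more than a million characters, including those of historic or ancient scripts. Strictly speaking, Unicode encodes abstract characters rather than glyphs; the glyph is the shape a font draws for a character. More information on Unicode is available at the Unicode Consortium’s web site, http://www.unicode.org/.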
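The sizes mentioned above follow from simple arithmetic, which the short Python sketch below works through. The constant names are made up for this illustration and are not part of any Unicode library.

```python
# Illustrative arithmetic only; the constant names are hypothetical.

# Number of distinct values a single 16-bit quantity can hold,
# which was the size of the original Unicode design.
SIXTEEN_BIT_VALUES = 2 ** 16

# The current Unicode code space runs from U+0000 through U+10FFFF.
MAX_CODE_POINT = 0x10FFFF
CODE_SPACE_SIZE = MAX_CODE_POINT + 1

# The code space divides evenly into "planes" of 65,536 code points each.
PLANE_COUNT = CODE_SPACE_SIZE // SIXTEEN_BIT_VALUES

print(f"16-bit values:    {SIXTEEN_BIT_VALUES:,}")   # 65,536
print(f"code space size:  {CODE_SPACE_SIZE:,}")      # 1,114,112
print(f"number of planes: {PLANE_COUNT}")            # 17
```

Not every code point in that range is assigned to a character; the figure is an upper bound on what the code space can hold.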

The Unicode standard provides three different methods for implementing the encoding:

UTF-8

In UTF-8, character glyphs are represented ...
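As a rough illustration of UTF-8 in practice, the Python sketch below prints the bytes UTF-8 produces for a few sample characters and then builds a minimal SVG document whose XML declaration names UTF-8 as its encoding. The helper `utf8_hex`, the sample characters, and the SVG contents are all assumptions chosen for this example; they do not come from the text above.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a one-character string as spaced hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


# Sample characters, written as escapes so this file itself stays ASCII:
# LATIN CAPITAL A, LATIN SMALL E WITH ACUTE, EURO SIGN, MUSICAL SYMBOL G CLEF.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    # UTF-8 uses between one and four bytes, depending on the code point.
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")

# A minimal SVG document (hypothetical content) containing non-ASCII text.
label = "Caf\u00e9 \u20ac5"
svg = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">\n'
    f'  <text x="10" y="30">{label}</text>\n'
    '</svg>\n'
)

# Encode to bytes before writing, so the file matches its declared encoding.
svg_bytes = svg.encode("utf-8")
print(len(svg), "characters ->", len(svg_bytes), "bytes")
```

The byte count exceeds the character count because the two non-ASCII characters in the label take more than one byte each. To save the result, open the output file in binary mode and write `svg_bytes`, so that the bytes on disk agree with the `encoding="UTF-8"` declaration.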
