Text Encodings

At the lowest level, all text files (and individual strings of text within larger files) are just sequences of binary numbers. By knowing the file’s text encoding, the operating system can convert these numbers into readable text, which it can then display by applying an appropriate font to them.
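To make the role of the encoding concrete, here is a minimal Python sketch. The four byte values are an assumption chosen for illustration, and `latin-1` and `mac_roman` are simply two single-byte codecs that ship with Python's standard library. The same numbers come out as different text depending on which encoding the reader applies.

```python
# Four raw byte values, as they might sit in a file on disk.
# (Illustrative data: this is the word "café" as stored in Latin-1.)
raw = bytes([0x63, 0x61, 0x66, 0xE9])

# Decoding maps each number to a character according to an encoding table.
as_latin1 = raw.decode("latin-1")
as_mac_roman = raw.decode("mac_roman")

print(as_latin1)     # the intended word
print(as_mac_roman)  # same bytes, but the last character differs

# The first three bytes are plain ASCII, which both encodings share;
# only the byte above 0x7F is interpreted differently.
print(as_latin1[:3] == as_mac_roman[:3])
```

Picking the wrong encoding does not raise an error here, because every byte value is valid in both tables. The text just comes out wrong, which is why the system needs to know the encoding rather than guess it.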

Because of its capacity to contain all written languages and common symbols that modern humans use (with room to spare), Unicode is considered Mac OS X’s native text encoding. Specifically, Mac OS X supports Unicode’s UTF-16 (16-bit code units, with a pair of units for characters outside the basic range) and UTF-8 (variable-length, one to four bytes) encodings. (See the next section for a brief introduction to Unicode encodings.)
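As a rough illustration of how the two encodings differ in size, the following Python sketch encodes a few sample characters both ways. The sample characters and the helper name `encoded_sizes` are assumptions made for this example; `utf-8` and `utf-16-be` are standard Python codec names (the `-be` variant is used so that no byte-order mark is added to the count).

```python
# Sample characters: a Latin letter, an accented letter, the euro sign,
# and a musical symbol that lies outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

UTF-8 grows from one to four bytes as the code point gets larger, while UTF-16 uses a single 16-bit unit for most characters and a pair of units only for those beyond U+FFFF.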

Mac OS X also ships with support for proprietary text encodings; those available on a given Mac depend on the language bundles selected at install time.

Unicode on Mac OS X

After a couple of decades of development, the Unicode character-encoding architecture is now being rapidly adopted as a standard by all manner of information technologies, from data format standards such as XML to entire operating systems such as Mac OS X. Because it holds all human alphabets, punctuation, ideograms, and other written symbols in a single, very large character set, Unicode makes character encoding relatively simple to implement.

We refer to Unicode not just as a character encoding but as a character-encoding architecture, because it encompasses a single character set and several ways to encode the characters within it. The set is simply a ...
