MacRoman

The Mac OS uses a different, nonstandard, single-byte character set that’s a superset of ASCII. The version used in the Americas and most of Western Europe is called MacRoman. Variants for other countries include MacGreek, MacHebrew, MacIceland, and so forth. Most Java-based XML processors can make sense of these encodings if they’re properly labeled, but most non-Macintosh tools cannot.

For instance, suppose the French sentence “Au cours des dernières années, XML a été adapté dans des domaines aussi divers que l’aéronautique, le multimédia, la gestion des hôpitaux, les télécommunications, la théologie, la vente au détail et la littérature médiévale” is written on a Macintosh and then read on a PC. What the PC user will see is “Au cours des derni?res annžes, XML a žtž adaptž dans des domaines aussi divers que l’ažronautique, le multimždia, la gestion des h™pitaux, les tžlžcommunications, la thžologie, la vente au džtail et la littžrature mždižvale,” which is not the same thing at all. The result is generally at least marginally intelligible if most of the text is ASCII, but the accented characters are lost, so it is far from a faithful copy. Mac-specific character sets should also be avoided.
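The mismatch described above can be reproduced in a few lines. The following is only a minimal sketch in Python, assuming the PC reads the bytes as Windows-1252 (Python's `cp1252` codec); the helper name `misread_as` is hypothetical. The exact substitute characters a reader sees depend on which code page and font the receiving system uses, so the output may differ in detail from the garbled sentence quoted above.

```python
# Sketch: MacRoman bytes misread under another single-byte code page.
# "mac_roman" and "cp1252" are standard Python codec names; the choice of
# cp1252 as the PC's encoding is an assumption made for illustration.

def misread_as(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for any byte the reader's
    # code page leaves undefined, instead of raising an exception.
    return raw.decode(read_as, errors="replace")

original = "dernières années"

# Bytes as a Macintosh would have stored them.
raw = original.encode("mac_roman")

# What a reader sees if it wrongly assumes Windows-1252.
garbled = misread_as(original, "mac_roman", "cp1252")

# Decoding with the correct label recovers the text, which can then be
# saved in an encoding that other tools understand, such as UTF-8.
recovered = raw.decode("mac_roman")
utf8_bytes = recovered.encode("utf-8")

# ascii() escapes non-ASCII characters so the code points are visible
# regardless of the terminal's own encoding.
print(ascii(garbled))
print(ascii(recovered))
```

Only the accented letters are affected because the two code pages agree on the ASCII range and differ in the upper half. Decoding with the correct label and re-encoding as UTF-8 is one way to follow the advice to avoid Mac-specific character sets.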
