Subcodes

For some purposes, knowing the language is not enough. You also need to know the region where the language is spoken. For instance, French has slightly different vocabulary, spelling, and pronunciation in France, Quebec, Belgium, and Switzerland. Although written identically with an ideographic character set, Mandarin and Cantonese are actually quite different, mutually unintelligible dialects of Chinese. The United States and the United Kingdom are jocularly referred to as “two countries separated by a common language.”

To handle these distinctions, the language code may be followed by any number of subcodes that further specify the language. Hyphens separate the language code from the subcode and subcodes from each other. If the language code is an ISO-639 code, the first subcode should be one of the two-letter country codes defined by ISO-3166, “Codes for the Representation of Names of Countries,” found at http://www.ics.uci.edu/pub/ietf/http/related/iso3166.txt. This xml:lang attribute indicates Canadian French:

<p xml:lang="fr-CA">Marie vient pour la fin de semaine.</p>

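The language code is usually written in lowercase, and the country code is written in uppercase. However, this is just a convention, not a requirement. Language tags are compared without regard to case, so fr-CA, fr-ca, and FR-CA all identify the same language.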
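
As a further illustration, the following sketch marks up the same greeting and the same notice for different regions. The phrases root element and the sample sentences are assumptions made for this example; only the xml:lang values follow the conventions described above, with an ISO-639 language code followed by an ISO-3166 country code.

```xml
<phrases>
  <!-- English as used in the United States and in the United Kingdom -->
  <p xml:lang="en-US">The elevator is out of order.</p>
  <p xml:lang="en-GB">The lift is out of order.</p>
  <!-- French as used in France and in Canada -->
  <p xml:lang="fr-FR">Bon week-end !</p>
  <p xml:lang="fr-CA">Bonne fin de semaine !</p>
</phrases>
```

An application that only needs the language can look at the part before the first hyphen, while one that cares about regional spelling or vocabulary can also examine the country subcode.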
