Subcodes
For some purposes, knowing the language is not enough. You also need to know the region where the language is spoken. For instance, French has slightly different vocabulary, spelling, and pronunciation in France, Quebec, Belgium, and Switzerland. Although they are written with largely the same ideographic character set, Mandarin and Cantonese are quite different, mutually unintelligible dialects of Chinese. The United States and the United Kingdom are jocularly referred to as “two countries separated by a common language.”
To handle these distinctions, the language code may be followed by any number of subcodes that further specify the language. Hyphens separate the language code from the subcode and subcodes from each other. If the language code is an ISO-639 code, the first subcode should be one of the two-letter country codes defined by ISO-3166, “Codes for the Representation of Names of Countries,” found at http://www.ics.uci.edu/pub/ietf/http/related/iso3166.txt.
This xml:lang attribute indicates Canadian French:
<p xml:lang="fr-CA">Marie vient pour la fin de semaine.</p>
The language code is usually written in lowercase, and the country code is written in uppercase. However, this is just a convention, not a requirement.
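As a rough illustration of how a program might read such an attribute, the following Python sketch parses the element, splits the xml:lang value at its hyphens, and applies the usual case convention. The helper names split_lang and normalize_lang are hypothetical and chosen only for this example. The sketch assumes that the first subcode is a country code whenever it is exactly two letters long.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the namespace URI that the
# xml: prefix is always bound to.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_lang(tag):
    """Split a language tag into its language code and a list of subcodes."""
    language, *subcodes = tag.split("-")
    return language, subcodes


def normalize_lang(tag):
    """Apply the case convention: lowercase language code, uppercase
    country code. Assumption: a two-letter first subcode is a country code."""
    language, subcodes = split_lang(tag)
    parts = [language.lower()]
    for position, subcode in enumerate(subcodes):
        if position == 0 and len(subcode) == 2 and subcode.isalpha():
            parts.append(subcode.upper())
        else:
            # Later subcodes are passed through unchanged.
            parts.append(subcode)
    return "-".join(parts)


p = ET.fromstring('<p xml:lang="fr-CA">Marie vient pour la fin de semaine.</p>')
language, subcodes = split_lang(p.get(XML_LANG))
```

Here language holds the language code and subcodes holds whatever follows it. Because the case of the codes is only a convention, a comparison between two tags would normally be made on their normalized forms rather than on the raw attribute values.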