Coordinating character sets is only the first part of the challenge. Even languages that share a character set may have different rules for hyphenation, spacing, quotation marks, punctuation, and so on. Beyond character shapes (glyphs), issues such as directionality (whether the text reads left-to-right or right-to-left) and cursive joining behavior have to be taken into account.
These differences created a need for a system of language identification. The W3C responded by incorporating into HTML the language tags put forth in RFC 2070, the specification on HTML internationalization.
The lang attribute can be added to any element to specify the language of that element’s content. It can also be added to the <html> tag to specify a language for an entire document. The following example specifies the document’s language as French:
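A minimal sketch of such a declaration is shown here; the title and body text are placeholders added for illustration.

```html
<!-- lang="fr" on the root element marks the whole document as French -->
<html lang="fr">
<head>
  <title>Exemple</title>
</head>
<body>
  <p>Bonjour tout le monde.</p>
</body>
</html>
```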
The lang attribute can also be used on individual text elements to switch to another language within a document; for example, you can “turn on” Norwegian for just one element:
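As a sketch, using the two-letter code no for Norwegian and placeholder text in both languages, a single quotation inside an otherwise English document might be marked up like this:

```html
<p>The clerk greeted us:</p>
<!-- lang="no" applies only to this element; surrounding text keeps the document's language -->
<blockquote lang="no">God morgen.</blockquote>
```

Only the blockquote is treated as Norwegian; the paragraph before it inherits whatever language is declared further up the document.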
The value of the lang attribute is a language code (not the same as a country code). The current HTML and XML specifications support the two-letter language codes established in RFC 1766. These are listed in Table 7-1. However, there have been advancements in language identification to include three-letter codes, two-letter codes with a country subcode (for example, fr-CA for French as used in Canada), and other ...