HTML 4.01 Language Features

Coordinating character sets is only the first part of the challenge. Even languages that share a character set may have different rules for hyphenation, spacing, quotation marks, punctuation, and so on. Beyond character shapes (glyphs), issues such as directionality (whether text reads left-to-right or right-to-left) and cursive joining behavior must also be taken into account.

This created the need for a system of language identification. The W3C responded by incorporating into HTML the language tagging put forth in RFC 2070, the standard on HTML internationalization.

The lang Attribute

The lang attribute can be added to nearly any tag to specify the language of that element’s content. Added to the <html> tag, it specifies the language of an entire document. The following example sets the document’s language to French:

<HTML LANG="fr">

It can also be used within text elements to switch to other languages within a document; for example, you can “turn on” Norwegian for just one element:

<BLOCKQUOTE lang="no">...</BLOCKQUOTE>
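To make the scoping concrete, here is a minimal Python sketch, built on the standard library’s html.parser, that reports the language in effect for each run of text. It assumes that an element without its own lang attribute inherits the language of its enclosing element, and that every non-empty element in the markup is explicitly closed (HTML 4.01 allows omitted end tags, which this sketch does not handle). The names LangTracker and language_runs are hypothetical.

```python
from html.parser import HTMLParser

# Empty (void) elements never get an end tag, so they must not be
# pushed onto the language stack.
VOID_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                 "img", "input", "isindex", "link", "meta", "param"}


class LangTracker(HTMLParser):
    """Hypothetical helper: record each text run with its effective language."""

    def __init__(self, default_lang=None):
        super().__init__()
        self.stack = [default_lang]  # effective language of each open element
        self.runs = []               # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # HTMLParser lowercases attribute names, so LANG and lang both match.
        lang = dict(attrs).get("lang")
        # Assumption: an element without its own lang inherits its parent's.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        # Never pop the document-level default at the bottom of the stack.
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def language_runs(markup, default_lang=None):
    """Return a list of (language, text) pairs for the given markup."""
    parser = LangTracker(default_lang)
    parser.feed(markup)
    parser.close()
    return parser.runs


if __name__ == "__main__":
    doc = ('<HTML LANG="fr"><BODY><P>Bonjour</P>'
           '<BLOCKQUOTE lang="no"><P>God dag</P></BLOCKQUOTE>'
           '<P>Au revoir</P></BODY></HTML>')
    print(language_runs(doc))
    # [('fr', 'Bonjour'), ('no', 'God dag'), ('fr', 'Au revoir')]
```

The stack mirrors element nesting: the Norwegian setting applies only while the BLOCKQUOTE is open, and the French document language resumes once it closes.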

The value of the lang attribute is a language code (not the same as a country code). The current HTML and XML specifications support the two-letter language codes established in RFC 1766. These are listed in Table 7-1. However, there have been advancements in language identification to include three-letter codes, two-letter codes with country subcode (for example, fr-CA for French as used in Canada), and other ...
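The structure of a code such as fr-CA can be sketched in Python. The hypothetical parse_language_tag function below checks a tag against the RFC 1766 syntax (a primary tag followed by optional hyphen-separated subtags, each one to eight letters, compared case-insensitively) and, when the first subtag has exactly two letters, returns it as a country code. This is a sketch under those assumptions only: it does not check codes against the ISO 639 language or ISO 3166 country lists, and it rejects the digits that later language-tag standards permit in subtags.

```python
import re

# RFC 1766 syntax: Primary-tag *( "-" Subtag ), each part 1 to 8 letters.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Hypothetical helper: split a tag into (language, country or None).

    Raises ValueError if the tag does not match the RFC 1766 syntax.
    """
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid language tag: {tag!r}")
    # Tags are case-insensitive, so normalize before splitting.
    parts = tag.lower().split("-")
    language = parts[0]
    # A two-letter first subtag is treated as a country code (fr-CA).
    if len(parts) > 1 and len(parts[1]) == 2:
        country = parts[1].upper()
    else:
        country = None
    return language, country


if __name__ == "__main__":
    print(parse_language_tag("fr-CA"))  # ('fr', 'CA')
    print(parse_language_tag("no"))     # ('no', None)
```

Normalizing case up front means FR-ca, fr-CA, and fr-ca all compare equal, which matches the case-insensitive treatment of tags.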
