7.2. Taking Language into Account

Choosing a character encoding for your document doesn't specify which language (or languages) the document is written in. Several languages may use the same character set, and if you are using Unicode, nothing in the encoding itself says whether the text is Vietnamese or Italian. It would be nice to have the document declare its language to the web server or application, so that readers know immediately whether they will understand it.

7.2.1. The xml:lang Attribute

XML defines the attribute xml:lang as a language label for any element. There is no official action that an XML processor must take when encountering this attribute, but we can imagine some future applications. For example, search engines could be designed to pay attention to the language of a document and use it to categorize its entries. The search interface could then include a menu for languages to include or exclude in a search. Another use for xml:lang might be to combine several versions of a text in one document, each version labeled with a different language. A web browser could be set to ignore all but a particular language, filtering the document so that it displays only what the reader wants. Or, if you're writing a book that includes text in different languages, you could configure your spellchecker to use a different dictionary for each version.
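To make the filtering idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document, the element names doc and para, and the function keep_language are hypothetical and chosen only for illustration. A real browser or search engine would do something far more elaborate.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document holding two versions of the same text.
SAMPLE = """<doc>
  <para xml:lang="en">Good morning.</para>
  <para xml:lang="it">Buongiorno.</para>
</doc>"""


def keep_language(xml_text, wanted):
    """Return the text of the root's direct children labeled with `wanted`."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root if child.get(XML_LANG) == wanted]


print(keep_language(SAMPLE, "it"))  # prints ['Buongiorno.']
```

The sketch only inspects direct children of the root and compares the label as an exact string, which is enough to show the idea of keeping one language and ignoring the rest.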

In its simplest form, the attribute's value is a two-letter language code, like so:

xml:lang="en"

The code "en" stands for English. ...
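Under the XML 1.0 Recommendation, a language label applies to an element's content, including nested elements, unless a nested element carries its own xml:lang. The following sketch shows one way a program might work out the language in effect for each element. It assumes Python's xml.etree.ElementTree, and the sample document and the function effective_languages are hypothetical. ElementTree does not resolve the inherited label itself, so the sketch passes it down by hand.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: English overall, with one Italian quotation.
SAMPLE = """<book xml:lang="en">
  <title>Greetings</title>
  <quote xml:lang="it">Buongiorno</quote>
</book>"""


def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs, carrying xml:lang down to descendants."""
    # Use the element's own label if present, otherwise the inherited one.
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


root = ET.fromstring(SAMPLE)
for tag, lang in effective_languages(root):
    print(tag, lang)
```

Here title has no label of its own and so takes "en" from book, while quote overrides it with "it".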
