Internationalization Considerations

XML, through its support for Unicode, is designed to allow for many natural languages. XQuery provides several functions and mechanisms that support multiple natural languages: collations, the normalize-unicode function, and the lang function.

Collations

Collations specify the order in which characters are compared and sorted. Characters can be sorted simply by their code points, but this has several limitations. Different languages and locales alphabetize the same set of characters differently. In addition, an uppercase letter and its lowercase equivalent may need to be sorted together. For example, if you sort on code points alone, an uppercase Z comes before a lowercase a, because every uppercase ASCII letter has a lower code point than every lowercase one.
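The difference between code point order and a case-insensitive order can be seen with a short sketch. It is written in Python rather than XQuery purely to illustrate the idea. Python's default string comparison is a code point comparison. The case-insensitive key used here (str.casefold, with the original string as a tie-breaker) is an assumed stand-in for a real language-aware collation, not how any XQuery implementation works.

```python
words = ["apple", "Zebra", "banana", "Apple"]

# Plain code point order: every uppercase ASCII letter (A-Z) sorts
# before every lowercase one (a-z).
codepoint_order = sorted(words)

# Rough stand-in for a case-insensitive collation: compare case-folded
# forms first, then fall back to code points to break ties.
case_insensitive_order = sorted(words, key=lambda w: (w.casefold(), w))

print(codepoint_order)         # ['Apple', 'Zebra', 'apple', 'banana']
print(case_insensitive_order)  # ['Apple', 'apple', 'banana', 'Zebra']
```

In code point order, "Zebra" lands between "Apple" and "apple". The case-insensitive key keeps the two spellings of "apple" together.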

Collations are not just for sorting. They can also be used to determine whether two strings contain equivalent values. Some languages and locales consider two different characters, or sequences of characters, to be equivalent. For example, a collation may equate the German character ß with the two letters ss. This type of comparison comes into play with functions such as contains, which determines whether one string contains the characters of another string.
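To see why equivalence matters, compare a plain code point match with one that treats ß and ss as equal. The sketch below uses Python's str.casefold, which applies Unicode full case folding and maps ß to "ss". It is only an approximation of a collation: it also ignores case, and it does not implement any locale's rules. The helper names loosely_equal and loosely_contains are hypothetical.

```python
def loosely_equal(a: str, b: str) -> bool:
    """Hypothetical helper: equality after Unicode case folding (ß -> ss)."""
    return a.casefold() == b.casefold()


def loosely_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring test after Unicode case folding."""
    return needle.casefold() in haystack.casefold()


print("Straße" == "Strasse")               # False: different code points
print(loosely_equal("Straße", "Strasse"))  # True: ß folds to "ss"
print("ss" in "Straße")                    # False: no literal "ss"
print(loosely_contains("Straße", "ss"))    # True
```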

Collations in XQuery are identified by a URI. The URI serves only as a name and does not necessarily point to a resource on the Web, although it might. All XQuery implementations support at least one collation, whose name is http://www.w3.org/2005/xpath-functions/collation/codepoint. This ...
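Under the codepoint collation, strings are compared purely by the numeric values of their Unicode code points, with no case or language equivalences. The following Python sketch mimics that comparison. The function name codepoint_compare is hypothetical. Its -1/0/1 result follows the convention of XQuery's compare function, but the code is only an illustration, not an implementation of XQuery.

```python
def codepoint_compare(a: str, b: str) -> int:
    """Hypothetical helper: compare two strings code point by code point.

    Returns -1 if a sorts first, 0 if the strings are identical,
    and 1 if b sorts first.
    """
    ka = [ord(c) for c in a]  # Python 3 str is a sequence of code points
    kb = [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


print(codepoint_compare("Z", "a"))             # -1: U+005A < U+0061
print(codepoint_compare("straße", "strasse"))  # 1: ß (U+00DF) > s (U+0073)
print(codepoint_compare("abc", "abc"))         # 0
```

Because ß and ss are different code points, this collation never treats "straße" and "strasse" as equal.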
