Creating Custom Collations

The XSLT 2.0 spec uses collations in several places. A collation defines how strings of characters are sorted and compared. English has no accented letters and no character sequences that sort as separate letters, so the default ordering is usually adequate if all your documents are in English. Even if English is your native language, though, it’s likely you’ll need to work with documents written in other languages. In that case, details such as the Spanish ch (traditionally considered a separate letter, che) or the German u-umlaut, which can be written as ü or ue, become important in sorting and comparing words.

As with extension functions, the XSLT 2.0 spec defines the attributes that indicate where a custom collation applies, but it doesn’t define how to identify the particular piece of code that does the work. Because Saxon has taken the lead in implementing these features, we’ll focus on accessing custom collations in Saxon here. We’ll look at two collations. The first sorts Spanish words so that ch sorts as a separate letter between c and d. The second compares German words so that Müller and Mueller are considered identical.
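To make the two behaviors concrete before turning to Saxon, here is a minimal Java sketch of both comparisons. It is an illustration under stated assumptions, not the code developed in this chapter: the class name CollationSketch and its members are hypothetical, the Spanish ordering relies on the standard java.text.RuleBasedCollator with one extra rule appended to the en_US rules, and the German comparison simply folds umlauts to their two-letter spellings before comparing. How a class like this is registered with Saxon is not shown here.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical illustration class; the names here are not from the book.
final class CollationSketch {

    // Spanish: treat "ch" as a single letter that sorts after c and before d.
    // The extra rule is appended to the JDK's en_US collation rules.
    static RuleBasedCollator spanish() {
        try {
            RuleBasedCollator base =
                (RuleBasedCollator) Collator.getInstance(Locale.US);
            return new RuleBasedCollator(base.getRules() + "& C < ch, cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalStateException("Bad collation rule", e);
        }
    }

    // German: rewrite umlauts and sharp s to their ASCII spellings.
    // Unicode escapes keep the source independent of file encoding.
    static String fold(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe")
                .replace("\u00fc", "ue").replace("\u00df", "ss")
                .replace("\u00c4", "Ae").replace("\u00d6", "Oe")
                .replace("\u00dc", "Ue");
    }

    // Two strings compare as equal when their folded forms are equal.
    static final Comparator<String> GERMAN =
        Comparator.comparing(CollationSketch::fold);

    public static void main(String[] args) {
        String[] words = {"dama", "chico", "cosa", "cuna"};
        Arrays.sort(words, spanish());
        System.out.println(Arrays.toString(words));

        // A result of 0 means the two spellings are treated as identical.
        System.out.println(GERMAN.compare("M\u00fcller", "Mueller"));
    }
}
```

With the Spanish collator, chico should land after cosa and cuna but before dama, and the German comparator should report Müller and Mueller as equal. The folding approach is deliberately crude: it ignores case differences and any accents other than the ones listed, which a production collation would need to handle.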

Note

Your author is in no way a speaker of Spanish or German, so please pardon any incorrect statements about the languages themselves. The point here is to illustrate how to create extensions that implement custom collations and then use those extensions for sorting and comparing text in your stylesheets. ...
