Character Class Expressions

Character class expressions, which are enclosed in square brackets, indicate a choice among several characters. These characters can be listed singly, expressed as a range of characters, or expressed as a combination of the two.

Single Characters and Ranges

To specify a choice of several characters, you can simply list them inside square brackets. For example, [def] matches d or e or f. To match multiple occurrences of these letters, you can use a quantifier with a character class expression, as in [def]*, which will match not only defdef, but eddfefd as well. The characters listed can also be any of the escapes described earlier in this chapter. The expression [\p{Ll}\d] matches either a lowercase letter or a digit.
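As a rough illustration of the [def] and [def]* examples above, here is a minimal sketch that uses Python's standard re module as a stand-in for an XQuery processor. That substitution is an assumption made for convenience: the two regular expression dialects are not identical (Python's re does not support \p{Ll}, for instance), so only the literal-character class is shown. The helper names matches_one and matches_many are hypothetical.

```python
import re

# One character drawn from the set d, e, f.
SINGLE = re.compile(r"[def]")
# Zero or more characters, each drawn from the same set.
REPEATED = re.compile(r"[def]*")


def matches_one(text: str) -> bool:
    """True if the whole string is exactly one of d, e, or f."""
    return SINGLE.fullmatch(text) is not None


def matches_many(text: str) -> bool:
    """True if the whole string consists only of d, e, and f (in any order)."""
    return REPEATED.fullmatch(text) is not None


for sample in ["e", "x", "defdef", "eddfefd", "defx"]:
    print(sample, matches_one(sample), matches_many(sample))
```

The quantifier applies to the class as a whole, so each repetition may pick a different character from the set. That is why eddfefd is accepted even though it is not a repetition of def.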

It is also possible to specify a range of characters by separating the starting and ending characters with a hyphen. For example, [a-z] matches any letter from a to z. The endpoints of the range must be single characters or single-character escapes (not multi-character escapes such as \d).
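The behavior of a range at its endpoints can be checked with a small sketch. Python's re module is again used as an assumed stand-in for XQuery's regular expression support, and in_lowercase_range is a hypothetical helper name.

```python
import re

# A single range: any character from a through z, endpoints included.
LOWER = re.compile(r"[a-z]")


def in_lowercase_range(ch: str) -> bool:
    """True if ch is a single character that falls within a..z."""
    return LOWER.fullmatch(ch) is not None


# Both endpoints are inside the range; their immediate neighbors are not.
for ch in ["a", "m", "z", "`", "{", "A"]:
    print(repr(ch), in_lowercase_range(ch))
```

The characters ` and { sit immediately before a and immediately after z, so they make convenient boundary cases.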

You can specify more than one range in the same character class expression, which means that it matches a character in any of the ranges. The expression [a-zA-Z0-9] matches one character that is either between a and z, or between A and Z, or a digit from 0 to 9. Unicode code points are used to determine whether a character is in the range.
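To make the code-point rule concrete, the sketch below compares [a-zA-Z0-9] against an explicit comparison of code points. It assumes Python's re module as a stand-in for an XQuery processor; the names RANGES, by_code_point, and by_regex are hypothetical.

```python
import re

ALNUM = re.compile(r"[a-zA-Z0-9]")

# The same three ranges, written as (first, last) endpoint pairs.
RANGES = [("a", "z"), ("A", "Z"), ("0", "9")]


def by_code_point(ch: str) -> bool:
    """Membership decided purely by comparing Unicode code points."""
    return any(ord(lo) <= ord(ch) <= ord(hi) for lo, hi in RANGES)


def by_regex(ch: str) -> bool:
    """Membership decided by the character class expression."""
    return ALNUM.fullmatch(ch) is not None


# Collect every character below U+0250 on which the two tests disagree.
mismatches = [
    chr(cp) for cp in range(0x250)
    if by_regex(chr(cp)) != by_code_point(chr(cp))
]
print(mismatches)
```

Because membership depends only on code points, an accented letter such as é (U+00E9) lies outside all three ranges and is not matched, even though it is a lowercase letter.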

Ranges and single characters can be combined in any order. For example, [abc0-9] matches either a letter a, b, or c, or a digit from 0 to 9.
