Simple Expressions

The simplest regular expression is just a string of text. The regular expression abc matches the string abc. We’re merely searching for a series of characters inside some data. For this case, contains('abc', 'abc') does the same thing.
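The equivalence between a literal pattern and a plain substring test can be tried out in a few lines. This is only a sketch in Python, whose re module stands in here for an XSLT processor; the helper names contains_literal and matches_literal are made up for illustration.

```python
import re

def contains_literal(text: str, literal: str) -> bool:
    """Plain substring test, comparable in spirit to contains()."""
    return literal in text

def matches_literal(text: str, pattern: str) -> bool:
    """Unanchored regex search: true if the pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

samples = ["abc", "xxabcxx", "ab", "acb"]

# For a purely literal pattern, both tests agree on every sample.
results = [(s, contains_literal(s, "abc"), matches_literal(s, "abc")) for s in samples]
for row in results:
    print(row)
```

Both columns agree for every sample, which is why a literal pattern alone gives no advantage over a substring function.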

Regular expressions exist to do far more than search for a literal string; they help us find data that matches a pattern. We can use character sets to specify groups of characters. The regular expression [abc]d matches two characters in which the first character is a, b, or c, followed by the character d.
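As a rough illustration of [abc]d, the following sketch uses Python’s re module (an assumption made for convenience, not the XSLT engine itself) to pull every two-character match out of a sample string.

```python
import re

# One character from the set {a, b, c}, followed by a literal d.
pattern = re.compile(r"[abc]d")

# "dd" yields nothing because d is not in the set;
# in "abd" only the trailing "bd" qualifies.
found = pattern.findall("ad bd cd dd abd")
print(found)
```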

We can also negate a character set, asking for any character that is not in the set. To do this, put a caret (^) immediately after the opening bracket. The regular expression [^abc]d matches two characters in which the first character is anything except a, b, or c, and the second is d.
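A negated set still has to consume one character, which is easy to overlook. The sketch below, again using Python’s re module as a stand-in, checks a few candidate strings against [^abc]d as a whole.

```python
import re

# Any single character other than a, b, or c, followed by a literal d.
negated = re.compile(r"[^abc]d")

candidates = ["ad", "xd", "dd", " d", "d"]

# fullmatch() requires the entire candidate to fit the pattern.
checks = {s: negated.fullmatch(s) is not None for s in candidates}
for s, ok in checks.items():
    print(repr(s), ok)
```

Note that "dd" and " d" both match, because d and the space are simply characters other than a, b, and c, while a lone "d" does not match, because nothing precedes it.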

Ranges give us a shorthand way of defining character sets. The character set [0-9] specifies the digits 0 through 9, while the character set [a-zA-Z] specifies the 52 unaccented letters a through z and A through Z; accented letters such as é are not included. A range can be negated just like any other character set; [^0-9] specifies any character except the digits 0 through 9.
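The three ranges mentioned above can be compared side by side. This is a minimal sketch in Python (its re module is assumed here purely for demonstration), and the sample text is invented.

```python
import re

digit = re.compile(r"[0-9]")
letter = re.compile(r"[a-zA-Z]")
non_digit = re.compile(r"[^0-9]")

# \u00e9 is the accented letter e-acute, which lies outside a-z and A-Z.
text = "Room 42B, caf\u00e9"

digits = digit.findall(text)
letters = letter.findall(text)
non_digits = "".join(non_digit.findall(text))

print(digits)
print(letters)
print(non_digits)
```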

Character sets can also be subtracted from each other. The character set [A-Z-[IOQ]] matches any unaccented uppercase letter except I, O, or Q.
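Not every regular expression engine accepts the subtraction syntax. Python’s re module, for one, does not, so the sketch below swaps in a different technique: a negative lookahead, (?![IOQ])[A-Z], which should accept the same letters as [A-Z-[IOQ]]. Treat it as an approximation of the idea rather than the XSLT syntax itself.

```python
import re
import string

# Stand-in for [A-Z-[IOQ]]: refuse I, O, or Q first, then accept any A-Z.
no_ioq = re.compile(r"(?![IOQ])[A-Z]")

# Collect every uppercase letter the pattern accepts.
allowed = "".join(c for c in string.ascii_uppercase if no_ioq.fullmatch(c))
print(allowed)

# Letters I, O, and Q are skipped when scanning a longer string.
picked = no_ioq.findall("HELLO IQ")
print(picked)
```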
