Simple Expressions
The simplest regular expression is just a string of text. The
regular expression abc matches the string abc. We’re merely
searching for a series of characters inside some data. For this
case, contains('abc', 'abc') does the same thing.
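As a rough illustration of that equivalence, here is a minimal sketch in Python rather than XSLT. It assumes Python's re.search as a stand-in for a regular expression match and the in operator as a stand-in for contains(); the variable names are made up for the example.

```python
import re

data = "xyzabcxyz"  # hypothetical sample data

# A literal pattern: search for the characters a, b, c in sequence.
regex_found = re.search("abc", data) is not None

# A plain substring test, standing in for contains().
substring_found = "abc" in data

print(regex_found, substring_found)
```

For a purely literal pattern the two tests agree, which is why a regular expression is more than is needed here.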
Regular expressions exist to do far more than search for a literal
string; they help us find data that matches a pattern. We can use
character sets to specify groups of characters. The regular
expression [abc]d matches two characters in which the first
character is a, b, or c, followed by the character d.
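The character set can be tried out with a short sketch. This one again uses Python's re module as a stand-in, with re.fullmatch so that each candidate must consist of exactly the two characters described; the candidate list is invented for the example.

```python
import re

# Hypothetical two-character candidates to test against [abc]d.
candidates = ["ad", "bd", "cd", "dd", "xd"]

# Keep only the strings made of a, b, or c followed by d.
set_matches = [s for s in candidates if re.fullmatch(r"[abc]d", s)]

print(set_matches)  # ['ad', 'bd', 'cd']
```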
We can also negate a character set, asking for all the characters
not in the character set. To do this, add a caret (^) to the start
of the character set. The regular expression [^abc]d matches two
characters in which the first character is anything except a, b,
or c, followed by d.
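Negation can be checked the same way. In this sketch (Python's re module as a stand-in, with an invented candidate list), the strings that pass are exactly the ones that a non-negated [abc]d would reject.

```python
import re

# Hypothetical two-character candidates to test against [^abc]d.
candidates = ["ad", "bd", "cd", "xd", "9d", "dd"]

# Keep only the strings whose first character is NOT a, b, or c.
negated_matches = [s for s in candidates if re.fullmatch(r"[^abc]d", s)]

print(negated_matches)  # ['xd', '9d', 'dd']
```

Note that dd matches: the negated set accepts any character outside a, b, and c, including d itself.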
Ranges give us a shorthand way of defining character sets. The
character set [0-9] specifies the digits 0 through 9, while the
character set [a-zA-Z] specifies all of the unaccented letters used
in Western European languages. A range can be negated just like any
other character set; [^0-9] specifies anything except the digits.
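The three ranges mentioned above can be exercised on one sample string. This is a small sketch using Python's re module as a stand-in; the sample text and variable names are assumptions made for the example.

```python
import re

text = "Room 42B"  # hypothetical sample data

# [0-9] picks out each digit.
digits = re.findall(r"[0-9]", text)

# [a-zA-Z] picks out each unaccented letter, upper or lower case.
letters = re.findall(r"[a-zA-Z]", text)

# [^0-9] matches everything except digits; removing those leaves the digits.
digits_only = re.sub(r"[^0-9]", "", text)

print(digits, letters, digits_only)
```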
Ranges can also be subtracted from each other. The character set
[A-Z-[IOQ]] matches any unaccented uppercase letter except I, O,
or Q.
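The subtraction syntax is not available in every regular expression engine. Python's standard re module, for one, does not accept [A-Z-[IOQ]] with this meaning, so the sketch below approximates the same set with a negative lookahead. This is a swapped-in technique, not the subtraction syntax itself, and the sample string is invented.

```python
import re

# Approximation of [A-Z-[IOQ]]: an uppercase letter that is not I, O, or Q.
# (?![IOQ]) rejects the position if the next character is I, O, or Q.
pattern = re.compile(r"(?![IOQ])[A-Z]")

sample = "HIJOPQR"  # hypothetical sample data
allowed = pattern.findall(sample)

print(allowed)  # ['H', 'J', 'P', 'R']
```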