Chapter 8. More About Regular Expressions

In the previous chapter, we saw the beginnings of what regular expressions can do. Here we’ll see some of their other common features.

Character Classes

A character class, a list of possible characters inside square brackets ([]), matches any single character from within the class. It matches exactly one character, but that character may be any of the ones listed.

For example, the character class [abcwxyz] may match any one of those seven characters. For convenience, you may specify a range of characters with a hyphen (-), so that class may also be written as [a-cw-z]. That didn’t save much typing, but it’s more usual to make a character class like [a-zA-Z], to match any one letter out of that set of 52.[1] You may use the same character shortcuts as in any double-quotish string to define a character, so the class [\000-\177] matches any seven-bit ASCII character.[2]
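As a quick check of the claims above, the following sketch tests every character code against each class and collects the ones that match. The variable names are illustrative only, and the approach (brute-force testing with grep) is just one convenient way to see what a class covers.

```perl
use strict;
use warnings;

# Test each seven-bit character against the explicit and the ranged class.
my @explicit = grep { /[abcwxyz]/ } map { chr($_) } 0 .. 127;
my @ranged   = grep { /[a-cw-z]/ }  map { chr($_) } 0 .. 127;

print "explicit: @explicit\n";    # prints "explicit: a b c w x y z"
print "ranged:   @ranged\n";      # the same seven characters

# [a-zA-Z] covers 26 lowercase plus 26 uppercase letters.
my @letters = grep { /[a-zA-Z]/ } map { chr($_) } 0 .. 127;
print scalar(@letters), " letters\n";    # prints "52 letters"

# \000 and \177 are octal escapes for character codes 0 and 127,
# so only the first 128 of these 256 characters match.
my @ascii = grep { /[\000-\177]/ } map { chr($_) } 0 .. 255;
print scalar(@ascii), " seven-bit characters\n";    # prints "128 seven-bit characters"
```

Both forms of the first class select the same seven characters, which is why the range notation is only a convenience.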

Of course, a character class will be just part of a full pattern; it will never stand on its own in Perl. For example, you might see code that says something like this:

$_ = "The HAL-9000 requires authorization to continue.";
if (/HAL-[0-9]+/) {
  print "The string mentions some model of HAL computer.\n";
}
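To see how the character class behaves as one piece of a larger pattern, here is a small sketch that runs the same pattern against several strings. The sample strings are hypothetical and were made up only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = (
    "The HAL-9000 requires authorization to continue.",
    "HAL-x is not a model number.",
    "Ask HAL- about it.",
    "HAL-7 is an early prototype.",
);

# Keep only the strings where "HAL-" is followed by at least one digit.
my @mentions = grep { /HAL-[0-9]+/ } @samples;
print "$_\n" for @mentions;
```

Only the first and last strings are printed, because the class [0-9] must match at least one digit right after the hyphen.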

Sometimes it’s easier to specify the characters left out rather than the ones within the character class. A caret (^) at the start of the character class negates it. That is, [^def] will match any single character except one of those three. And [^n\-z] matches ...
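A minimal sketch of the negated class [^def] in action follows. The test string "defined" and the variable names are assumptions picked for illustration.

```perl
use strict;
use warnings;

my $text = "defined";

# In list context with /g, the match returns every character
# that is not d, e, or f.
my @others = $text =~ /[^def]/g;
print "@others\n";    # prints "i n"

# A negated class must still match one character,
# so it cannot match against an empty string.
my $matched_empty = ("" =~ /[^def]/) ? 1 : 0;
print "empty string matched: $matched_empty\n";    # prints "empty string matched: 0"
```

Note that a negated class is not the same as "no character here": it still consumes exactly one character, which simply has to be outside the listed set.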
