Character Classes

In a pattern match, you may match any character that has--or that does not have--a particular property. There are four ways to specify character classes. You may specify a character classes in the traditional way using square brackets and enumerating the possible characters, or you may use any of three mnemonic shortcuts: the classic Perl classes, the new Perl Unicode properties, or the standard POSIX classes. Each of these shortcuts matches only one character from its set. Quantify them to match larger expanses, such as \d+ to match one or more digits. (An easy mistake is to think that \w matches a word. Use \w+ to match a word.)

Custom Character Classes

An enumerated list of characters in square brackets is called a character class and matches any one of the characters in the list. For example, [aeiouy] matches a letter that can be a vowel in English. (For Welsh add a "w", for Scottish an "r".) To match a right square bracket, either backslash it or place it first in the list.

Character ranges may be indicated using a hyphen and the a-z notation. Multiple ranges may be combined; for example, [0-9a-fA-F] matches one hex "digit". You may use a backslash to protect a hyphen that would otherwise be interpreted as a range delimiter, or just put it at the beginning or end of the class (a practice which is arguably less readable but more traditional).

A caret (or circumflex, or hat, or up arrow) at the front of the character class inverts the class, causing ...

Get Programming Perl, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.