Character Classes

The [...] construct lists a set of characters (a character class), any one of which will match at that position. Brackets are often used when the capitalization of a match is uncertain:

/[tT]here/

A dash (-) may be used to indicate a range of characters in a character class:

/[a-zA-Z]/;  # Match any single letter
/[0-9]/;     # Match any single digit
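The patterns above can be tried against a few strings. The following is a minimal sketch; the sample words and variable names are made up for illustration and are not from the original text.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for illustration.
my @words = ('There', 'there', 'where', 'room 42');

# [tT] accepts either capitalization of the first letter.
my @there = grep { /[tT]here/ } @words;

# [0-9] matches a single digit anywhere in the string.
my @with_digit = grep { /[0-9]/ } @words;

print "there-like: @there\n";
print "has digit:  @with_digit\n";
```

Note that 'where' is rejected by the first pattern: the class stands for exactly one character, and that character must be t or T.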

To put a literal dash in the list, either put a backslash before it (\-) or place it first or last in the list, where it cannot be read as a range.
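As a rough illustration of a literal dash inside a class, the sketch below checks a phone-style string. The input string and variable names are assumptions for the example; the ^, $, and + used here are ordinary Perl anchors and a quantifier, not part of the character class syntax. A dash placed last in the class is also taken literally in Perl, which the second test shows.

```perl
use strict;
use warnings;

# Hypothetical input: digits separated by a dash.
my $text = '555-1234';

# Escaped dash: the class holds the digits plus a literal '-'.
my $escaped = ($text =~ /^[0-9\-]+$/) ? 1 : 0;

# Dash placed last in the class: also a literal '-'.
my $trailing = ($text =~ /^[0-9-]+$/) ? 1 : 0;

# With no dash in the class, the whole string can no longer match.
my $digits_only = ($text =~ /^[0-9]+$/) ? 1 : 0;

print "escaped: $escaped, trailing: $trailing, digits only: $digits_only\n";
```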

By placing a ^ as the first element in the brackets, you create a negated character class, i.e., it matches any character not in the list. For example:

/[^A-Z]/;    # Matches any character other than an uppercase letter
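A short sketch of how the negated class behaves, using made-up sample strings. It also shows that a caret anywhere other than the first position is an ordinary character.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my $mixed = 'PERL rocks';
my $upper = 'PERL';

# Capture the first character that is NOT an uppercase letter.
# Here that is the space: a negated class also matches non-letters.
my ($first) = $mixed =~ /([^A-Z])/;

# A string made only of uppercase letters gives [^A-Z] nothing to match.
my $upper_hit = ($upper =~ /[^A-Z]/) ? 1 : 0;

# A caret that is not the first element is a literal '^'.
my $caret_hit = ('^' =~ /[A-Z^]/) ? 1 : 0;

print "first non-uppercase: '$first'\n";
print "all-uppercase string matched: $upper_hit\n";
print "literal caret matched: $caret_hit\n";
```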

Some common character classes have their own predefined escape sequences for convenience:

Code    Matches
\d      A digit, same as [0-9]
\D      A nondigit, same as [^0-9]
\w      A word character (alphanumeric or underscore), same as [a-zA-Z_0-9]
\W      A non-word character, same as [^a-zA-Z_0-9]
\s      A whitespace character, same as [ \t\n\r\f]
\S      A non-whitespace character, same as [^ \t\n\r\f]
\C      Match a character (byte)
\pP     Match P-named (Unicode) property
\PP     Match non-P
\X      Match extended Unicode sequence
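The shorthand classes can be exercised as in the sketch below. The sample line and variable names are hypothetical; only \d, \s, \w, and \W are shown, on plain ASCII input.

```perl
use strict;
use warnings;

# Hypothetical sample line: an identifier, a tab, and a word.
my $line = "id_7\tok";

my $has_digit = ($line =~ /\d/) ? 1 : 0;   # the '7'
my $has_space = ($line =~ /\s/) ? 1 : 0;   # the tab counts as whitespace

# In list context with /g, a match returns every matched character.
my @word_chars = $line =~ /\w/g;           # i d _ 7 o k
my @non_word   = $line =~ /\W/g;           # only the tab

print "digit: $has_digit, whitespace: $has_space\n";
print "word characters: ", scalar(@word_chars), "\n";
print "non-word characters: ", scalar(@non_word), "\n";
```

The underscore is counted among the word characters, which is why \w is wider than "letters and digits".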

While Perl provides the lc() and uc() functions for changing the case of words or characters, you can do the same with escape sequences inside double-quoted strings and patterns:

Code    Effect
\l      Lowercase the next character
\u      Uppercase the next character
\L      Lowercase until \E
\U      Uppercase until \E
\Q      Disable pattern metacharacters until \E
\E      End case modification (or \Q quoting)
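A minimal sketch of these escapes in double-quoted strings and in a pattern. The variable names and sample values are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sample value.
my $name = 'perl';

my $first_up  = "\u$name";       # 'Perl'  : only the next character changes
my $all_up    = "\U$name\E!";    # 'PERL!' : \E stops the uppercasing
my $first_low = "\lPERL";        # 'pERL'
my $all_low   = "\LPERL\E";      # 'perl'

# \Q ... \E makes the interpolated text literal, so '.' matches only a dot.
my $literal     = 'a.c';
my $quoted_hit  = ('a.c' =~ /^\Q$literal\E$/) ? 1 : 0;
my $quoted_miss = ('abc' =~ /^\Q$literal\E$/) ? 1 : 0;

# Without \Q, the '.' is a metacharacter and 'abc' matches too.
my $unquoted = ('abc' =~ /^$literal$/) ? 1 : 0;

print "$first_up $all_up $first_low $all_low\n";
print "quoted: $quoted_hit/$quoted_miss, unquoted: $unquoted\n";
```

\Q is mostly useful when a pattern is built from text you do not control and any metacharacters in it should be matched literally.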

These elements match ...
