B. Summary of Regular Expressions

One of the most powerful features of Perl is its regular expression handling. Regular expressions are especially useful for CGI programming, as text manipulation is central to so many CGI applications. In this appendix, we include a quick reference to regular expressions in Perl. For more information on Perl, see the Nutshell Handbooks Learning Perl by Randal L. Schwartz, Programming Perl by Larry Wall and Randal L. Schwartz, and Perl 5 Desktop Reference by Johan Vromans, all published by O'Reilly & Associates, Inc.

/abc/

Matches abc anywhere within the string

/^abc/

Matches abc at the beginning of the string

/abc$/

Matches abc at the end of the string
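The three forms above differ only in where abc is allowed to appear. A minimal sketch of that difference follows; the sample strings and variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only to exercise the three patterns.
my @samples = ('xxabcxx', 'abcxx', 'xxabc');

my @report;
for my $s (@samples) {
    my $anywhere = ($s =~ /abc/)  ? 'yes' : 'no';   # abc anywhere
    my $at_start = ($s =~ /^abc/) ? 'yes' : 'no';   # abc at the beginning
    my $at_end   = ($s =~ /abc$/) ? 'yes' : 'no';   # abc at the end
    push @report, "$s anywhere=$anywhere start=$at_start end=$at_end";
}

print "$_\n" for @report;
```

Every sample contains abc, so only the anchored patterns tell the three strings apart.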

/a|b/

Matches either a or b. Can also be used with words (e.g., /perl|tcl/).

/ab{2,4}c/

Matches an a followed by two to four b's, followed by c. If the second number is omitted, as in /ab{2,}c/, the expression matches two or more b's.

/ab*c/

Matches an a followed by zero or more b's, followed by c. Quantifiers are greedy: the expression matches as many b's as possible. Same as /ab{0,}c/.

/ab+c/

Matches an a followed by one or more b's followed by c. Same as /ab{1,}c/.
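The counting forms {2,4}, * and + can be compared side by side. This is a rough sketch: the helper full_matches and the test strings are hypothetical, and the \A and \z anchors are added here only so that each pattern must account for the whole string.

```perl
use strict;
use warnings;

# Hypothetical test strings with zero, one, two, and five b's.
my @strings = ('ac', 'abc', 'abbc', 'abbbbbc');

# Return, as a comma-separated list, the strings matched in full.
sub full_matches {
    my ($re) = @_;
    return join ',', grep { /\A$re\z/ } @strings;
}

my $range = full_matches(qr/ab{2,4}c/);   # two to four b's
my $star  = full_matches(qr/ab*c/);       # zero or more b's
my $plus  = full_matches(qr/ab+c/);       # one or more b's

print "{2,4}: $range\n";
print "*:     $star\n";
print "+:     $plus\n";
```

Only abbc satisfies {2,4}; the * form accepts all four strings, and the + form rejects only ac.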

/ab?c/

Matches an a followed by an optional b, followed by c. Same as /ab{0,1}c/. In Perl 5, a ? placed directly after another quantifier has a different meaning: it makes that quantifier non-greedy. For example, /ab*?c/ matches an a followed by as few b's as possible, followed by c.
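The two roles of ? are easiest to see with a capture. In the sketch below, the HTML-like fragment and the variable names are assumptions for illustration; the non-greedy form requires Perl 5.

```perl
use strict;
use warnings;

# ? as "optional": accepts zero or one b.
my @optional = grep { /\Aab?c\z/ } ('ac', 'abc', 'abbc');

# ? after a quantifier: non-greedy matching (Perl 5).
# Hypothetical fragment with two bold elements.
my $html = '<b>one</b> and <b>two</b>';

my ($greedy) = $html =~ /<b>(.*)<\/b>/;    # runs to the LAST </b>
my ($lazy)   = $html =~ /<b>(.*?)<\/b>/;   # stops at the FIRST </b>

print "optional: @optional\n";
print "greedy:   $greedy\n";
print "lazy:     $lazy\n";
```

The greedy capture holds one</b> and <b>two, while the non-greedy capture holds only one.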

/./

Matches any single character except a newline (\n). /p..l/ matches a p followed by any two characters, followed by l, so it matches such strings as perl, pall, pdgl, p3gl, etc.

/[abc]/

A character class--matches any one of the three characters listed. A pattern of /[abc]+/ matches strings such as abcab, acbc, abbac, aaa, abcacbac, ccc, etc.

/\d/

Matches a digit. Same as /[0-9]/. Multipliers can be used (/\d+/ matches one or more digits).

/\w/

Matches a word character (a letter, a digit, or an underscore). Same as /[a-zA-Z0-9_]/

/\s/

Matches a character classified as whitespace. Same as /[ \r\t\n\f]/
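The shorthand classes \d, \w, and \s are typically combined with + to pull tokens out of a line. The following is a small sketch; the input line is a made-up example, not taken from any real configuration format.

```perl
use strict;
use warnings;

# Hypothetical line mixing spaces and a tab.
my $line = "port 8080\tmax_conn 25";

my @numbers = $line =~ /(\d+)/g;    # every run of digits
my @words   = $line =~ /(\w+)/g;    # every run of word characters
my @fields  = split /\s+/, $line;   # split on runs of whitespace

print "numbers: @numbers\n";
print "words:   @words\n";
print "fields:  @fields\n";
```

Because the underscore counts as a word character, max_conn is kept as a single word.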

/\b/

Matches a word boundary. /test\b/ matches test, but not testing. Inside a character class (i.e., [\b]), however, \b matches a backspace character.
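A short sketch of both meanings of \b follows. The candidate strings are assumptions chosen to show that /test\b/ constrains only the end of the word.

```perl
use strict;
use warnings;

# Hypothetical candidate strings.
my @candidates = ('test', 'testing', 'a test case', 'contest');

# Boundary required only after "test".
my @end_boundary = grep { /test\b/ } @candidates;

# Boundary required on both sides: "test" as a whole word.
my @whole_word = grep { /\btest\b/ } @candidates;

# Inside a character class, \b is a backspace character instead.
my $has_backspace = ("a\bc" =~ /[\b]/) ? 1 : 0;
my $no_backspace  = ("abc"  =~ /[\b]/) ? 1 : 0;

print join(', ', @end_boundary), "\n";
print join(', ', @whole_word), "\n";
print "$has_backspace $no_backspace\n";
```

Note that contest passes the first filter, since nothing requires a boundary before test; adding a leading \b excludes it.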

/[^abc]/

Matches a character that is not in the class. /[^abc]+/ will match such strings as hello, test, perl, etc.

/\D/

Matches a character that is not a digit. Same as /[^0-9]/

/\W/

Matches a character that is not a word character. Same as /[^a-zA-Z0-9_]/

/\S/

Matches a character that is not whitespace. Same as /[^ \r\t\n\f]/
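The negated forms are handy for stripping or skipping unwanted characters. The sketch below uses made-up input strings and variable names purely for illustration.

```perl
use strict;
use warnings;

# \D: delete everything that is not a digit.
(my $digits = '(555) 123-4567') =~ s/\D//g;

# \W: collapse each run of non-word characters into an underscore.
(my $slug = 'Hello, World!') =~ s/\W+/_/g;

# \S: collect each run of non-whitespace characters.
my @tokens = "  alpha  beta\tgamma " =~ /(\S+)/g;

# [^abc]: first run of characters other than a, b, or c.
my ($run) = 'zebra crossing' =~ /([^abc]+)/;

print "$digits\n$slug\n@tokens\n$run\n";
```

The last capture is ze, because the run stops at the first b.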

/\B/

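Requires that there is no word boundary at that position. /hello\B/ matches the hello in hellothere, but not hello on its own or hello there, because in both of those a word boundary follows the o.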

/\*/

Matches the * character. Use the \ character to escape characters that have significance in a regular expression.
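The effect of the backslash can be checked directly. This sketch also uses Perl's built-in quotemeta function and the \Q...\E escape, which go beyond what this appendix lists; they are shown as one possible way to escape a whole string at once. The sample strings are hypothetical.

```perl
use strict;
use warnings;

# Unescaped, * is a quantifier: "zero or more 2's, then 3".
my $unescaped = ('2223' =~ /2*3/)  ? 1 : 0;

# Escaped, \* is a literal asterisk.
my $escaped_miss = ('2223'  =~ /2\*3/) ? 1 : 0;
my $escaped_hit  = ('2*3=6' =~ /2\*3/) ? 1 : 0;

# Escaping a whole string of arbitrary text (hypothetical input).
my $user_text = 'a.b*c';
my $quoted    = quotemeta $user_text;
my $hit       = ('xa.b*cx' =~ /\Q$user_text\E/) ? 1 : 0;
my $miss      = ('aXbbc'   =~ /\Q$user_text\E/) ? 1 : 0;

print "$unescaped $escaped_miss $escaped_hit\n";
print "$quoted $hit $miss\n";
```

Without the escaping, a.b*c would also match aXbbc, since . and * would be treated as metacharacters.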

/(abc)/

Matches abc anywhere within the string, but the parentheses act as memory, storing abc in the variable $1.

Example 1:

/name=(.*)/ will store zero or more characters after name= in variable $1.

Example 2:

/name=(.*)&user=\1/ will store zero or more characters after name= in $1. The backreference \1 then stands for whatever $1 captured, so the pattern matches only if the same text appears again after &user=.

Example 3:

/name=([^&]*)/ will store zero or more characters after name= but before the & character in variable $1.

Example 4:

/name=([^&]+)&age=(.*)$/ will store one or more characters after name= but before & in $1. It then matches the & character. All characters after age= but before the end of the line are stored in $2.
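The four examples above can be tried against a query string of the kind a CGI program receives. In this sketch, the query strings and variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical query string.
my $query = 'name=Ann&age=30';

# Example 1: greedy .* runs to the end of the string.
my ($everything) = $query =~ /name=(.*)/;

# Example 3: [^&]* stops before the & character.
my ($bounded) = $query =~ /name=([^&]*)/;

# Example 4: two sets of parentheses fill $1 and $2.
my ($name, $age);
if ($query =~ /name=([^&]+)&age=(.*)$/) {
    ($name, $age) = ($1, $2);
}

# Example 2: \1 must repeat whatever the first group captured.
my $same = ('name=bob&user=bob' =~ /name=(.*)&user=\1/) ? 1 : 0;
my $diff = ('name=bob&user=amy' =~ /name=(.*)&user=\1/) ? 1 : 0;

print "$everything | $bounded | $name | $age | $same $diff\n";
```

Copying $1 and $2 into named variables right after a successful match is a common precaution, since a later successful match overwrites them.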

/abc/i

Ignores case. Matches abc, Abc, ABC, aBc, aBC, etc.
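A brief sketch of the /i modifier, comparing the same pattern with and without it; the input list is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical inputs in assorted capitalizations.
my @inputs = ('abc', 'Abc', 'ABC', 'xaBCx', 'abd');

my @sensitive   = grep { /abc/ }  @inputs;   # exact case only
my @insensitive = grep { /abc/i } @inputs;   # any mix of case

print "sensitive:   @sensitive\n";
print "insensitive: @insensitive\n";
```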
