13.8. Escaping Special Characters in a Regular Expression

Problem

You want to have characters such as * or + treated as literals, not as metacharacters, inside a regular expression. This is useful when allowing users to type in search strings you want to use inside a regular expression.

Solution

Use preg_quote( ) to escape Perl-compatible regular-expression metacharacters:

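// $book_rank is assumed to hold text such as "The Education of H*Y*M*A*N K*A*P*L*A*N:42"
// Passing '/' as the second argument also escapes the pattern delimiter.
$pattern = preg_quote('The Education of H*Y*M*A*N K*A*P*L*A*N', '/').':(\d+)';
if (preg_match("/$pattern/",$book_rank,$matches)) {
    print "Leo Rosten's book ranked: ".$matches[1];
}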

Use quotemeta( ) to escape POSIX metacharacters:

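// ereg( ) was removed in PHP 7.0, so this example runs only on older versions.
// The parentheses capture the rank so that $matches[1] is populated.
$pattern = quotemeta('M*A*S*H').':([0-9]+)';
if (ereg($pattern,$tv_show_rank,$matches)) {
    print 'Radar, Hot Lips, and the gang ranked: '.$matches[1];
}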

Discussion

Here are the characters that preg_quote( ) escapes:

. \ + * ? ^ $ [ ] ( ) { } < > = ! | : - #

The hyphen is escaped as of PHP 5.3.0 and # as of PHP 7.3.0.

Here are the characters that quotemeta( ) escapes:

. \ + * ? ^ $ [ ] ( )

Both functions escape each metacharacter by putting a backslash in front of it.

The quotemeta( ) function doesn’t escape all POSIX metacharacters. The characters {, }, and | are also valid metacharacters, but quotemeta( ) leaves them unescaped. This is another good reason to use preg_match( ) instead of ereg( ).
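
The difference between the two functions is easy to see by running both on the same string. The following is a minimal sketch; the input string and variable names are illustrative and are not taken from the recipe:

```php
<?php
// Illustrative input that mixes several metacharacters.
$input = 'a|b{2}+c';

// preg_quote( ) escapes |, {, }, and +.
$pcre_safe = preg_quote($input);

// quotemeta( ) escapes only the + and leaves |, {, and } alone.
$posix_safe = quotemeta($input);

echo $pcre_safe, "\n";   // a\|b\{2\}\+c
echo $posix_safe, "\n";  // a|b{2}\+c
```

A string passed through quotemeta( ) can therefore still contain active alternation and repetition operators.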

You can also pass preg_quote( ) an additional character to escape as a second argument. It’s useful to pass your pattern delimiter (usually /) as this argument so it also gets escaped. This is important if you incorporate user input into a regular-expression pattern. The following code expects $_REQUEST['search_term'] ...
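
As a rough illustration of the delimiter argument, the sketch below uses a hard-coded, hypothetical search term in place of real user input. The variable names and the sample text are assumptions made for this example only:

```php
<?php
// Hypothetical search term standing in for user input.
// It contains both the delimiter (/) and a metacharacter (*).
$search_term = 'AC/DC*';

// The second argument tells preg_quote( ) to escape the delimiter too.
$pattern = '/' . preg_quote($search_term, '/') . '/i';

echo $pattern, "\n";  // /AC\/DC\*/i

// The escaped term now matches only as literal text (case-insensitively).
$found = preg_match($pattern, 'I saw ac/dc* live');
var_dump($found);     // int(1)
```

Without the second argument, the slash in the search term would end the pattern early, and PHP would try to read the rest of the term as pattern modifiers.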
