13.4. Choosing Greedy or Nongreedy Matches
Problem
You want your pattern to match the smallest possible string instead of the largest.
Solution
Place a ? after a quantifier to alter that portion of the pattern:
// find all bolded sections
preg_match_all('#<b>.+?</b>#', $html, $matches);
Or, add the U pattern modifier after the closing delimiter to invert all quantifiers from greedy to nongreedy:
// find all bolded sections
preg_match_all('#<b>.+</b>#U', $html, $matches);
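To see why the nongreedy form matters here, the following sketch runs all three variants against a small sample string. The $html value and the result variable names are made up for illustration; they are not part of the recipe.

```php
<?php
// Hypothetical sample input containing two bolded sections.
$html = 'I am <b>bold</b> and so am <b>I</b>.';

// Greedy: .+ runs on to the last </b>, so both sections merge into one match.
preg_match_all('#<b>.+</b>#', $html, $greedy);

// Nongreedy: .+? stops at the first </b> it can reach.
preg_match_all('#<b>.+?</b>#', $html, $nongreedy);

// U modifier: makes the plain .+ behave like .+? above.
preg_match_all('#<b>.+</b>#U', $html, $ungreedy);

print_r($greedy[0]);    // one element: '<b>bold</b> and so am <b>I</b>'
print_r($nongreedy[0]); // two elements: '<b>bold</b>' and '<b>I</b>'
print_r($ungreedy[0]);  // same two elements as the nongreedy result
```

The greedy pattern returns a single match spanning both tags, which is rarely what you want when pulling out individual elements.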
Discussion
By default, quantifiers in PHP regular expressions are what’s known as greedy. This means a quantifier tries to match as many characters as possible while still allowing the overall pattern to succeed.
For example, take the pattern p.*, which matches a p and then 0 or more characters, and match it against the string php. A greedy regular expression finds one match, because after it grabs the opening p, it continues on and also matches the hp. A nongreedy regular expression, on the other hand, finds a pair of matches. It matches the opening p, but because a nongreedy .* is satisfied with zero characters, it stops right there and leaves the h and the final p unconsumed. The search then resumes, skips the h (which cannot start a match), and a second match takes the closing letter.
The following code shows that the greedy match finds only one hit; the nongreedy ones find two:
print preg_match_all('/p.*/', "php") . "\n"; // greedy
print preg_match_all('/p.*?/', "php") . "\n"; // nongreedy
print preg_match_all('/p.*/U', "php") . "\n"; // nongreedy

1
2
2
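Printing the matched text rather than the count shows what each pattern actually captured. This is a minimal sketch: the result variable names are illustrative, and the last call is an extra case, added here on the assumption that it helps to see that the U modifier inverts greediness, so under U a trailing ? makes a quantifier greedy again.

```php
<?php
// Greedy: one match that consumes the whole string.
preg_match_all('/p.*/', 'php', $greedy);

// Nongreedy: .*? settles for zero characters, so each match is a lone 'p'.
preg_match_all('/p.*?/', 'php', $nongreedy);

// U modifier: the plain .* is now nongreedy.
preg_match_all('/p.*/U', 'php', $ungreedy);

// U modifier plus ?: the inversion makes .*? greedy again.
preg_match_all('/p.*?/U', 'php', $inverted);

print_r($greedy[0]);    // one element: 'php'
print_r($nongreedy[0]); // two elements: 'p' and 'p'
print_r($ungreedy[0]);  // two elements: 'p' and 'p'
print_r($inverted[0]);  // one element: 'php'
```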
Greedy matching is also known as maximal matching and nongreedy matching can be called minimal matching, because these options ...