13.4. Choosing Greedy or Nongreedy Matches

Problem

You want your pattern to match the smallest possible string instead of the largest.

Solution

Place a ? after a quantifier to make that portion of the pattern nongreedy:

// find all bolded sections
preg_match_all('#<b>.+?</b>#', $html, $matches);

Or, use the U pattern modifier to invert all quantifiers in the pattern from greedy to nongreedy:

// find all bolded sections
preg_match_all('#<b>.+</b>#U', $html, $matches);
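Because U inverts greediness rather than simply switching it off, a quantifier followed by ? becomes greedy again under that modifier. The following is a minimal sketch of that behavior; the sample $html string and the variable names $nongreedy and $greedy are assumptions made for illustration:

```php
<?php
// Hypothetical sample input, not part of the original recipe.
$html = '<b>one</b> and <b>two</b>';

// Under U, a bare quantifier is nongreedy...
preg_match_all('#<b>.+</b>#U', $html, $nongreedy);

// ...and a quantifier followed by ? is greedy again.
preg_match_all('#<b>.+?</b>#U', $html, $greedy);

print_r($nongreedy[0]); // two elements: <b>one</b> and <b>two</b>
print_r($greedy[0]);    // one element spanning the whole string
```

Mixing U and ? in the same pattern is easy to misread, so it is usually clearer to pick one style per pattern.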

Discussion

By default, all quantifiers in PHP regular expressions are what’s known as greedy. This means a quantifier tries to match as many characters as possible while still allowing the pattern as a whole to match.
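To see why this matters for the bolded-sections pattern in the Solution, here is a small sketch that runs the greedy and nongreedy forms against the same input. The sample $html string is hypothetical and chosen only because it contains two bolded sections:

```php
<?php
// Hypothetical sample input with two bolded sections.
$html = 'I am <b>bold</b> and so am <b>I</b>.';

// Greedy: .+ runs to the end of the string, then backs up to the last </b>.
preg_match_all('#<b>.+</b>#', $html, $greedy);

// Nongreedy: .+? stops at the first </b> it can reach.
preg_match_all('#<b>.+?</b>#', $html, $nongreedy);

echo count($greedy[0]), "\n";    // 1
echo count($nongreedy[0]), "\n"; // 2
```

The single greedy match here is everything from the first <b> through the last </b>, which is rarely what you want when extracting individual tags.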

For example, take the pattern p.*, which matches a p and then zero or more characters, and match it against the string php. A greedy regular expression finds one match, because after it grabs the opening p, it continues on and also matches the hp. A nongreedy regular expression, on the other hand, finds a pair of matches. Because the nongreedy quantifier takes as few characters as possible, and nothing follows it in the pattern, it matches zero characters, so the first match is just the opening p. The search then resumes at the h, which cannot begin a match, and a second match takes the closing p.

The following code shows that the greedy match finds only one hit; the nongreedy ones find two:

print preg_match_all('/p.*/', "php") . "\n";  // greedy
print preg_match_all('/p.*?/', "php") . "\n"; // nongreedy
print preg_match_all('/p.*/U', "php") . "\n"; // nongreedy
1
2
2
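The counts alone do not show which substrings were matched. As a sketch, passing the PREG_OFFSET_CAPTURE flag to preg_match_all() returns each match together with its starting offset; the variable names $greedy and $nongreedy are illustrative:

```php
<?php
// Capture each match along with its starting offset in the subject string.
preg_match_all('/p.*/', 'php', $greedy, PREG_OFFSET_CAPTURE);
preg_match_all('/p.*?/', 'php', $nongreedy, PREG_OFFSET_CAPTURE);

foreach ($greedy[0] as $match) {
    list($text, $offset) = $match;
    printf("greedy: %s at offset %d\n", $text, $offset);
}

foreach ($nongreedy[0] as $match) {
    list($text, $offset) = $match;
    printf("nongreedy: %s at offset %d\n", $text, $offset);
}
```

The greedy pattern reports php at offset 0. The nongreedy pattern reports p at offset 0 and p at offset 2, and the h between them belongs to neither match.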

Greedy matching is also known as maximal matching and nongreedy matching can be called minimal matching, because these options ...
