Fancy Patterns

Lookaround Assertions

Sometimes you just need to sneak a peek. There are four regex extensions that help you do just that, and we call them lookaround assertions because they let you scout around in a hypothetical sort of way, without committing to matching any characters. What these assertions assert is that some pattern would (or would not) match if we were to try it. The Engine works it all out for us by actually trying to match the hypothetical pattern, and then pretending that it didn't match (if it did).

When the Engine peeks ahead from its current position in the string, we call it a lookahead assertion. If it peeks backward, we call it a lookbehind assertion. The lookahead patterns can be any regular expression, but the lookbehind patterns may only be fixed width, since they have to know where to start the hypothetical match from.
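As a rough illustration of the four assertions, the sketch below records where "foo" matches under each one. The sample string and variable names are made up for this example. The last few lines assume that a lookbehind with an unbounded quantifier is rejected when the pattern is compiled; the exact error text varies by Perl version, so the code checks only that compilation fails.

```perl
use strict;
use warnings;

# Hypothetical sample string: "foo" starts at offsets 0, 7, and 17.
my $s = "foobar foobaz barfoo";

my (@ahead, @not_ahead, @behind, @not_behind);

# $-[0] is the offset at which the most recent successful match began.
push @ahead,      $-[0] while $s =~ /foo(?=bar)/g;    # "foo" followed by "bar"
push @not_ahead,  $-[0] while $s =~ /foo(?!bar)/g;    # "foo" not followed by "bar"
push @behind,     $-[0] while $s =~ /(?<=bar)foo/g;   # "foo" preceded by "bar"
push @not_behind, $-[0] while $s =~ /(?<!bar)foo/g;   # "foo" not preceded by "bar"

print "positive lookahead:  @ahead\n";        # 0
print "negative lookahead:  @not_ahead\n";    # 7 17
print "positive lookbehind: @behind\n";       # 17
print "negative lookbehind: @not_behind\n";   # 0 7

# Assumption: an unbounded quantifier inside a lookbehind cannot be compiled.
# The pattern is held in a string so the failure happens at run time,
# where eval can trap it.
my $unbounded = '(?<=ba+r)foo';
my $compiled  = eval { qr/$unbounded/ };
print "unbounded lookbehind rejected\n" unless defined $compiled;
```

In every case the match itself is just the three characters of "foo"; the assertions only decide whether a match is allowed at that position.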

While these four extensions are all zero-width assertions, and hence do not consume characters (at least, not officially), you can in fact capture substrings within them if you supply extra levels of capturing parentheses.
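A minimal sketch of capturing inside an assertion, using a made-up input string: the inner parentheses within the lookahead set $2, yet the match position stops before the text they captured.

```perl
use strict;
use warnings;

# Hypothetical input: a number followed by a unit.
my $s = "width=42px";

my ($number, $unit, $end);
if ($s =~ /(\d+)(?=(px|em))/g) {
    $number = $1;       # "42", matched and consumed
    $unit   = $2;       # "px", captured inside the lookahead
    $end    = pos($s);  # 8: the match ends right after "42", before "px"
}

print "number=$number unit=$unit end=$end\n";
```

Because the unit is captured but not consumed, a following match on the same string would still see "px" at the current position.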

(?=PATTERN) (positive lookahead)

When the Engine encounters (?=PATTERN), it looks ahead in the string to ensure that PATTERN occurs. If you'll recall, in our earlier duplicate word remover, we had to write a loop because the pattern ate too much each time through:

$_ = "Paris in THE THE THE THE spring.";

# remove duplicate words (and triplicate (and quadruplicate…))
1 while s/\b(\w+) \1\b/$1/gi;
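To see why the loop is needed, the sketch below runs the same substitution once and shows what is left over. It then tries one possible lookahead-based variant that leaves the second copy of the word unconsumed, so a single pass is enough. The variable names and the lookahead pattern are this example's own, offered as an illustration under those assumptions.

```perl
use strict;
use warnings;

my $original = "Paris in THE THE THE THE spring.";

# One pass of the looping pattern: each match consumes both copies of the
# word, so the copy it leaves behind cannot pair up with the next one.
(my $one_pass = $original) =~ s/\b(\w+) \1\b/$1/gi;
print "$one_pass\n";     # Paris in THE THE spring.

# Repeating the substitution until nothing changes finishes the job.
my $looped = $original;
1 while $looped =~ s/\b(\w+) \1\b/$1/gi;
print "$looped\n";       # Paris in THE spring.

# Sketch of a lookahead variant: delete a word and the space after it
# whenever the same word follows. The following word is only peeked at,
# so it is still available to start the next match.
(my $lookahead = $original) =~ s/\b(\w+) (?=\1\b)//gi;
print "$lookahead\n";    # Paris in THE spring.
```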

Whenever you hear the ...

Get Programming Perl, 3rd Edition now with the O’Reilly learning platform.