5.5. Find Any Word Not Followed by a Specific Word
Problem
You want to match any word that is not immediately followed by the word
cat, ignoring any whitespace, punctuation, or other nonword characters that
appear in between.
Solution
Negative lookahead is the secret ingredient for this regular expression:
\b\w+\b(?!\W+cat\b)
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Recipes 3.7 and 3.14 show examples of how you might want to implement this regular expression in code.
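As a rough illustration of the solution, the following Python sketch applies the regex with the case insensitive option turned on. The function name and the sample sentence are hypothetical and chosen only for this example; they do not come from the recipes referenced above.

```python
import re

# The regex from the Solution; re.IGNORECASE mirrors the
# "Case insensitive" regex option.
NOT_BEFORE_CAT = re.compile(r"\b\w+\b(?!\W+cat\b)", re.IGNORECASE)


def words_not_followed_by_cat(text):
    """Return every word in text that is not immediately followed by 'cat'."""
    return NOT_BEFORE_CAT.findall(text)


# Hypothetical sample text, for illustration only.
sample = "My cat, your Cat; the catalog and one more cat."
result = words_not_followed_by_cat(sample)
print(result)
# ['cat', 'Cat', 'the', 'catalog', 'and', 'one', 'cat']
```

In this sample, "My", "your", and "more" are skipped because each is followed by the whole word cat, whereas "the" is kept because the word boundary after cat inside the lookahead prevents "catalog" from counting.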
Discussion
As with many other recipes in this chapter, word boundaries
(‹\b›) and the word character token (‹\w›)
work together to match a complete word. You can find in-depth
descriptions of these features in Recipe 2.6.
The ‹(?!⋯)› surrounding the
second part of this regex is a negative lookahead. Lookahead tells the regex engine to temporarily step
forward in the string, to check whether the pattern inside the
lookahead can be matched just ahead of the current position. It does
not consume any of the characters matched inside the lookahead.
Instead, it merely asserts whether a match is possible. Since we’re
using a negative lookahead, the result of the assertion is inverted.
In other words, if the pattern inside the lookahead can be matched
just ahead, the match attempt fails, and the regex engine moves forward to
try all over again starting from the next character in the subject
string. You can find much more detail about lookahead (and its
counterpart, lookbehind) in Recipe 2.16 ...
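To make the behavior described above concrete, here is a small Python sketch. It contrasts the negative lookahead from the solution with a positive lookahead, ‹(?=⋯)›, which is brought in here only for comparison and is not part of this recipe's solution. The variable names and the subject string are assumptions made for the example.

```python
import re

# Negative lookahead: the regex from the Solution.
negative = re.compile(r"\b\w+\b(?!\W+cat\b)", re.IGNORECASE)
# Positive lookahead: same pattern with the assertion not inverted,
# used here only as a contrast.
positive = re.compile(r"\b\w+\b(?=\W+cat\b)", re.IGNORECASE)

subject = "black cat sat"  # hypothetical subject string

pos_match = positive.search(subject)
neg_match = negative.search(subject)

# The positive lookahead sees " cat" ahead of "black" but does not
# consume it, so the match covers only the word itself.
print(pos_match.group(), pos_match.span())  # black (0, 5)

# With the negative lookahead, the attempt at "black" fails, and the
# engine moves forward until it finds a word not followed by "cat".
print(neg_match.group(), neg_match.span())  # cat (6, 9)
```

In both cases the reported span ends at the end of the matched word, which shows that the text examined by the lookahead is never included in the match.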