5.4. Find All Except a Specific Word
Problem
You want to use a regular expression to match any complete word except cat. Catwoman and other words that merely contain the letters “cat” should be matched, just not cat.
Solution
A negative lookahead can help you rule out specific words, and is key to this next regex:
\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
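As a quick check of the regex above, here is a minimal Python sketch. The helper name words_except_cat and the sample sentence are illustrative assumptions, not part of the recipe; re.IGNORECASE stands in for the "Case insensitive" option.

```python
import re

# The regex from the Solution. re.IGNORECASE mirrors the "Case insensitive" option.
NOT_CAT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)


def words_except_cat(text):
    """Return every complete word in text except the standalone word "cat"."""
    return NOT_CAT.findall(text)


print(words_except_cat("The cat and Catwoman saw a CAT on the catwalk"))
# ['The', 'and', 'Catwoman', 'saw', 'a', 'on', 'the', 'catwalk']
```

Both cat and CAT are skipped because of the case-insensitive flag, while Catwoman and catwalk are kept because no word boundary follows their first three letters.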
Discussion
Although a negated character class (written as ‹[^⋯]›) makes it easy to match anything except a specific character, you can’t just write ‹[^cat]› to match anything except the word cat. ‹[^cat]› is a valid regex, but it matches any character except c, a, or t. Hence, although ‹\b[^cat]+\b› would avoid matching the word cat, it wouldn’t match the word cup either, because it contains the forbidden letter c. The regular expression ‹\b[^c][^a][^t]\w*› is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that doesn’t restrict the first three letters to word characters, and it only matches words with at least three characters since none of the negated character classes are optional.
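The shortcomings of the two character-class attempts can be compared side by side with a small Python sketch. The ATTEMPTS table, the accepts helper, and the test words are assumptions made for illustration. Matching is case sensitive here to keep the comparison simple, and fullmatch is used so that each pattern must account for the whole candidate string.

```python
import re

# The two character-class attempts discussed above, plus the lookahead regex.
ATTEMPTS = {
    "negated class": re.compile(r"\b[^cat]+\b"),
    "per-position classes": re.compile(r"\b[^c][^a][^t]\w*"),
    "negative lookahead": re.compile(r"\b(?!cat\b)\w+"),
}


def accepts(pattern, candidate):
    """Return True if the pattern matches the entire candidate string."""
    return pattern.fullmatch(candidate) is not None


for name, pattern in ATTEMPTS.items():
    results = {word: accepts(pattern, word) for word in ["cat", "cup", "bat", "it", "dog"]}
    print(f"{name}: {results}")
```

Only the lookahead version accepts cup, bat, and it while still rejecting cat. The per-position version also accepts the string "x y", since its negated classes are not limited to word characters.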
With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:
\b       # Assert position at a word boundary.
(?!      # Assert that the regex below cannot be matched starting here...
  cat    #   Match "cat".
  \b     #   Assert position at a word boundary.
)        # ...