5.4. Find All Except a Specific Word

Problem

You want to use a regular expression to match any complete word except cat. Catwoman and other words that merely contain the letters “cat” should be matched—just not cat.

Solution

A negative lookahead can help you rule out specific words, and is key to this next regex:

\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
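
As a quick illustration, the regex can be tried in any of the listed flavors. The following is a minimal sketch in Python, assuming the standard re module; the names NOT_CAT and words_except_cat are hypothetical and chosen only for this example.

```python
import re

# The recipe's regex, compiled with the case-insensitive option.
NOT_CAT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

def words_except_cat(text):
    """Return every complete word in text except the standalone word 'cat'."""
    return NOT_CAT.findall(text)

if __name__ == "__main__":
    sample = "The cat chased Catwoman and two cats; CAT stayed home"
    # "cat" and "CAT" are skipped; "Catwoman" and "cats" are kept.
    print(words_except_cat(sample))
```

Because the case-insensitive option is on, CAT is excluded along with cat, while longer words that merely start with those letters still match.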

Discussion

Although a negated character class (written as [^…]) makes it easy to match anything except a specific character, you can’t just write [^cat] to match anything except the word cat. [^cat] is a valid regex, but it matches any character except c, a, or t. Hence, although \b[^cat]+\b would avoid matching the word cat, it wouldn’t match the word cup either, because it contains the forbidden letter c. The regular expression \b[^c][^a][^t]\w* is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that regex doesn’t restrict the first three characters to word characters, and it only matches words with at least three characters, since none of the negated character classes is optional.
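
The differences between these attempts are easier to see when each regex is tested against a few single words. The sketch below assumes Python's re module and uses re.fullmatch so that each candidate must account for the entire test string; the names CANDIDATES and accepts are hypothetical helpers for this comparison.

```python
import re

# The two flawed attempts discussed above, plus the recipe's solution.
CANDIDATES = {
    "negated class": r"\b[^cat]+\b",
    "per-letter classes": r"\b[^c][^a][^t]\w*",
    "negative lookahead": r"\b(?!cat\b)\w+",
}

def accepts(pattern, word):
    """Return True if pattern matches the whole of word, ignoring case."""
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None

if __name__ == "__main__":
    tests = ["cat", "cup", "bat", "at", "dog", "a-b"]
    for name, pattern in CANDIDATES.items():
        accepted = [w for w in tests if accepts(pattern, w)]
        print(f"{name}: {accepted}")
```

In this comparison only the lookahead version accepts cup, bat, and at while rejecting cat. The per-letter version also accepts a-b, which is not a single word, because its negated classes are free to match nonword characters.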

With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:

\b      # Assert position at a word boundary.
(?!     # Assert that the regex below cannot be matched starting here...
  cat   #   Match "cat".
  \b    #   Assert position at a word boundary.
)       # ...
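
A commented layout like this can also live in source code. The sketch below assumes Python's re.VERBOSE (free-spacing) flag and applies it to the complete regex from the Solution section; the comments inside the pattern are this example's own wording, and NOT_CAT_VERBOSE and COMPACT are hypothetical names.

```python
import re

# Free-spacing form of the regex from the Solution section.
# With re.VERBOSE, whitespace and "#" comments in the pattern are ignored.
NOT_CAT_VERBOSE = re.compile(
    r"""
    \b        # Assert position at a word boundary.
    (?!       # Fail if the following can be matched starting here:
        cat   #   the literal text "cat",
        \b    #   followed by a word boundary.
    )
    \w+       # Match one or more word characters.
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Compact form, for comparison.
COMPACT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

if __name__ == "__main__":
    sample = "cat Catwoman bobcat CAT cats"
    print(NOT_CAT_VERBOSE.findall(sample))
    print(NOT_CAT_VERBOSE.findall(sample) == COMPACT.findall(sample))
```

Both forms should return the same matches; the free-spacing version only changes how the pattern is written, not what it matches.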
