5.7. Find Words Near Each Other

Problem

You want to emulate a NEAR search using a regular expression. For readers unfamiliar with the term, some search tools that use Boolean operators such as NOT and OR also have a special operator called NEAR. Searching for “word1 NEAR word2” finds word1 and word2 in any order, as long as they occur within a certain distance of each other.

Solution

If you’re only searching for two different words, you can combine two regular expressions—one that matches word1 before word2, and another that flips the order of the words. The following regex allows up to five words to separate the two you’re searching for:

\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
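As a rough illustration, here is one way the compact regex might be used from Python, one of the listed flavors. The terms "fox" and "dog" stand in for word1 and word2 and are only assumed examples. The lazy quantifier {0,5}? tries zero intervening words first and adds words one at a time until the second term matches or the limit of five is reached.

```python
import re

# The recipe's regex with word1/word2 replaced by the assumed example
# terms "fox" and "dog"; re.IGNORECASE mirrors the "Case insensitive" option.
NEAR = re.compile(
    r"\b(?:fox\W+(?:\w+\W+){0,5}?dog|dog\W+(?:\w+\W+){0,5}?fox)\b",
    re.IGNORECASE,
)

texts = [
    "The quick brown fox jumps over the lazy dog.",  # 4 words between
    "A dog barked at the fox.",                      # reversed order, 3 words
    "The fox ran far, far, far, far, far, far away from the dog.",  # 10 words
]

for text in texts:
    m = NEAR.search(text)
    # Print the matched span, or None when the terms are too far apart
    print(m.group() if m else None)
```

The first two texts match ("fox jumps over the lazy dog" and "dog barked at the fox"), while the third prints None because ten words separate the terms.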
\b(?:
  word1                 # first term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word2                 # second term
|                       #   or, the same pattern in reverse...
  word2                 # second term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word1                 # first term
)\b
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
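In Python, free-spacing mode corresponds to the re.VERBOSE flag. The sketch below, again assuming the example terms "fox" and "dog", compiles the commented version and checks that it finds the same match as the compact one. Unescaped whitespace outside character classes is ignored, and "#" starts a comment that runs to the end of the line.

```python
import re

# Free-spacing (verbose) form of the recipe's regex, with the assumed
# example terms "fox" and "dog" in place of word1 and word2.
NEAR_VERBOSE = re.compile(
    r"""
    \b(?:
      fox                   # first term
      \W+ (?:\w+\W+){0,5}?  # up to five words
      dog                   # second term
    |                       #   or, the same pattern in reverse...
      dog                   # second term
      \W+ (?:\w+\W+){0,5}?  # up to five words
      fox                   # first term
    )\b
    """,
    re.VERBOSE | re.IGNORECASE,
)

# The compact form, for comparison
NEAR_COMPACT = re.compile(
    r"\b(?:fox\W+(?:\w+\W+){0,5}?dog|dog\W+(?:\w+\W+){0,5}?fox)\b",
    re.IGNORECASE,
)

sample = "A dog barked at the fox."
print(NEAR_VERBOSE.search(sample).group())  # → dog barked at the fox
print(NEAR_VERBOSE.search(sample).group() == NEAR_COMPACT.search(sample).group())  # → True
```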

The second regular expression here uses the free-spacing option and adds whitespace and comments for readability. Apart from that, the two regular expressions are identical. JavaScript doesn’t support free-spacing mode, but the other listed regex flavors allow you to take your pick. Recipes 3.5 and 3.7 show examples of how you can add these regular expressions to your search form or other code. ...
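Because the two alternatives differ only in word order, it can be convenient to generate the regex from the search terms instead of typing it by hand. The function near_pattern below is a hypothetical helper, not part of any library. It is a minimal sketch that assumes both terms are literal text that begins and ends with word characters (otherwise the surrounding \b anchors may not behave as expected). re.escape() keeps metacharacters in the terms from being interpreted as regex syntax, and max_between replaces the fixed limit of five.

```python
import re

def near_pattern(word1: str, word2: str, max_between: int = 5) -> re.Pattern:
    """Hypothetical helper: build the recipe's two-alternative NEAR regex
    for two literal terms, allowing up to max_between words between them."""
    w1, w2 = re.escape(word1), re.escape(word2)      # treat terms literally
    gap = r"\W+(?:\w+\W+){0,%d}?" % max_between      # lazily skip 0..N words
    return re.compile(
        rf"\b(?:{w1}{gap}{w2}|{w2}{gap}{w1})\b",     # either order
        re.IGNORECASE,
    )

p = near_pattern("fox", "dog", max_between=2)
print(p.search("The quick brown fox jumps over the lazy dog."))  # → None
print(p.search("fox jumps over dog").group())  # → fox jumps over dog
```

Escaping matters when a term contains characters such as "." or "+": without re.escape(), a term like "a.b" would also match "aXb".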
