5.2. Find Any of Multiple Words

Problem

You want to find any one out of a list of words, without having to search through the subject string multiple times.

Solution

Using alternation

The simple solution is to alternate between the words you want to match:

\b(?:one|two|three)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

More complicated examples of matching similar words are shown in Recipe 5.3.

Example JavaScript solution

var subject = 'One times two plus one equals three.';

var regex = /\b(?:one|two|three)\b/gi;

subject.match(regex);
// returns an array with four matches: ['One','two','one','three']

// This function does the same thing but accepts an array of words to
// match. Any regex metacharacters within the accepted words are escaped
// with a backslash before searching.

function match_words (subject, words) {
    var regex_metachars = /[(){}[\]*+?.\\^$|,\-]/g;
    var escaped = [];

    // Escape into a new array so the caller's words array is left unchanged.
    for (var i = 0; i < words.length; i++) {
        escaped.push(words[i].replace(regex_metachars, '\\$&'));
    }

    var regex = new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'gi');

    return subject.match(regex) || [];
}

match_words(subject, ['one','two','three']);
// returns an array with four matches: ['One','two','one','three']
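The escaping step matters as soon as a word contains a regex metacharacter. The following is a minimal sketch of that effect, not part of the recipe itself: the helper name escape_metachars and the sample text are assumptions made for illustration, and the character class is copied from match_words above.

```javascript
// Hypothetical helper: escapes the same metacharacters as match_words.
function escape_metachars(word) {
    return word.replace(/[(){}[\]*+?.\\^$|,\-]/g, '\\$&');
}

var text = 'abc a.c axc';

// Without escaping, the dot in 'a.c' matches any character.
var raw = new RegExp('\\b(?:' + 'a.c' + ')\\b', 'gi');
var rawMatches = text.match(raw) || [];
// rawMatches is ['abc', 'a.c', 'axc']

// With escaping, the dot matches only a literal dot.
var escaped = new RegExp('\\b(?:' + escape_metachars('a.c') + ')\\b', 'gi');
var escapedMatches = text.match(escaped) || [];
// escapedMatches is ['a.c']
```

Note that the word boundaries still apply to the escaped word, so a word that begins or ends with a nonword character (such as a trailing plus sign) may not match where you expect.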

Discussion

Using alternation

There are three parts to this regular expression: the word boundaries on both ends, the noncapturing group, and the list of words (each separated by the | alternation operator). The word boundaries ensure that the regex does not match part of ...
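The effect of the word boundaries can be seen by running the same alternation with and without them. This is a small sketch under assumed inputs: the sample sentence and the variable names are made up for illustration.

```javascript
var sample = 'Someone took two stones.';

// With word boundaries, only whole words from the list match.
var withBoundaries = sample.match(/\b(?:one|two|three)\b/gi) || [];
// withBoundaries is ['two']

// Without them, 'one' is also found inside 'Someone' and 'stones'.
var withoutBoundaries = sample.match(/(?:one|two|three)/gi) || [];
// withoutBoundaries is ['one', 'two', 'one']
```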
