2.8. Match One of Several Alternatives

Problem

Create a regular expression that, when applied repeatedly to the text “Mary, Jane, and Sue went to Mary's house”, will match Mary, Jane, Sue, and then Mary again. Further match attempts should fail.

Solution

Mary|Jane|Sue
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

The vertical bar, or pipe symbol, splits the regular expression into multiple alternatives. Mary|Jane|Sue matches Mary, or Jane, or Sue with each match attempt. Only one name matches each time, but a different name can match each time.
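As a quick check of this behavior, here is a minimal sketch in Python, one of the flavors listed above. The variable names are chosen for illustration only. It applies the regex repeatedly to the subject text and collects each match in the order it is found.

```python
import re

# Subject text and regex taken from the recipe; the names are illustrative.
subject = "Mary, Jane, and Sue went to Mary's house"
names = re.compile(r"Mary|Jane|Sue")

# finditer() applies the regex repeatedly, resuming after each match.
matches = [m.group() for m in names.finditer(subject)]
print(matches)  # ['Mary', 'Jane', 'Sue', 'Mary']

# After the last match, a further attempt from that point fails.
last_end = list(names.finditer(subject))[-1].end()
print(names.search(subject, last_end))  # None
```

Each match yields exactly one of the three names, and the fifth attempt finds nothing, as the problem statement requires.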

All regular expression flavors discussed in this book use a regex-directed engine. The engine is simply the software that makes the regular expression work. Regex-directed[2] means that all possible permutations of the regular expression are attempted at each character position in the subject text, before the regex is attempted at the next character position.

When you apply Mary|Jane|Sue to “Mary, Jane, and Sue went to Mary's house”, the match Mary is found immediately at the start of the string.

When you apply the same regex to the remainder of the string—e.g., by clicking “Find Next” in your text editor—the regex engine attempts to match Mary at the first comma in the string. That fails. Then, it attempts to match Jane at the same position, which also fails. Attempting to match Sue at the comma fails, too. Only then does the regex engine advance to the next character in the string. Starting at ...
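The position-by-position search described above can be imitated with a small Python sketch. This is not how any real regex engine is implemented; find_next is a hypothetical helper that only handles a list of literal alternatives, and it assumes the alternatives are tried left to right at each position before moving on to the next character.

```python
def find_next(subject, alternatives, start=0):
    """Hypothetical helper: mimic a regex-directed search over literal alternatives."""
    for pos in range(start, len(subject)):
        # Try every alternative at this position before advancing.
        for alt in alternatives:
            if subject.startswith(alt, pos):
                return pos, alt
    return None  # no alternative matched at any remaining position


subject = "Mary, Jane, and Sue went to Mary's house"
alternatives = ["Mary", "Jane", "Sue"]

found = []
pos = 0
while True:
    hit = find_next(subject, alternatives, pos)
    if hit is None:
        break
    start, text = hit
    found.append((start, text))
    pos = start + len(text)  # "Find Next" resumes after the previous match

print(found)  # [(0, 'Mary'), (6, 'Jane'), (16, 'Sue'), (28, 'Mary')]
```

The second call starts at the first comma (index 4), where all three alternatives fail, then moves to the space, and finally succeeds with Jane at index 6.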
