Keywords
Problem
You are working with a file format for forms in a software application. The words “end,” “in,” “inline,” “inherited,” “item,” and “object” are reserved keywords in this format.[9] You want a regular expression that matches any of these keywords.
Solution
The basic solution is very straightforward and works with all regex flavors in this book:
\b(?:end|in|inline|inherited|item|object)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
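As a quick sketch, here is the basic pattern applied with Python's `re` module (Python is one of the flavors listed above). The `re.IGNORECASE` flag plays the role of the case-insensitive option. The `find_keywords` helper and the sample line are made up for illustration and are not part of any real form format.

```python
import re

# The basic keyword pattern; re.IGNORECASE stands in for "Case insensitive".
KEYWORD_RE = re.compile(r"\b(?:end|in|inline|inherited|item|object)\b", re.IGNORECASE)

def find_keywords(text):
    """Return every whole-word keyword in text, in order of appearance.

    findall() returns whole matches here because the group is noncapturing.
    """
    return KEYWORD_RE.findall(text)

# Hypothetical sample line; it is not taken from a real form file.
sample = "object Form1 inherited Item inline interesting END"
print(find_keywords(sample))  # ['object', 'inherited', 'Item', 'inline', 'END']
```

Note that "interesting" produces no match, because every keyword must be a whole word.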
We can optimize the regular expression for regex flavors that support atomic grouping:
\b(?>end|in(?:line|herited)?|item|object)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
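The sketch below checks that the atomic-group version finds the same keywords as the basic one on a made-up sample. It assumes Python's `re` module, which only understands the `(?>...)` syntax from Python 3.11 onward, so the comparison is skipped on older interpreters. The `keywords` helper and the sample string are hypothetical.

```python
import re
import sys

BASIC = r"\b(?:end|in|inline|inherited|item|object)\b"
# Atomic group: once it has matched, the engine does not backtrack into it.
OPTIMIZED = r"\b(?>end|in(?:line|herited)?|item|object)\b"

def keywords(pattern, text):
    """Return all case-insensitive whole-word matches of pattern in text."""
    return re.findall(pattern, text, re.IGNORECASE)

# Hypothetical sample: all six keywords plus two words that merely start with "in".
sample = "end in inline inherited item object interesting inside"

if sys.version_info >= (3, 11):
    # Both patterns should report exactly the same keywords.
    assert keywords(OPTIMIZED, sample) == keywords(BASIC, sample)
    print(keywords(OPTIMIZED, sample))
else:
    print("This Python's re module does not support atomic groups")
```

On "interesting", the atomic group matches "in", the final `\b` then fails, and because the group is atomic the engine gives up at that position instead of retrying the other alternatives.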
Discussion
Matching a word from a list of words is very easy with a regular expression. We simply use alternation to match any one of the keywords. The word boundaries at the start and the end of the regex make sure we only match entire words. The regex should match “inline” rather than “in” when the file contains “inline”, and it should fail to match when the file contains “interesting”. Because alternation has the lowest precedence of all regex operators, we have to put the list of keywords inside a group; without it, the first word boundary would apply only to “end” and the last one only to “object”. Here we used a noncapturing group for efficiency. When using this regex as part of a larger regular expression, you may want to use a capturing group instead, so you can determine whether the regex matched a keyword or something else.
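A small Python sketch of the three points above, assuming the `re` module: the grouped pattern respects word boundaries, an ungrouped version does not (alternation splits the whole pattern), and a capturing group inside a larger regex tells you whether a keyword matched. The "keyword or any other word" pattern `TOKEN_RE` and the helpers `first_match` and `classify` are hypothetical illustrations, not part of the recipe.

```python
import re

KEYWORDS = "end|in|inline|inherited|item|object"

# Grouped: the word boundaries apply to every alternative.
GROUPED = rf"\b(?:{KEYWORDS})\b"
# Ungrouped: alternation has the lowest precedence, so this reads as
# "\bend" OR "in" OR "inline" OR ... OR "object\b".
UNGROUPED = rf"\b{KEYWORDS}\b"

def first_match(pattern, text):
    """Return the first case-insensitive match of pattern in text, or None."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(0) if m else None

# Hypothetical larger regex: a keyword (captured in group 1) or any other word.
TOKEN_RE = re.compile(rf"\b({KEYWORDS})\b|\w+", re.IGNORECASE)

def classify(text):
    """Label each word 'keyword' if group 1 took part in the match, else 'other'."""
    return [(m.group(0), "keyword" if m.group(1) is not None else "other")
            for m in TOKEN_RE.finditer(text)]

print(first_match(GROUPED, "inline"))         # inline
print(first_match(UNGROUPED, "inline"))       # in
print(first_match(GROUPED, "interesting"))    # None
print(first_match(UNGROUPED, "interesting"))  # in
print(classify("object Form1 inherited interesting"))
```

The last line should label "object" and "inherited" as keywords and "Form1" and "interesting" as other words, since only the keyword alternative sets group 1.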
We can optimize this regular expression when using regular ...