2.18. Add Comments to a Regular Expression

Problem

\d{4}-\d{2}-\d{2} matches a date in yyyy-mm-dd format, without doing any validation of the numbers. Such a simple regular expression is appropriate when you know your data does not contain any invalid dates. Add comments to this regular expression to indicate what each part of the regular expression does.

Solution

\d{4}    # Year
-        # Separator
\d{2}    # Month
-        # Separator
\d{2}    # Day
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Discussion

Free-spacing mode

Regular expressions can quickly become complicated and difficult to understand. Just as you should comment source code, you should comment all but the most trivial regular expressions.

All regular expression flavors in this book, except JavaScript, offer an alternative regular expression syntax that makes it very easy to clearly comment your regular expressions. You can enable this syntax by turning on the free-spacing option. It has different names in various programming languages.

In .NET, set the RegexOptions.IgnorePatternWhitespace option. In Java, pass the Pattern.COMMENTS flag. Python expects re.VERBOSE. PHP, Perl, and Ruby use the /x flag.

Turning on free-spacing mode has two effects. It turns the hash symbol (#) into a metacharacter, outside character classes. The hash starts a comment that runs until the end of the line or the end of the regex (whichever comes first). The hash and everything after it is simply ignored by the regular expression engine. ...

Get Regular Expressions Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.