2.1. Match Literal Text

Problem

Create a regular expression to exactly match this gloriously contrived sentence: The punctuation characters in the ASCII table are: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

This is intended to show which characters have special meaning in regular expressions, and which characters always match themselves literally.

Solution

This regular expression matches the sentence stated in the problem:

The punctuation characters in the ASCII table are: !"#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\]\^_`\{\|}~
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
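
As a quick check in one of the listed flavors, the following Python sketch tests the solution regex against the sentence from the problem. The variable names are illustrative and not part of the recipe. It assumes Python's string.punctuation constant, which holds the same 32 ASCII punctuation characters in the same order.

```python
import re
import string

# The sentence from the problem, without the full stop that ends it in the prose.
sentence = "The punctuation characters in the ASCII table are: " + string.punctuation

# The solution regex as a raw string, so every backslash reaches the regex engine unchanged.
pattern = r"""The punctuation characters in the ASCII table are: !"#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\]\^_`\{\|}~"""

# fullmatch succeeds only if the regex consumes the entire sentence.
matched = re.fullmatch(pattern, sentence) is not None
print(matched)
```

The raw string matters here: in an ordinary Python string literal the backslashes would first be interpreted by Python itself, before the regex engine ever sees them.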

Discussion

Any regular expression that does not include any of the dozen characters $()*+.?[\^{| simply matches itself. To find whether Mary had a little lamb in the text you’re editing, simply search for Mary had a little lamb. It doesn’t matter whether the “regular expression” checkbox is turned on in your text editor.
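
A minimal Python sketch of this point: when the search text contains none of the twelve characters, a regex search and a plain substring search find the same thing. The sample text and the names METACHARACTERS, needle, and haystack are assumptions made for illustration.

```python
import re

# The twelve characters listed above that have special meaning in a regex.
METACHARACTERS = set("$()*+.?[\\^{|")

needle = "Mary had a little lamb"
haystack = "Mary had a little lamb, its fleece was white as snow."

# The search text contains none of the twelve characters...
has_metacharacter = any(ch in METACHARACTERS for ch in needle)

# ...so using it as a regex and using it as plain text locate the same position.
regex_position = re.search(needle, haystack).start()
plain_position = haystack.find(needle)
print(has_metacharacter, regex_position == plain_position)
```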

The 12 punctuation characters that make regular expressions work their magic are called metacharacters. If you want your regex to match them literally, you need to escape them by placing a backslash in front of them. Thus, the regex: \$\(\)\*\+\.\?\[\\\^\{\| matches the text $()*+.?[\^{|.
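
The escaping rule can be sketched in Python as a small helper that puts a backslash in front of each of the twelve metacharacters and leaves everything else alone. The function name escape_metacharacters is hypothetical. Python's standard library offers re.escape for the same purpose, though it may also escape characters outside this list.

```python
import re

# The twelve metacharacters discussed above.
METACHARACTERS = "$()*+.?[\\^{|"

def escape_metacharacters(text):
    """Prefix each of the twelve metacharacters with a backslash."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in text)

# Escaping the list itself yields the regex quoted in the text above.
escaped = escape_metacharacters(METACHARACTERS)
print(escaped)

# The escaped regex matches the twelve characters literally.
matches_literally = re.fullmatch(escaped, METACHARACTERS) is not None
print(matches_literally)
```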

Notably absent from the list are the closing square bracket ], the hyphen -, and the closing curly bracket }. The first two become metacharacters only after an unescaped [, and the } only after an unescaped {. There’s no need to ever escape }. Metacharacter rules for the blocks that appear between ...
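
The context dependence of ], -, and } can be illustrated with a short Python sketch. The three patterns are examples chosen for this illustration, not taken from the recipe.

```python
import re

# With no unescaped [ or { before them, ], - and } are ordinary characters.
literal = re.fullmatch(r"]-}", "]-}")

# After an unescaped [, the hyphen forms a range and ] closes the character class.
in_class = re.fullmatch(r"[a-c]", "b")

# After an unescaped {, the } closes a quantifier: a{2} means exactly two a's.
quantified = re.fullmatch(r"a{2}", "aa")

print(literal is not None, in_class is not None, quantified is not None)
```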
