2.3. Match One of Many Characters

Problem

Create one regular expression to match all common misspellings of calendar, so you can find this word in a document without having to trust the author’s spelling ability. Allow an a or e to be used in each of the vowel positions. Create another regular expression to match a single hexadecimal character. Create a third regex to match a single character that is not a hexadecimal character.

Solution

Calendar with misspellings

c[ae]l[ae]nd[ae]r
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal character

[a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonhexadecimal character

[^a-fA-F0-9]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. The three classes in the first regex match either an a or an e. They do so independently. When you test calendar against this regex, the first character class matches a, the second e, and the third a.

Outside character classes, a dozen punctuation characters are metacharacters. Inside a character class, only four characters have a special function: \, ^, -, and ]. If you’re using Java or .NET, the opening bracket [ is also a metacharacter inside character classes. All other characters are literals and simply add themselves to the character class. ...

Get Regular Expressions Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.