2.4. Match Any Character

Problem

Match a quoted character. Provide one solution that allows any single character, except a line break, between the quotes. Provide another that truly allows any character, including line breaks.

Solution

Any character except line breaks

'.'
Regex options: None (the “dot matches line breaks” option must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Any character including line breaks

'.'
Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

'[\s\S]'
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
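
As a rough illustration of the three regexes above, here is a minimal Python sketch. The sample strings and variable names are assumptions made up for this example; `re.DOTALL` is the flag Python's `re` module uses for the “dot matches line breaks” option.

```python
import re

# Hypothetical sample subjects: a quoted letter and a quoted line break.
quoted_letter = "'x'"
quoted_newline = "'\n'"

# Default options: the dot matches any character except a line break.
dot_default = re.compile(r"'.'")

# re.DOTALL turns on "dot matches line breaks" in Python.
dot_all = re.compile(r"'.'", re.DOTALL)

# [\s\S] matches any whitespace or non-whitespace character,
# so it covers line breaks without any option being set.
any_char_class = re.compile(r"'[\s\S]'")

print(bool(dot_default.search(quoted_letter)))       # True
print(bool(dot_default.search(quoted_newline)))      # False
print(bool(dot_all.search(quoted_newline)))          # True
print(bool(any_char_class.search(quoted_newline)))   # True
```

Other flavors name the option differently, so the flag shown here should be read as specific to Python.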

Discussion

Any character except line breaks

The dot is one of the oldest and simplest regular expression features. Its meaning has always been to match any single character.

There is, however, some confusion as to what “any character” truly means. The oldest tools for working with regular expressions processed files line by line, so there was never an opportunity for the subject text to include a line break. The programming languages discussed in this book process the subject text as a whole, no matter how many line breaks you put into it. If you want true line-by-line processing, you have to write a bit of code that splits the subject into an array of lines and applies the regex to each line in the array. Recipe 3.21 in the next chapter shows how to do this.
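
The line-by-line approach described above might look roughly like the following Python sketch. It is only an illustration under assumed sample data, not necessarily the code Recipe 3.21 uses; the subject string and variable names are hypothetical.

```python
import re

# Hypothetical subject text containing two lines.
subject = "first 'a' line\nsecond 'b' line"

quoted_char = re.compile(r"'.'")

# splitlines() breaks the subject at line breaks and drops them,
# so each regex call sees exactly one line.
lines = subject.splitlines()
matches_per_line = [quoted_char.findall(line) for line in lines]

print(matches_per_line)  # [["'a'"], ["'b'"]]
```

Because each line is searched separately, no match can ever span a line break, whichever option the dot is given.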

Larry Wall, the developer of Perl, wanted Perl to retain the traditional behavior of line-based tools, in which the dot never matched ...
