Chapter 15. Understanding Regular Expressions, Part III

Jeffrey Friedl

In Understanding Regular Expressions, Part II, I used our knowledge of how Perl’s regex engine goes about a match to analyze and evaluate a few different solutions to a problem. I’d like to continue to look at the effects of greediness, backtracking, and other important aspects of Perl’s regex engine, this time to demonstrate some gotchas that await the unwary. A basic understanding of backtracking is a prerequisite; I recommend Understanding Regular Expressions, Part I.

Let’s start with a simple but illustrative example taken from daily life: continuation lines. Let’s say you’ve got the text of a csh-style configuration file in $_, and want to pluck alias definitions. You might use:

	while (m{^ \s* alias \s+ (\S+) \s+ (.*) }xmg) {
	    ($alias, $cmd) = ($1, $2);
	    # ...work with $alias and $cmd as you like...
	}

This works fine if your string is from a tcsh shell’s startup script and has a line such as the following, put there as some jerk’s idea of a practical joke:

	alias ls 'echo Ha, got you, sucker! ; rm *'
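To see what the loop actually captures for a line like this, here is a minimal, self-contained sketch. The sample text assigned to $_ (including the extra set prompt line) and the @found array are scaffolding I've assumed for illustration; they are not part of the original example.

```perl
use strict;
use warnings;

# Assumed sample text: one unrelated line, then the single-line alias.
$_ = "set prompt='% '\nalias ls 'echo Ha, got you, sucker! ; rm *'\n";

my @found;    # collected [alias, command] pairs (hypothetical helper)

# /x allows the whitespace inside the regex, /m lets ^ match at the start
# of each line, and /g makes each loop iteration resume after the last match.
while (m{^ \s* alias \s+ (\S+) \s+ (.*) }xmg) {
    my ($alias, $cmd) = ($1, $2);
    push @found, [$alias, $cmd];
    print "alias=<$alias> cmd=<$cmd>\n";
}
```

Because dot does not match a newline here, (.*) stops at the end of the physical line, which is exactly what we want in the single-line case: $alias is ls and $cmd is the whole quoted command.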

However, if the jerk tried to be smart and cover his tracks a bit, the line might look like this:

	alias ls 'echo Ha, got you, sucker! ; rm * ;\
	          alias ls echo Ha, got you, sucker!'

In this version, the rm action of the alias happens just once, so by the time you know to look for an alias, you won’t see the rm. Anyway, this kind of “line” would break our approach, since it’s two physical lines.
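The breakage is easy to reproduce. The following sketch runs the same regex over the two-line alias; the $text variable (used in place of $_) and the @pairs array are names I've assumed for illustration only.

```perl
use strict;
use warnings;

# The two-physical-line alias from the text. The single-quoted heredoc
# keeps the trailing backslash on the first line literal.
my $text = <<'END';
alias ls 'echo Ha, got you, sucker! ; rm * ;\
          alias ls echo Ha, got you, sucker!'
END

my @pairs;    # collected [alias, command] pairs (hypothetical helper)
while ($text =~ m{^ \s* alias \s+ (\S+) \s+ (.*) }xmg) {
    push @pairs, [$1, $2];
    print "alias=<$1> cmd=<$2>\n";
}
```

Two things go wrong. The first capture ends at the backslash, since (.*) cannot cross the newline, so the command is cut short. Then, because /m lets ^ match at the start of the second physical line, and that line happens to begin with whitespace followed by alias, the loop reports it as a second, separate alias definition.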

Well, we know regular expressions, ...
