Capturing and Clustering

Patterns allow you to group portions of your pattern together into subpatterns and to remember the strings matched by those subpatterns. We call the first behavior clustering and the second one capturing.

Capturing

To capture a substring for later use, put parentheses around the subpattern that matches it. The first pair of parentheses stores its substring in $1, the second pair in $2, and so on. You may use as many parentheses as you like; Perl just keeps defining more numbered variables for you to represent these captured strings.

Some examples:

/(\d)(\d)/  # Match two digits, capturing them into $1 and $2
/(\d+)/     # Match one or more digits, capturing them all into $1
/(\d)+/     # Match a digit one or more times, capturing the last into $1

Note the difference between the second and third patterns. The second form is usually what you want. The third form does not create multiple variables for multiple digits. Parentheses are numbered when the pattern is compiled, not when it is matched.

Captured strings are often called backreferences because they refer back to parts of the captured text. There are actually two ways to get at these backreferences. The numbered variables you've seen are how you get at backreferences outside of a pattern, but inside the pattern, that doesn't work. You have to use \1, \2, etc.[9] So to find doubled words like "the the" or "had had", you might use this pattern:

/\b(\w+) \1\b/i

But most often, you'll be using the $1 form, because you'll ...

Get Programming Perl, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.