Appendix C. The Essential Guide to Regular Expressions

The concept of regular expressions (or regexes, as they’re often known) is central to the Perl language. Regular expressions have long been available in Unix tools such as grep, egrep, sed, and awk, and they have also made their way into Java and Python. But they are most closely associated with Perl, where they are used extensively for pattern matching. They are also very important for data munging, as we describe in Appendix D.

Regular expressions are patterns of literals and metacharacters that match target combinations of characters embedded within input data. Although the simplest regular expression is very simple indeed (it’s just a literal string), regexes can also be very complex. They can provide amazing efficiency, but can also lead to great frustration. We have found that unless you live in the same universe as Spock or Data, where regexes compete with music and chess for sublime mathematical resonance, they most likely mean pain, bashed foreheads, and late-night viewings of Casablanca and The Matrix to calm the nerves. It’s really only by writing a million and one regexes that most people eventually figure out what the heck is going on, and even then, there’s more to learn.
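To make the distinction between literals and metacharacters concrete, here is a minimal sketch in Perl. The sample lines and variable names are hypothetical, invented purely for illustration. The first pattern is a plain literal string. The second combines a literal with metacharacters to capture a five-digit code.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for this illustration.
my @lines = (
    'ORA-00942: table or view does not exist',
    'Connected to database',
    'ORA-01017: invalid username/password',
);

# A literal pattern: matches any line containing the exact text "ORA".
my @literal_hits = grep { /ORA/ } @lines;

# Literals plus metacharacters: "ORA-" followed by exactly five digits.
# \d matches one digit, {5} repeats it five times, and the parentheses
# capture the matched digits into $1.
my @codes = map { /ORA-(\d{5})/ ? $1 : () } @lines;

print scalar(@literal_hits), " lines contain ORA\n";
print "Error codes: @codes\n";
```

Run as written, this should report two matching lines and the codes 00942 and 01017. The literal pattern only tells you that a match exists. The metacharacter version also pulls out the part of the line you care about.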

In this appendix, we’ll look at the origins of regular expressions and the main concepts underlying their use. We’ll also examine Perl’s built-in string-handling functions, which often supply enough functionality that you ...
