Introduction to Regexes and Pattern Matching

A regular expression is a string containing a combination of normal characters and special metacharacters or metasequences. The normal characters match themselves. Metacharacters and metasequences are characters or sequences of characters that represent ideas such as quantity, locations, or types of characters. The list in "Regex Metacharacters, Modes, and Constructs" shows the most common metacharacters and metasequences in the regular expression world. Later sections list the availability of and syntax for supported metacharacters for particular implementations of regular expressions.

Pattern matching consists of finding a section of text that is described (matched) by a regular expression. The underlying code that searches the text is the regular expression engine. You can predict the results of most matches by keeping two rules in mind:

  1. The earliest (leftmost) match wins

    Regular expressions are applied to the input starting at the first character and proceeding toward the last. As soon as the regular expression engine finds a match, it returns. (See MRE 148–149.)

  2. Standard quantifiers are greedy

    Quantifiers specify how many times something can be repeated. The standard quantifiers attempt to match as many times as possible. They settle for less than the maximum only if this is necessary for the success of the match. The process of giving up characters and trying less-greedy matches is called backtracking. (See MRE 151–153.)

Regular expression ...

Get Regular Expression Pocket Reference, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.