Using Parentheses To Override Precedence

Once you understand how to build regular expressions, you need not remember the terms “atom”, “piece”, and “branch”. They exist only to help you learn the precedence of the regular-expression operators. To avoid confusion, from now on I will use the generic term “subpattern” for any part of a complete pattern when it does not matter whether that part is an atom, piece, or branch.

Because operators such as * and + act only on atoms, they cannot be applied directly to pieces and branches. For example, the pattern “ab*” matches an a followed by any number of b’s; the * applies only to the b. To treat any subpattern (atom, piece, or branch) as an atom, enclose it in parentheses. Thus, to match any number of ab’s, use the pattern “(ab)*”.
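The difference in scope is easy to check outside Expect. The sketch below uses Python's re module as a stand-in for Expect's regular-expression matcher, assuming the basic operators used here (*, parentheses) behave the same in both. It also uses re.fullmatch so that a string counts only if the whole string fits the pattern; that whole-string rule is a choice made for this illustration. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the entire string fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# * applies only to the atom b, so ab* means "an a, then any number of b's".
for s in ["a", "ab", "abbb", "abab"]:
    print("ab*  ", repr(s), matches(r"ab*", s))

# Parentheses make ab a single atom, so (ab)* repeats the whole pair.
for s in ["", "ab", "abab", "abbb"]:
    print("(ab)*", repr(s), matches(r"(ab)*", s))
```

Here “abab” fits “(ab)*” but not “ab*”, while “abbb” fits “ab*” but not “(ab)*”.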

Matching real numbers is a good exercise. A real number has a whole portion to the left of the decimal point and a fractional portion to the right. A direct rendering of this concept, with an optional leading minus sign, is “-?[0-9]*\.?[0-9]*”. Notice that the period is escaped by placing a backslash in front of it. This forces it to match a literal period rather than any character. The pattern matches strings such as “17.78”, “-8”, and “0.21”. Unfortunately, it also accepts “0000.5”, which does not seem quite right. You can reject leading zeros while still accepting a single zero the same way I did earlier, with a branch: “-?(0|[1-9][0-9]*)?\.?[0-9]*”. This pattern accepts the earlier numbers but rejects “0000.5”. Unfortunately, it still matches “-0”. You can fix ...
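To see how the two real-number patterns compare on the sample strings, here is a small test harness. As before, it is a sketch that uses Python's re module in place of Expect's matcher and assumes whole-string matching via re.fullmatch; the names NAIVE, NO_LEADING_ZEROS, and accepts are hypothetical labels for this illustration only.

```python
import re

# The two patterns from the text, written as Python raw strings.
NAIVE = r"-?[0-9]*\.?[0-9]*"
NO_LEADING_ZEROS = r"-?(0|[1-9][0-9]*)?\.?[0-9]*"

def accepts(pattern: str, text: str) -> bool:
    """True if the entire string matches (whole-string matching is assumed)."""
    return re.fullmatch(pattern, text) is not None

samples = ["17.78", "-8", "0.21", "0000.5", "-0"]
for s in samples:
    print(f"{s:>7}  naive={accepts(NAIVE, s)}  "
          f"no_leading_zeros={accepts(NO_LEADING_ZEROS, s)}")
```

Running it shows the first pattern accepting all five strings, while the second rejects only “0000.5” and, as noted above, still accepts “-0”.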
