8.3. A Regular Expression Grammar

A modestly ambitious regular expression language allows the user to express sequences, alternations, and repetitions of characters. The following grammar is based on rules given in Introduction to Automata Theory, Languages and Computation [Hopcroft and Ullman]. The rules recognize conventional operator precedence, and they avoid the problem of left recursion.

expression    = term ('|' term)*; 
term          = factor factor*;
factor        = phrase | (phrase '*');
phrase        = letterOrDigit | '(' expression ')';
letterOrDigit = Letter | Digit;

Note that there is a major difference between the meaning of a vertical bar by itself and a vertical bar in single quotes. There is also a major difference between an asterisk and an asterisk ...

Get Building Parsers with Java™ now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.