8.2. Building a Regular Expression Parser

Metalanguages that let a user specify character patterns with symbols such as “|” and “~” are typically called regular expressions. No standard dictates which symbols belong in this type of metalanguage, although Perl is probably the most ambitious matcher of regular expressions. In the expression language you offer your users, you are free to choose both the symbols and the meaning you assign to each.

This section shows how to create a basic regular expression recognizer. This metalanguage will allow “|” to mean alternation, “*” to mean repetition, and simple juxtaposition (or “nextness”) to mean sequence. Individual characters such as a and b simply ...
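As a rough illustration of the metalanguage just described, the following is a minimal sketch of a recognizer in plain Java. It is not the parser toolkit this book builds on. The class name TinyRegex, its matches method, and the use of parentheses for grouping are all assumptions introduced for this example. Each piece of the pattern becomes a small function that, given an input string and a starting index, returns every index at which a match could end.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: recognizes "|" (alternation), "*" (repetition),
// juxtaposition (sequence), and "(...)" for grouping (an added assumption).
final class TinyRegex {
    // A node reports every index where a match starting at 'start' can end.
    interface Node { Set<Integer> ends(String s, int start); }

    private final String pattern;
    private int pos = 0;

    private TinyRegex(String pattern) { this.pattern = pattern; }

    // True if the whole input matches the whole pattern.
    static boolean matches(String pattern, String input) {
        TinyRegex parser = new TinyRegex(pattern);
        Node root = parser.expression();
        if (parser.pos != pattern.length()) {
            throw new IllegalArgumentException("Unexpected character at " + parser.pos);
        }
        return root.ends(input, 0).contains(input.length());
    }

    private char peek() { return pos < pattern.length() ? pattern.charAt(pos) : '\0'; }

    // expression = term ('|' term)*
    private Node expression() {
        Node left = term();
        while (peek() == '|') {
            pos++;
            Node l = left, r = term();
            left = (s, i) -> {
                Set<Integer> out = new TreeSet<>(l.ends(s, i));
                out.addAll(r.ends(s, i));
                return out;
            };
        }
        return left;
    }

    // term = factor*   (juxtaposition means sequence)
    private Node term() {
        Node seq = (s, i) -> Collections.singleton(i); // empty sequence matches nothing consumed
        while (pos < pattern.length() && peek() != '|' && peek() != ')') {
            Node l = seq, r = factor();
            seq = (s, i) -> {
                Set<Integer> out = new TreeSet<>();
                for (int mid : l.ends(s, i)) out.addAll(r.ends(s, mid));
                return out;
            };
        }
        return seq;
    }

    // factor = atom '*'*
    private Node factor() {
        Node node = atom();
        while (peek() == '*') {
            pos++;
            Node inner = node;
            node = (s, i) -> {
                // Zero or more repetitions: collect every reachable end index.
                Set<Integer> seen = new TreeSet<>();
                Deque<Integer> work = new ArrayDeque<>();
                seen.add(i);
                work.push(i);
                while (!work.isEmpty()) {
                    for (int next : inner.ends(s, work.pop())) {
                        if (seen.add(next)) work.push(next);
                    }
                }
                return seen;
            };
        }
        return node;
    }

    // atom = literal character | '(' expression ')'
    private Node atom() {
        if (pos >= pattern.length()) {
            throw new IllegalArgumentException("Pattern ended unexpectedly");
        }
        char c = pattern.charAt(pos++);
        if (c == '(') {
            Node inner = expression();
            if (peek() != ')') throw new IllegalArgumentException("Missing ')'");
            pos++;
            return inner;
        }
        if (c == '*' || c == '|' || c == ')') {
            throw new IllegalArgumentException("Misplaced '" + c + "' at " + (pos - 1));
        }
        // An individual character matches only itself.
        return (s, i) -> i < s.length() && s.charAt(i) == c
                ? Collections.singleton(i + 1)
                : Collections.<Integer>emptySet();
    }
}
```

With this sketch, TinyRegex.matches("(a|b)*c", "abbac") returns true and TinyRegex.matches("a|b", "c") returns false. Tracking sets of end positions rather than backtracking keeps repetition simple and guarantees termination even when the repeated piece can match the empty string.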
