Rule-Based Parsers

Moving beyond the simple StringScanner-based parsers that we’ve looked at thus far takes us into a heady and complex world. Parsing is studied by both computer scientists and linguists, and it remains an active field of research in which major advances continue to be made.

But as with most such fields, it’s possible to reap the advantages of this active research without having to study for a doctorate, and one relatively recent development that will be useful to us is the parsing expression grammar (PEG). A grammar is essentially a way of defining the rules that govern a particular language. Using a grammar, a parser can then discern the meaning of text written in that language. ...
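To make the idea of a grammar concrete, here is a minimal hand-written sketch of how grammar rules can drive a parser. It is not a PEG library and not necessarily the approach developed later in the text: the class name `TinySumParser`, the two-rule grammar, and the decision to skip whitespace around `+` are all assumptions made for illustration. Each rule in the grammar becomes one method, and the parser recovers the "meaning" of the text (here, the value of a sum) only if the whole input obeys the rules.

```ruby
require "strscan"

# Hypothetical grammar, written in PEG-like notation for illustration:
#   sum    <- number ("+" number)*
#   number <- [0-9]+
# Whitespace around "+" is skipped as a convenience.
class TinySumParser
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Returns the integer value of the sum, or nil if the text
  # does not conform to the grammar.
  def parse
    value = sum
    value if value && @scanner.eos?
  end

  private

  # sum <- number ("+" number)*
  def sum
    total = number
    return nil unless total

    loop do
      start = @scanner.pos
      operand = @scanner.scan(/\s*\+\s*/) && number
      unless operand
        # The optional repetition failed: rewind to where it began.
        @scanner.pos = start
        break
      end
      total += operand
    end

    total
  end

  # number <- [0-9]+
  def number
    digits = @scanner.scan(/\d+/)
    digits && digits.to_i
  end
end

puts TinySumParser.new("1 + 2 + 39").parse.inspect # valid: prints the sum
puts TinySumParser.new("1 + + 2").parse.inspect    # invalid: prints nil
```

The rewind step matters: when a repetition only partly matches, the scanner position is restored so that the consumed `+` does not silently disappear, and the leftover input then causes the overall parse to be rejected.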
