Many programming tasks start with the interpretation of some form of structured textual data. Parsing is the process of converting such data into data structures that are easy to program against. For simple formats, it’s often enough to parse the data in an ad hoc way, say, by breaking up the data into lines, and then using regular expressions for breaking those lines down into their component pieces.
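As a rough illustration of the ad hoc approach, here is a minimal sketch that parses a hypothetical `name,age` format one line at a time. To stay within the standard library it splits on commas with `String.split_on_char` rather than using regular expressions; the `person` record and the function names are our own, chosen only for this example.

```ocaml
(* Hypothetical record for one line of a "name,age" file. *)
type person = { name : string; age : int }

(* Parse one line such as "alice,30"; returns None on malformed input. *)
let parse_line line =
  match String.split_on_char ',' (String.trim line) with
  | [ name; age ] ->
    (match int_of_string_opt (String.trim age) with
     | Some age -> Some { name = String.trim name; age }
     | None -> None)
  | _ -> None

(* Break the input into lines, skip blank ones, and parse the rest.
   Malformed lines are silently dropped. *)
let parse_people text =
  String.split_on_char '\n' text
  |> List.filter (fun l -> String.trim l <> "")
  |> List.filter_map parse_line

let () =
  parse_people "alice,30\nbob,42\nnot a record\n"
  |> List.iter (fun p -> Printf.printf "%s is %d\n" p.name p.age)
```

This works for a flat, line-oriented format, but note that the malformed line simply disappears: the sketch gives no indication of what went wrong or where.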
But this simplistic approach tends to fall down when parsing more complicated data, particularly data with the kind of recursive structure you find in full-blown programming languages or flexible data formats like JSON and XML. Parsing such formats accurately and efficiently while providing useful error messages is a complex task.
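To see why recursion is the sticking point, consider a sketch of an OCaml type for JSON-like values. The constructor names here are an assumption made for illustration. Because lists and objects contain further values, nesting can be arbitrarily deep, and no fixed split-into-lines-then-match scheme can follow that structure in general.

```ocaml
(* A sketch of a JSON-like value type; constructor names are our own choice. *)
type value =
  | Null
  | Bool of bool
  | Int of int
  | Float of float
  | String of string
  | List of value list
  | Assoc of (string * value) list

(* Nesting depth: scalars have depth 0, and each enclosing list or
   object adds 1 to the depth of its deepest element. *)
let rec depth = function
  | Null | Bool _ | Int _ | Float _ | String _ -> 0
  | List vs -> 1 + List.fold_left (fun acc v -> max acc (depth v)) 0 vs
  | Assoc kvs -> 1 + List.fold_left (fun acc (_, v) -> max acc (depth v)) 0 kvs

(* Represents {"a": [1, {"b": [true]}]} *)
let example =
  Assoc [ ("a", List [ Int 1; Assoc [ ("b", List [ Bool true ]) ] ]) ]
```

Here `depth example` evaluates to 4, and nothing in the type bounds how large that number can get, which is exactly what a parser for such a format has to cope with.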
Often, you can find an existing parsing library that handles these issues for you. But when you do need to write a parser yourself, parser generators can simplify the task. A parser generator takes a specification of the data format that you want to parse and uses it to generate a parser.
Parser generators have a long history, including tools like lex and yacc that date back to the early 1970s. OCaml has its own alternatives, including ocamllex, which replaces lex, and ocamlyacc and menhir, which replace yacc. We’ll explore these tools in the course of walking through the implementation of a parser for the JSON serialization format that we discussed in Chapter 15.
Parsing is a ...