Parsing a file, or data of various types, is a common task for programmers. We already learned about Haskell’s support for regular expressions back in “Regular Expressions in Haskell”. Regular expressions are nice for many tasks, but they rapidly become unwieldy, or cannot be used at all, when dealing with a complex data format. For instance, we cannot use regular expressions to parse source code from most programming languages.
Parsec is a useful parser combinator library, with which we combine small parsing functions to build more sophisticated parsers. Parsec provides some simple parsing functions, as well as functions to tie them all together. It should come as no surprise that this parser library for Haskell is built around the notion of functions.
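To make the idea of combining small parsers concrete, here is a minimal sketch. It assumes the Parsec 3 module names (Text.Parsec and Text.Parsec.String), and the names number, numberList, and parseNumbers are hypothetical, chosen only for this illustration. A small parser that reads one integer is combined with sepBy1 and char to build a parser for a comma-separated list of integers.

```haskell
import Text.Parsec (ParseError, char, digit, eof, many1, parse, sepBy1)
import Text.Parsec.String (Parser)

-- A small parser: one or more digits, converted to an Int.
number :: Parser Int
number = read <$> many1 digit

-- Combine the small parser with sepBy1 and char to build a larger one:
-- one or more numbers separated by commas, followed by end of input.
numberList :: Parser [Int]
numberList = number `sepBy1` char ',' <* eof

-- Hypothetical helper that runs the combined parser on a String.
-- The second argument to parse is only a label used in error messages.
parseNumbers :: String -> Either ParseError [Int]
parseNumbers = parse numberList "(input)"

main :: IO ()
main = do
  print (parseNumbers "1,22,333")  -- well-formed input
  print (parseNumbers "1,,3")      -- malformed input, yields a Left
```

The first call should yield a Right holding the parsed list, while the second yields a Left carrying a ParseError that describes where parsing failed. Note that numberList is itself an ordinary value of type Parser [Int], so it could be combined further in the same way.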
It’s helpful to know where Parsec fits compared to the tools used for parsing in other languages. Parsing is sometimes divided into two stages: lexical analysis (the domain of tools such as flex) and parsing itself (performed by programs such as bison). Parsec can perform both lexical analysis and parsing.
Let’s jump right in and write some code for parsing a CSV file. CSV files are often used as a plain-text representation of spreadsheets or databases. Each line is a record, and each field in the record is separated from the next by a comma. There are ways of dealing with fields that contain commas, but we won’t worry about that now.
This first example is much longer ...