O'Reilly logo

Real World Haskell by Donald Bruce Stewart, Bryan O'Sullivan, John Goerzen

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Matching on Strings

The second part of a good regular expression library is the matching function. Given a compiled regular expression, this function does the matching of the compiled regex against some input, indicating whether it matched, and if so, what parts of the string matched. In PCRE, this function is pcre_exec, which has type:

int pcre_exec(const pcre *code,
              const pcre_extra *extra,
              const char *subject,
              int length,
              int startoffset,
              int options,
              int *ovector,
              int ovecsize);

The most important arguments are the input pcre pointer structure (which we obtained from pcre_compile) and the subject string. The other flags let us provide bookkeeping structures and space for return values. We can directly translate this type to the Haskell import declaration:

-- file: ch17/RegexExec.hs
foreign import ccall "pcre.h pcre_exec"
    c_pcre_exec     :: Ptr PCRE
                    -> Ptr PCREExtra
                    -> Ptr Word8
                    -> CInt
                    -> CInt
                    -> PCREExecOption
                    -> Ptr CInt
                    -> CInt
                    -> IO CInt

We use the same method as before to create typed pointers for the PCREExtra structure, and a newtype to represent flags passed at regex execution time. This lets us ensure that users don’t pass compile-time flags incorrectly at regex runtime.

Extracting Information About the Pattern

The main complication involved in calling pcre_exec is the array of int pointers used to hold the offsets of matching substrings found by the pattern matcher. These offsets are held in an offset vector, whose required size is determined by analyzing the input regular expression ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required