The second part of a good regular expression library is the
matching function. Given a compiled regular expression, this function
matches it against some input, indicating whether the match succeeded
and, if so, which parts of the string matched. In
PCRE, this function is
pcre_exec, which has the following type:
int pcre_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, int options, int *ovector, int ovecsize);
The most important arguments are the
pointer to the pcre structure (which we obtained from
pcre_compile) and the subject string. The other arguments let
us provide bookkeeping structures and space for return values. We can
directly translate this type to the Haskell import declaration:
-- file: ch17/RegexExec.hs
foreign import ccall "pcre.h pcre_exec"
    c_pcre_exec :: Ptr PCRE
                -> Ptr PCREExtra
                -> Ptr Word8
                -> CInt
                -> CInt
                -> PCREExecOption
                -> Ptr CInt
                -> CInt
                -> IO CInt
We use the same method as before to
create a typed pointer for the
PCREExtra structure, and a
newtype to represent flags passed at regex execution time.
This ensures that users can’t mistakenly pass compile-time flags
at regex execution time.
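As a rough sketch of what these two types might look like, consider the following. The constructor and accessor names, the flag names execAnchored and execNotBol, and the helper combineExecOptions are illustrative assumptions rather than definitions taken from the binding itself. The numeric values are assumed to match PCRE_ANCHORED and PCRE_NOTBOL in pcre.h; a real binding would obtain them from the header (for example, via hsc2hs) instead of hard-coding them.

```haskell
{-# LANGUAGE EmptyDataDecls #-}

import Data.Bits ((.|.))
import Foreign.C.Types (CInt)
import Foreign.Ptr (Ptr, nullPtr)

-- An uninhabited tag type. We never build a PCREExtra value in Haskell;
-- the type exists only so that Ptr PCREExtra is distinct from Ptr PCRE.
data PCREExtra

-- Flags accepted at match time. Wrapping the CInt in its own newtype
-- means a compile-time flag type cannot be passed here by accident.
newtype PCREExecOption = PCREExecOption { unPCREExecOption :: CInt }
    deriving (Eq, Show)

-- Assumed values of PCRE_ANCHORED and PCRE_NOTBOL from pcre.h.
execAnchored, execNotBol :: PCREExecOption
execAnchored = PCREExecOption 0x10
execNotBol   = PCREExecOption 0x80

-- Hypothetical helper: combine several flags with bitwise OR.
combineExecOptions :: [PCREExecOption] -> PCREExecOption
combineExecOptions = PCREExecOption . foldr ((.|.) . unPCREExecOption) 0

-- A null pointer of the right type, for callers with no extra data.
noExtra :: Ptr PCREExtra
noExtra = nullPtr
```

With definitions along these lines, an expression such as combineExecOptions [execAnchored, execNotBol] has type PCREExecOption, so the type checker rejects any attempt to supply a value of a different flag newtype in that argument position.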
The main complication involved in
pcre_exec is the array of
ints (passed as an int pointer) used to hold the offsets of matching substrings found by the pattern matcher. These offsets are held in an offset vector, whose required size is determined by analyzing the input regular expression ...