Searching a string using the Rabin-Karp algorithm

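The Rabin-Karp algorithm finds a pattern in a body of text by comparing a hash of the pattern against the hash of a moving window over the text. The hash is computed by treating a string as a number written in a fixed base, for example 26 or greater for lowercase letters. Because two different strings can share the same hash, a matching hash is only a candidate and is confirmed by comparing the actual characters.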
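The hashing idea described above can be sketched in plain Haskell. This is a minimal illustration, not the code used by the stringsearch package: the base (256), the modulus (1000000007), and the names hash, roll, and search are all assumptions chosen for this example.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- Assumed parameters for this sketch: base 256 and a large prime modulus.
base, modulus :: Int
base = 256
modulus = 1000000007

-- Read a string as a number written in 'base', reduced modulo 'modulus'.
hash :: String -> Int
hash = foldl' (\h c -> (h * base + ord c) `mod` modulus) 0

-- Slide the window by one character: remove 'old' from the front and
-- append 'new' at the end. 'topPow' is base^(m-1) mod modulus for a
-- window of length m.
roll :: Int -> Int -> Char -> Char -> Int
roll topPow h old new =
  ((h - ord old * topPow) * base + ord new) `mod` modulus

-- Return the start offsets of every occurrence of 'pat' in 'txt'.
search :: String -> String -> [Int]
search pat txt
  | m == 0 || m > length txt = []
  | otherwise =
      [ i
      | (i, h) <- zip [0 ..] hashes
      , h == target                  -- cheap hash comparison first
      , take m (drop i txt) == pat   -- confirm, since hashes can collide
      ]
  where
    m      = length pat
    target = hash pat
    topPow = foldl' (\p _ -> p * base `mod` modulus) 1 [2 .. m]
    -- Hash the first window once, then roll across the rest of the text.
    hashes = scanl step (hash (take m txt)) (zip txt (drop m txt))
    step h (old, new) = roll topPow h old new
```

With these definitions, search "aba" "ababa" evaluates to [0,2]. Each window hash is derived from the previous one in constant time, which is what makes the moving window cheap.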

The advantage of Rabin-Karp lies in searching for many needles in a haystack at once; for a single string it is not especially efficient. After the initial preprocessing step, the algorithm can quickly find matches.
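To illustrate why many needles are cheap, here is a hedged sketch that hashes every needle once into a lookup table and then makes a single pass over the haystack. It assumes all needles share the length of the first one (needles of any other length are ignored), and the name searchMany, the base, and the modulus are hypothetical choices for this example rather than anything taken from the stringsearch package.

```haskell
import Data.Char (ord)
import Data.List (foldl', tails)
import qualified Data.Map.Strict as M

-- Assumed parameters for this sketch.
base, modulus :: Int
base = 256
modulus = 1000000007

hash :: String -> Int
hash = foldl' (\h c -> (h * base + ord c) `mod` modulus) 0

-- Find every (offset, needle) pair in the haystack.
-- Assumption: only needles with the same length as the first are searched.
searchMany :: [String] -> String -> [(Int, String)]
searchMany [] _ = []
searchMany needles@(first : _) hay
  | m == 0 || m > length hay = []
  | otherwise =
      [ (i, p)
      | (i, h, w) <- zip3 [0 ..] hashes windows
      , p <- M.findWithDefault [] h table  -- needles sharing this hash
      , p == w                             -- confirm against the window
      ]
  where
    m       = length first
    -- Preprocessing: hash each needle once.
    table   = M.fromListWith (++) [ (hash p, [p]) | p <- needles, length p == m ]
    topPow  = foldl' (\p _ -> p * base `mod` modulus) 1 [2 .. m]
    windows = map (take m) (tails hay)
    -- One rolling pass over the haystack, regardless of the number of needles.
    hashes  = scanl step (hash (take m hay)) (zip hay (drop m hay))
    step h (old, new) =
      ((h - ord old * topPow) * base + ord new) `mod` modulus
```

For example, searchMany ["ab", "ba"] "abba" evaluates to [(0,"ab"),(2,"ba")]. The haystack is scanned once no matter how many needles are in the table, so the per-needle cost is limited to the initial hashing.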

Getting ready

Install the stringsearch package, which provides the Data.ByteString.Search modules, using Cabal as follows:

$ cabal install stringsearch

How to do it...

  1. Use the OverloadedStrings language extension to facilitate the ByteString manipulations in our ...
