Within the programming community, one of the most famous software systems to credit functional programming for inspiration is Google’s MapReduce infrastructure for parallel processing of bulk data.
We can easily construct a greatly simplified, but still useful, Haskell equivalent. To focus our attention, we will look at processing web server logfiles, which tend to be both huge and plentiful. As an example, here is a log entry for a page visit recorded by the Apache web server. The entry occupies a single line in the logfile, although it may wrap here:
184.108.40.206 - - [08/Jun/2008:07:04:20 -0500] "GET / HTTP/1.1" 200 2097 "http://en.wikipedia.org/wiki/Mercurial_(software)" "Mozilla/5.0 (Windows; U; Windows XP 5.1; en-GB; rv:220.127.116.11) Gecko/20080201 Firefox/18.104.22.168" 0 hgbook.red-bean.com
While we could create a straightforward implementation without much effort, we will resist the temptation to dive in. If we think about solving a class of problems instead of a single one, we may end up with more widely applicable code.
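To make the idea of a "class of problems" concrete, here is a minimal sequential sketch, not the implementation this chapter goes on to build. The names `mapReduceSketch`, `countEntries`, `isOk`, and `countOk` are hypothetical. The second example assumes the HTTP status code is the ninth whitespace-separated field of an entry, as it is in the sample above.

```haskell
-- A sequential skeleton (hypothetical name): apply mapFunc to every
-- input chunk, then combine the intermediate results with reduceFunc.
mapReduceSketch :: (a -> b) -> ([b] -> c) -> [a] -> c
mapReduceSketch mapFunc reduceFunc = reduceFunc . map mapFunc

-- Problem 1: count the log entries spread across several chunks of text.
countEntries :: [String] -> Int
countEntries = mapReduceSketch (length . lines) sum

-- Assumption: the status code is the ninth whitespace-separated field.
isOk :: String -> Bool
isOk entry = case drop 8 (words entry) of
  ("200":_) -> True
  _         -> False

-- Problem 2: count only the entries that were served with status 200.
countOk :: [String] -> Int
countOk = mapReduceSketch (length . filter isOk . lines) sum
```

Both functions share one skeleton and differ only in the function applied to each chunk. Nothing here runs in parallel; the sketch only shows how the "map" and "reduce" steps can be separated from the problem-specific logic.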
When we develop a parallel program, we always face a few “bad penny” problems, which turn up regardless of the underlying programming language. A few are described here:
Our algorithm quickly becomes obscured by the details of partitioning and communication. This makes the code difficult to understand, which in turn makes modifying it risky.
Choosing a grain size—the smallest unit of work parceled out to a core—can be difficult. If ...