By Brian Carper, Christophe Grand, Chas Emerick


Parallelism on the Cheap

We’ll be examining all of Clojure’s flexible concurrency facilities in a bit, one of which—agents—can be used to orchestrate very efficient parallelization of workloads. However, sometimes you may find yourself wanting to parallelize some operation with as little ceremony as possible.

The flexibility of Clojure’s seq abstraction[128] makes implementing many routines in terms of processing sequences very easy. For example, say we had a function that uses a regular expression to find and return phone numbers found within other strings:

(defn phone-numbers
  [string]
  (re-seq #"(\d{3})[\.-]?(\d{3})[\.-]?(\d{4})" string))
;= #'user/phone-numbers
(phone-numbers " Sunil: 617.555.2937, Betty: 508.555.2218")
;= (["617.555.2937" "617" "555" "2937"] ["508.555.2218" "508" "555" "2218"])

Simple enough, and applying it to any seq of strings is easy, fast, and effective. These seqs could be loaded from disk using slurp and file-seq, or be coming in as messages from a message queue, or be the results obtained by retrieving large chunks of text from a database. To keep things simple, we can dummy up a seq of 100 strings, each about 1MB in size, suffixed with some phone numbers:

(def files (repeat 100
                   (apply str
                     (concat (repeat 1000000 \space)
                             "Sunil: 617.555.2937, Betty: 508.555.2218"))))

Let’s see how fast we can get all of the phone numbers from all of these “files”:

(time (dorun (map phone-numbers files)))
; "Elapsed time: 2460.848 msecs"

We’re using dorun here to fully realize the lazy seq produced by map and simultaneously release the results of that realization since we don’t want to have all of the found phone numbers printed to the REPL.
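As a reminder of the distinction, doall realizes a lazy seq and returns it (retaining every element), while dorun realizes it purely for side effects and returns nil. A minimal sketch:

```clojure
;; A lazy seq whose realization we can observe; the computation
;; does not run until the seq is forced.
(def results (map #(* % %) (range 3)))

(doall results)   ; realizes and returns the whole seq
;= (0 1 4)
(dorun results)   ; realizes the seq but discards the elements
;= nil
```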

This is parallelizable though, and trivially so. There is a parallel cousin of map, pmap, that will parallelize the application of a function across a sequence of values, returning a lazy seq of results just like map:

(time (dorun (pmap phone-numbers files)))
; "Elapsed time: 1277.973 msecs"

Run on a dual-core machine, this roughly doubles the throughput compared to the use of map in the prior example; for this particular task and dataset, roughly a 4x improvement could be expected on a four-core machine, and so on. Not bad for a single-character change to a function name! While this might look magical, it’s not; pmap is simply using a number of futures—calibrated to suit the number of CPU cores available—to spread the computation involved in evaluating phone-numbers for each file across each of those cores.
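To make the futures-based mechanism concrete, here is a drastically simplified sketch of the idea. This is not clojure.core/pmap's actual implementation, which additionally throttles how far ahead of consumption it dispatches work based on the number of available cores:

```clojure
(defn naive-pmap
  "A simplified pmap: dispatch f over each element on its own future,
   then dereference the results in order. clojure.core/pmap is smarter
   about limiting the number of in-flight futures."
  [f coll]
  (map deref (doall (map #(future (f %)) coll))))

(naive-pmap inc [1 2 3])
;= (2 3 4)
```

The inner doall is important: it forces all of the futures to be dispatched up front, so their bodies run concurrently rather than one at a time as the result seq is consumed.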

This works for many operations, but you still must use pmap judiciously. There is a degree of overhead associated with parallelizing operations like this. If the operation being parallelized does not have a significant enough runtime, that overhead will dominate the real work being performed; this can make a naive application of pmap slower than the equivalent use of map:

(def files (repeat 100000
                   (apply str
                     (concat (repeat 1000 \space)
                             "Sunil: 617.555.2937, Betty: 508.555.2218"))))

(time (dorun (map phone-numbers files)))
; "Elapsed time: 2649.807 msecs"
(time (dorun (pmap phone-numbers files)))
; "Elapsed time: 2772.794 msecs"

The only change we’ve made here is to the data: each string is now around 1K in size instead of 1MB in size. Even though the total amount of work is the same (there are more “files”), the parallelization overhead outstrips the gains we get from putting each evaluation of phone-numbers onto a different future/core. Because of this overhead, it is very common to see speedups of something less than Nx (where N is the number of CPU cores available) when using pmap. The lesson is clear: use pmap when the operation you’re performing is parallelizable in the first place, and is significant enough for each value in the seq that its workload will eclipse the process coordination inherent in its parallelization. Trying to force pmap into service where it’s not warranted can be disastrous.

There is often a workaround for such scenarios, however. You can often efficiently parallelize a relatively trivial operation by chunking your dataset so that each unit of parallelized work is larger. In the above example, the unit of work is just 1K of text; however, we can take steps to ensure that the unit of work is larger, so that each value processed by pmap is a seq of 250 1K strings, thus boosting the work done per future dispatch and cutting down on the parallelization overhead:

(time (->> files
        (partition-all 250)
        (pmap (fn [chunk] (doall (map phone-numbers chunk))))
        (apply concat)
        dorun))
; "Elapsed time: 1465.138 msecs"

map will return a lazy seq, so we use doall to force the realization of that lazy seq within the scope of the function provided to pmap. Otherwise, phone-numbers would never be called at all in parallel, leaving the work of applying it to each string to whatever process might have consumed the lazy seq later.
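We can verify where the work actually runs with a small experiment that records the name of the thread performing each computation; the exact thread names will vary by environment, but the pattern holds:

```clojure
(defn current-thread [] (.getName (Thread/currentThread)))

;; Lazy body: each future returns an unrealized seq, so every element
;; is computed later, on the single consuming thread.
(def lazy-threads
  (->> (range 4)
       (pmap (fn [x] (map (fn [_] (current-thread)) [x])))
       (apply concat)
       doall))

;; Forced body: doall realizes inside each future, on pool threads.
(def eager-threads
  (->> (range 4)
       (pmap (fn [x] (doall (map (fn [_] (current-thread)) [x]))))
       (apply concat)
       doall))

(= (distinct lazy-threads) [(current-thread)])
;= true
(boolean (some #(not= % (current-thread)) eager-threads))
;= true
```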

By changing the chunk size of our workload, we’ve regained the benefits of parallelization even though our per-operation computation complexity dropped substantially when applied to many more smaller strings.

Two other parallelism constructs are built on top of pmap: pcalls and pvalues. The former evaluates any number of no-arg functions provided as arguments, returning a lazy sequence of their return values; the latter is a macro that does the same, but for any number of expressions.
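A minimal illustration of both:

```clojure
;; pcalls takes no-arg functions and evaluates them in parallel:
(pcalls #(+ 1 2) #(* 3 4))
;= (3 12)

;; pvalues is a macro wrapping arbitrary expressions the same way:
(pvalues (+ 1 2) (* 3 4))
;= (3 12)
```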

[128] Which we discussed in Sequences.
