Chapter 8. Segue

Welcome to the last of the book’s recipes for R parallelism. This will be a short chapter, but don’t let that fool you. Segue’s scope is intentionally narrow, and that focus makes it a particularly powerful tool.

Segue’s mission is as simple as it gets: make it easy to use Elastic MapReduce as a parallel backend for lapply()-style operations. So easy, in fact, that it boasts of doing this in only two lines of R code.^[59]

This narrow focus is no accident. Segue’s creator, JD Long, wanted occasional access to a Hadoop cluster to run his pleasantly parallel,^[60] computationally expensive models. Elastic MapReduce was a great fit but still a bit cumbersome for his workflow. He created Segue to tackle the grunt work so he could focus on his higher-level modeling tasks.

Segue is a relatively young package. Nonetheless, since its creation in 2010, it has attracted a fair amount of attention.

Quick Look

Motivation: You want Hadoop power to drive some lapply() loops, perhaps for a parameter sweep, but you want minimal Hadoop contact. You consider MapReduce to be too much of a distraction from your work.

Solution: Use the segue package’s emrlapply() to send your calculations up to Elastic MapReduce, the Amazon Web Services cloud-based Hadoop product.

Good because: You get to focus on your modeling work, while segue takes care of transforming your lapply() work into a Hadoop job.
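
To make the Quick Look concrete, here is a minimal sketch of the kind of lapply() parameter sweep that segue is meant to take over. The toy simulate_model() function and the sigmas list are hypothetical, invented purely for illustration. The segue calls appear only as comments because they need AWS credentials and a live cluster; their names and argument order are assumptions based on the package's documented usage, so check them against the version you install.

```r
# Hypothetical "model": a toy Monte Carlo estimate, used only for illustration.
simulate_model <- function(sigma, n = 1000) {
  set.seed(42)                              # fixed seed so every run is reproducible
  draws <- rnorm(n, mean = 0, sd = sigma)   # n normal draws with the given spread
  mean(abs(draws))                          # summary statistic returned per parameter
}

# The parameter sweep: one list element per value to try.
sigmas <- list(0.5, 1, 2, 4)

# Local, serial version using base R.
local_results <- lapply(sigmas, simulate_model)

# Assumed segue equivalent (not run here; requires AWS credentials):
# library(segue)
# setCredentials("yourAWSAccessKey", "yourAWSSecretKey")
# myCluster   <- createCluster(numInstances = 2)
# emr_results <- emrlapply(myCluster, sigmas, simulate_model)
# stopCluster(myCluster)
```

The point of the sketch is the shape of the call: the list and the function stay the same, and only the lapply() line changes to one that also names a cluster.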

How It Works

Segue takes care of launching the Elastic MapReduce cluster, shipping data back and forth, and ...

Parallel R by Q. Ethan McCallum, Stephen Weston
