Other Packages for Parallel Computation with R
Segue
The segue package by JD Long is a great choice for running simple
parallel programs; it’s intended to be a gentle introduction to parallel
computation. Segue runs programs in the cloud using AWS’s Elastic
MapReduce service. (This is a distinct product from EC2, which I used to
install my own private Hadoop cluster.) It borrows some Hadoop
infrastructure, but it isn’t a full map/reduce package. Segue is modeled
on the apply function in R; you use it to apply a function to a data set
across a set of computers in the cloud. Let’s show how it works.
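Before turning to the cloud, it may help to see the apply-style pattern that Segue is modeled on running on a single machine. The sketch below uses only base R's lapply; the names sum_of_squares and chunks are hypothetical, chosen for illustration, and nothing here calls the segue package or AWS.

```r
# Local illustration only: this runs in a single R session and does not
# use segue or any cloud resources.

# sum_of_squares is a hypothetical example function, not part of segue.
sum_of_squares <- function(x) {
  sum(x^2)
}

# A small data set split into independent pieces. Each piece can be
# processed without knowing anything about the others, which is what
# makes this kind of job easy to spread across several machines.
chunks <- list(1:3, 4:6, 7:9)

# lapply applies the function to each element and returns a list of results.
results <- lapply(chunks, sum_of_squares)

# Collapse the list of results into a plain numeric vector.
totals <- unlist(results)
print(totals)
```

Because every call depends only on its own piece of the input, the same work could in principle be handed to separate workers rather than performed in a loop on one machine. That independence is the property an apply-style parallel tool relies on.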
The segue package is hosted on Google Code, not CRAN. To install it, you
can use the install_url command in the devtools package:
> library(devtools)
> # At the time I wrote this book, the current version was 0.05;
> # make sure to change the link to get the latest version:
> install_url("http://segue.googlecode.com/files/segue_0.05.tar.gz")
You’ll need an Amazon Web Services account to use it.
Warning
You will be billed by the hour for using AWS. Make sure that you understand how you will be charged and how to use AWS before you start.
You’ll need to get your Access Key ID and Secret Access Key from AWS’s Security Credentials page.
> library(segue)
Loading required package: rJava
Loading required package: caTools
Loading required package: bitops
Segue did not find your AWS credentials. Please run the setCredentials() function.
> # set aws.access.id to your amazon access id, aws.secret.key to ...