Chapter 2. snow

snow (“Simple Network of Workstations”) is probably the most popular parallel programming package available for R. It was written by Luke Tierney, A. J. Rossini, Na Li, and H. Sevcikova, and is actively maintained by Luke Tierney. It is a mature package, first released on the “Comprehensive R Archive Network” (CRAN) in 2003.

Quick Look

Motivation: You want to use a Linux cluster to run an R script faster. For example, you’re running a Monte Carlo simulation on your laptop, but you’re sick of waiting many hours or days for it to finish.

Solution: Use snow to run your R code on your company or university’s Linux cluster.

Good because: snow fits well into a traditional cluster environment, and is able to take advantage of high-speed communication networks, such as InfiniBand, using MPI.

How It Works

snow provides support for easily executing R functions in parallel. Most of the parallel execution functions in snow are variations of the standard lapply() function, making snow fairly easy to learn. To implement these parallel operations, snow uses a master/worker architecture, where the master sends tasks to the workers, and the workers execute the tasks and return the results to the master.

One important feature of snow is that it can be used with different transport mechanisms to communicate between the master and workers. This allows it to be portable, but still take advantage of high-performance communication mechanisms if available. snow can be used with socket connections, ...

Get Parallel R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.