Parallel R

Book Description

R is a wonderful thing, indeed: in recent years this free, open-source product has become a popular toolkit for statistical analysis and programming. Two of R's limitations -- that it is single-threaded and memory-bound -- become especially troublesome in the current era of large-scale data analysis. It's possible to break past these boundaries by putting R on the parallel path. Parallel R will describe how to give R parallel muscle. Coverage will include stalwarts such as snow and multicore, and also newer techniques such as Hadoop and Amazon's cloud computing platform.

Table of Contents

  1. Parallel R
  2. A Note Regarding Supplemental Files
  3. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. Safari® Books Online
    4. How to Contact Us
    5. Acknowledgments
      1. Q. Ethan McCallum
      2. Stephen Weston
  4. 1. Getting Started
    1. Why R?
    2. Why Not R?
    3. The Solution: Parallel Execution
    4. A Road Map for This Book
      1. What We’ll Cover
      2. Looking Forward…
      3. What We’ll Assume You Already Know
    5. In a Hurry?
      1. snow
      2. multicore
      3. parallel
      4. R+Hadoop
      5. RHIPE
      6. Segue
    6. Summary
  5. 2. snow
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. Creating Clusters with makeCluster
      2. Parallel K-Means
      3. Initializing Workers
      4. Load Balancing with clusterApplyLB
      5. Task Chunking with parLapply
      6. Vectorizing with clusterSplit
      7. Load Balancing Redux
      8. Functions and Environments
      9. Random Number Generation
      10. snow Configuration
      11. Installing Rmpi
      12. Executing snow Programs on a Cluster with Rmpi
      13. Executing snow Programs with a Batch Queueing System
      14. Troubleshooting snow Programs
    5. When It Works…
    6. …And When It Doesn’t
    7. The Wrap-up
  6. 3. multicore
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. The mclapply Function
      2. The mc.cores Option
      3. The mc.set.seed Option
      4. Load Balancing with mclapply
      5. The pvec Function
      6. The parallel and collect Functions
      7. Using collect Options
      8. Parallel Random Number Generation
      9. The Low-Level API
    5. When It Works…
    6. …And When It Doesn’t
    7. The Wrap-up
  7. 4. parallel
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. Getting Started
      2. Creating Clusters with makeCluster
      3. Parallel Random Number Generation
    5. Summary of Differences
    6. When It Works…
    7. …And When It Doesn’t
    8. The Wrap-up
  8. 5. A Primer on MapReduce and Hadoop
    1. Hadoop at Cruising Altitude
    2. A MapReduce Primer
    3. Thinking in MapReduce: Some Pseudocode Examples
      1. Calculate Average Call Length for Each Date
      2. Number of Calls by Each User, on Each Date
      3. Run a Special Algorithm on Each Record
    4. Binary and Whole-File Data: SequenceFiles
    5. No Cluster? No Problem! Look to the Clouds…
    6. The Wrap-up
  9. 6. R+Hadoop
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. Simple Hadoop Streaming (All Text)
      2. Streaming, Redux: Indirectly Working with Binary Data
      3. The Java API: Binary Input and Output
      4. Processing Related Groups (the Full Map and Reduce Phases)
    5. When It Works…
    6. …And When It Doesn’t
    7. The Wrap-up
  10. 7. RHIPE
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. Phone Call Records, Redux
      2. Tweet Brevity
      3. More Complex Tweet Analysis
    5. When It Works…
    6. …And When It Doesn’t
    7. The Wrap-up
  11. 8. Segue
    1. Quick Look
    2. How It Works
    3. Setting Up
    4. Working with It
      1. Model Testing: Parameter Sweep
    5. When It Works…
    6. …And When It Doesn’t
    7. The Wrap-up
  12. 9. New and Upcoming
    1. doRedis
    2. RevoScale R and RevoConnectR (RHadoop)
    3. cloudNumbers.com
  13. About the Authors
  14. Copyright