Chapter 11. Big Data Analysis (R and Hadoop)

In this chapter, we will cover the following topics:

  • Preparing the RHadoop environment
  • Installing rmr2
  • Installing rhdfs
  • Operating HDFS with rhdfs
  • Implementing a word count problem with RHadoop
  • Comparing the performance between an R MapReduce program and a standard R program
  • Testing and debugging the rmr2 program
  • Installing plyrmr
  • Manipulating data with plyrmr
  • Conducting machine learning with RHadoop
  • Configuring RHadoop clusters on Amazon EMR

Introduction

RHadoop is a collection of R packages that enables users to process and analyze big data with Hadoop. Before learning how to set up RHadoop and put it into practice, we need to understand why machine learning has to be scaled to big data.

In the previous chapters, ...
