RHadoop

R is a programming language used for statistics, data science, and visualization. Its functionality is extended through packages that can be imported for specialized or custom tasks, and more than 5,000 data analysis algorithms are available as libraries. These algorithms cover a far wider range of data analysis tasks than Apache Mahout supports. The R community is large and vibrant.

However, R has two drawbacks: it processes data entirely in memory, and its support for multithreading is minimal. These drawbacks make R unsuitable for big data crunching, where disk-based analysis and distribution are mandatory. One alternative is to run R programs through Hadoop Streaming. But this is a tedious ...
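To make the Hadoop Streaming idea concrete, here is a minimal sketch of the contract a streaming job follows: a mapper emits tab-separated key/value lines, the framework sorts them by key, and a reducer aggregates each run of equal keys. The sketch is written in Python and simulates the sort step locally; the word-count task and the names `mapper`, `reducer`, and `run_local` are illustrative assumptions, not taken from the book. An R script used with Hadoop Streaming would follow the same pattern, reading lines from standard input and writing records to standard output.

```python
from itertools import groupby


def record_key(record):
    """Return the key part of a 'key<TAB>value' record."""
    return record.split("\t", 1)[0]


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would write to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; assumes records arrive sorted by key."""
    for word, group in groupby(sorted_records, key=record_key):
        total = sum(int(rec.split("\t", 1)[1]) for rec in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort by key -> reduce in a single process (hypothetical helper)."""
    return list(reducer(sorted(mapper(lines), key=record_key)))
```

In a real streaming job the mapper and reducer would be separate executables, and Hadoop would perform the sort and distribute the work across nodes. The local helper only stands in for that step so the logic can be checked on a small input.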
