Summary

Although the main goal of this chapter was data processing in Hadoop using the R language, its sections have also exposed you to numerous techniques and approaches used in Big Data analytics. We only hope that it wasn't too overwhelming!

We kicked off by introducing you to the diversity of the Hadoop ecosystem: the tools and applications available to its users, as well as the HDFS and MapReduce frameworks.

We then created a single-node Hadoop cluster, on which we carried out a simple word count MapReduce exercise in both Java and R, and we showed you how to manage HDFS from the Linux command line and from RStudio Server.

Finally, we achieved something that you probably won't be able to find in many (if any!) ...

