Summary

We began this chapter by introducing you gently to the rich and abundant world of machine learning algorithms and the open-source tools that facilitate their application to large datasets.

We then moved on to practical tutorials, in which we presented three different machine learning methods run on a multi-node Microsoft Azure HDInsight cluster with Hadoop, Spark, and RStudio Server installed. In the first example, you learnt how to perform a logistic regression through the Spark MLlib module using the SparkR package for R, with HDFS as a data source.

In two further tutorials, we explored the powerful capabilities of H2O, an open-source, highly optimized platform for Big Data machine learning models run through the h2o package for ...
