GLM example with Spark and R on the HDInsight cluster

In the first practical example of this chapter, we will use the HDInsight cluster with Spark and Hadoop to fit a Generalized Linear Model (GLM) to the flight data, which you can download from the Packt Publishing website created for this book.

Preparing the Spark cluster and reading the data from HDFS

Before carrying out any analytics on the data, let's first double-check that you have all of the required resources in place. In this tutorial, we will be using the same multi-node HDInsight cluster that you previously deployed by following the instructions in Chapter 7, Faster than Hadoop: Spark with R, specifically the section on Launching HDInsight with Spark and R/RStudio. If you ...
