Installing the package and Spark

To begin, you need to install a few packages and Spark itself. Run the following code; downloading Spark can take some time:

install.packages(c("dplyr", "sparklyr", "DAAG"))
library(sparklyr)
library(dplyr)
# installing Spark
spark_install()
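Once the download finishes, it can be worth confirming that the installation worked before moving on. The following is only a minimal sketch, assuming that Java is available on your machine and that a local Spark instance is sufficient; the variable names `installed` and `sc` are illustrative and not taken from the text above.

```r
library(sparklyr)

# List the Spark versions that spark_install() has placed on this machine.
installed <- spark_installed_versions()
print(installed)

# Open a connection to a local Spark instance
# (assumption: Java is installed and discoverable).
sc <- spark_connect(master = "local")

# Report the version of the Spark instance we are connected to.
print(spark_version(sc))

# Close the connection when finished.
spark_disconnect(sc)
```

If the printed table of installed versions is empty, the download probably did not complete, and running `spark_install()` again is a reasonable first step.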

The DAAG package contains the dataset we are going to use. So, let's start learning. This chapter is divided into five sections plus this introduction. The first section teaches you how to manipulate Spark data using dplyr and SQL queries. In the second section, we bring Spark data into R for analysis and visualization. The third section shows how to use the Spark or H2O machine learning algorithms. The fourth section presents the Spark API. Lastly, ...
