In this recipe, we'll see how to apply functions such as dapply, gapply, and spark.lapply over a Spark DataFrame.
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos. You will also need RStudio installed. Please refer to the Installing R recipe for details on installing R, and to the Creating SparkR DataFrames recipe to get acquainted with creating DataFrames from a variety of data sources.
In this recipe, we'll see how to apply the user-defined function APIs available as of Spark 2.0.2.
Apply dapply on the Spark DataFrame. dapply applies a function to each partition of a SparkDataFrame and returns a new SparkDataFrame whose columns match the specified output schema:
# Output schema for dapply: the input columns plus one derived column.
# The fields after "eruptions" are assumed, following SparkR's standard
# faithful-dataset example, since the original line was truncated.
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
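To complete the step, the dapply call itself might look like the following sketch. It assumes R's built-in faithful dataset (suggested by the "eruptions" column in the schema) and a derived waiting_secs column; those names are assumptions, not from the original text:

```r
# Sketch: create a SparkDataFrame from the faithful dataset (assumed)
df <- createDataFrame(faithful)

# Run a function over each partition, adding a derived column
# (waiting time converted from minutes to seconds)
result <- dapply(df,
                 function(x) {
                   cbind(x, waiting_secs = x$waiting * 60)
                 },
                 schema)

# Collect the result back to the driver and inspect the first rows
head(collect(result))
```

Note that the function passed to dapply receives each partition as a plain R data.frame, and the data.frame it returns must conform to the declared schema.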