How it works...

We use the dataset and a random forest to solve a regression problem. The mechanics of parsing and splitting the data remain the same, but we use the following two APIs to train the regression forest and evaluate the results:

  • RandomForest.trainRegressor()
  • RegressionMetrics()

Note the definition of the getMetrics() function, which pairs each prediction with its true label and passes the pairs to Spark's RegressionMetrics():

def getMetrics(model: RandomForestModel, data: RDD[LabeledPoint]): RegressionMetrics = {
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
  )
  new RegressionMetrics(predictionsAndLabels)
}
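To show how the two APIs fit together, here is a minimal sketch that trains a regression forest and reads a few values from the resulting RegressionMetrics. The synthetic data (label = 2 * x), the object name RandomForestRegressionSketch, the evaluate() helper, and every parameter value (10 trees, depth 5, 32 bins, a 70/30 split) are assumptions chosen for illustration, not the recipe's actual settings.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object RandomForestRegressionSketch {

  // Same helper as in the recipe: pair (prediction, label) and wrap in RegressionMetrics.
  def getMetrics(model: RandomForestModel, data: RDD[LabeledPoint]): RegressionMetrics = {
    val predictionsAndLabels = data.map(example =>
      (model.predict(example.features), example.label))
    new RegressionMetrics(predictionsAndLabels)
  }

  // Hypothetical helper: train on synthetic data and return metrics for the held-out part.
  def evaluate(spark: SparkSession): RegressionMetrics = {
    // Synthetic data (an assumption for illustration): one feature x, label = 2 * x.
    val data = spark.sparkContext.parallelize(
      (1 to 200).map(i => LabeledPoint(2.0 * i, Vectors.dense(i.toDouble))))
    val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

    val model = RandomForest.trainRegressor(
      trainingData,
      Map[Int, Int](), // categoricalFeaturesInfo: empty, all features are continuous
      10,              // numTrees
      "auto",          // featureSubsetStrategy
      "variance",      // impurity
      5,               // maxDepth
      32)              // maxBins

    getMetrics(model, testData)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RandomForestRegressionSketch")
      .getOrCreate()

    val metrics = evaluate(spark)
    println(s"MSE  = ${metrics.meanSquaredError}")
    println(s"RMSE = ${metrics.rootMeanSquaredError}")
    println(s"MAE  = ${metrics.meanAbsoluteError}")
    println(s"R2   = ${metrics.r2}")

    spark.stop()
  }
}
```

Because getMetrics() returns the RegressionMetrics object rather than a single number, the caller can choose which error measure to report without recomputing predictions.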

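We also set the impurity parameter to "variance", which is the only impurity measure that trainRegressor() supports for regression, so that candidate splits are scored by the variance of the labels: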

val impurity ...
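To make the role of variance concrete, the following sketch computes the variance impurity of a set of labels and the impurity reduction obtained by splitting a node in two. It is plain Scala written for illustration under the usual textbook definitions, not Spark's internal implementation; the object name VarianceImpuritySketch and the sample labels are assumptions.

```scala
object VarianceImpuritySketch {

  // Variance impurity: mean squared deviation of the labels from their mean.
  def variance(labels: Seq[Double]): Double =
    if (labels.isEmpty) 0.0
    else {
      val mean = labels.sum / labels.size
      labels.map(y => math.pow(y - mean, 2)).sum / labels.size
    }

  // Impurity reduction of a split: parent impurity minus the
  // size-weighted impurities of the left and right children.
  def gain(left: Seq[Double], right: Seq[Double]): Double = {
    val parent = left ++ right
    val n = parent.size.toDouble
    variance(parent) -
      (left.size / n) * variance(left) -
      (right.size / n) * variance(right)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical labels: the split separates small values from large ones.
    val left = Seq(1.0, 2.0, 3.0)
    val right = Seq(10.0, 11.0, 12.0)
    println(s"parent variance = ${variance(left ++ right)}")
    println(s"gain of split   = ${gain(left, right)}") // approximately 20.25
  }
}
```

A split that groups similar labels together leaves little variance in each child, so its gain is high; this is the quantity a variance-based regression tree tries to maximize at each node.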
