We use the same dataset, this time with a Random Forest, to solve a regression problem. The mechanics of parsing and separating the data remain the same, but we use the following two APIs to build the tree regression model and evaluate the results:
- RandomForest.trainRegressor()
- RegressionMetrics()
Noteworthy is the definition of the getMetrics() function, which uses Spark's RegressionMetrics facility:
def getMetrics(model: RandomForestModel, data: RDD[LabeledPoint]): RegressionMetrics = {
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label))
  // RegressionMetrics expects an RDD of (prediction, observation) pairs
  new RegressionMetrics(predictionsAndLabels)
}
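Once the function is defined, the returned RegressionMetrics object exposes several error measures. A minimal usage sketch, assuming a trained `model` and a `testData: RDD[LabeledPoint]` split from the earlier parsing step (both names are assumptions, not from the text):

```scala
// Hypothetical names: `model` (RandomForestModel) and `testData` (RDD[LabeledPoint])
val metrics = getMetrics(model, testData)

// Standard regression error measures provided by RegressionMetrics
println(s"MSE  = ${metrics.meanSquaredError}")
println(s"RMSE = ${metrics.rootMeanSquaredError}")
println(s"MAE  = ${metrics.meanAbsoluteError}")
println(s"R2   = ${metrics.r2}")
```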
We also set the impurity value to "variance" so we can use the variance for measuring errors:
val impurity ...
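Putting the parameters together, the training call might look like the following sketch. The numeric hyperparameter values and the variable names `trainingData` and `model` are illustrative assumptions, not taken from the text:

```scala
import org.apache.spark.mllib.tree.RandomForest

// Illustrative hyperparameter values (assumptions, not prescribed by the text)
val numTrees = 30
val maxDepth = 10
val maxBins = 32
val impurity = "variance"  // variance is the impurity measure used for regression trees
val categoricalFeaturesInfo = Map[Int, Int]()  // empty map: treat all features as continuous
val featureSubsetStrategy = "auto"  // let Spark pick the number of features per split

// Assumes `trainingData: RDD[LabeledPoint]` from the earlier parsing step
val model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
```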