Model estimation

With the feature sets finalized in the previous section, the next step is to estimate the parameters of the selected models, for which we can use MLlib on the Zeppelin notebook.

As before, to obtain the best models we need to arrange distributed computing, especially in this case, where there are various student segments for various study subjects. For the distributed computing part, readers may refer to the previous chapters, as we will not repeat the details here.

Spark implementation with the Zeppelin notebook

To train a random forest with MLlib in Scala, we will use the following code:

// Train a RandomForest model.
val treeStrategy = Strategy.defaultStrategy("Classification")
val numTrees = 300
val featureSubsetStrategy = "auto"
...
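The listing above is truncated, so the rest of the original code is not reproduced here. As a minimal sketch of how these three settings might be passed to MLlib's RDD-based RandomForest.trainClassifier, the following example trains on a small synthetic data set. The toy data, the 70/30 split, the seeds, and the names points, data, trainingData, testData, and testErr are assumptions made for illustration and do not come from the book's student data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy

// In Zeppelin, `sc` is already provided and this line can be dropped.
// getOrCreate reuses a running context instead of starting a second one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rf-sketch").setMaster("local[*]"))

// Synthetic toy data (an assumption, not the book's data set):
// two random features, label 1.0 when their sum exceeds 1.0.
val rng = new scala.util.Random(42)
val points = Seq.fill(200) {
  val x1 = rng.nextDouble()
  val x2 = rng.nextDouble()
  val label = if (x1 + x2 > 1.0) 1.0 else 0.0
  LabeledPoint(label, Vectors.dense(x1, x2))
}
val data = sc.parallelize(points)

// Hold out part of the data for evaluation (split ratio is an assumption).
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

// Settings taken from the listing above.
val treeStrategy = Strategy.defaultStrategy("Classification")
val numTrees = 300
val featureSubsetStrategy = "auto"

// Train a RandomForest model; the seed value is arbitrary.
val model = RandomForest.trainClassifier(
  trainingData, treeStrategy, numTrees, featureSubsetStrategy, seed = 12345)

// Fraction of held-out points whose predicted label differs from the true one.
val labelAndPreds = testData.map(p => (p.label, model.predict(p.features)))
val testErr =
  labelAndPreds.filter { case (l, p) => l != p }.count().toDouble / testData.count()
println(s"Test error = $testErr")
```

The default classification strategy assumes two classes, which matches the binary toy labels used here; a problem with more classes would need the strategy's numClasses adjusted before training.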
