Model estimation

Once the feature sets were finalized in the previous section, the next step is to estimate the parameters of the selected models. For this, we can use either MLlib or R, and we need to arrange the distributed computing.
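As a minimal sketch of what such an estimation step might look like with MLlib (this example is illustrative, not from the text: it assumes a training DataFrame with a "label" column and an assembled "features" vector column, and uses linear regression as a stand-in for whichever model was selected):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.regression.LinearRegression

// In a Databricks notebook, `spark` is already available; this line is for
// standalone runs. The input path is an assumed placeholder.
val spark = SparkSession.builder().appName("ModelEstimation").getOrCreate()
val training = spark.read.parquet("/mnt/data/training")

// Configure and fit the model; the iteration and regularization settings
// here are illustrative defaults, not values from the text.
val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.01)
val lrModel = lr.fit(training)

// The estimated parameters are the fitted coefficients and intercept.
println(s"Coefficients: ${lrModel.coefficients}")
println(s"Intercept: ${lrModel.intercept}")
```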

To simplify this, we can utilize Databricks' Job feature. Specifically, within the Databricks environment, we can navigate to Jobs and create a job, as shown in the following image:

[Figure: Model estimation — creating a job in Databricks]

Then, users can select notebooks to run, specify clusters, and schedule jobs. Once a job is scheduled, users can monitor its execution and collect the results.
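The same create-and-schedule step can also be done programmatically. As a hedged sketch (not part of the original text), the Databricks Jobs REST API accepts a notebook task, cluster specification, and cron schedule in one request; the instance URL, notebook path, and cluster settings below are placeholders:

```bash
# Assumes credentials in ~/.netrc (the -n flag); all values are placeholders.
curl -n -X POST https://<databricks-instance>/api/2.0/jobs/create -d '{
  "name": "model-estimation",
  "new_cluster": {
    "spark_version": "<spark-version>",
    "node_type_id": "<node-type>",
    "num_workers": 2
  },
  "notebook_task": { "notebook_path": "/Users/<user>/model_estimation" },
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  }
}'
```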

In the section Methods for a holistic view, we prepared code for each of the three models ...
