Model estimation

Once the feature sets were finalized in the previous section, the next step is to estimate the parameters of the selected models, using either MLlib or R. As before, we need to arrange for distributed computing.
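As a concrete illustration of estimation with MLlib, the following is a minimal Scala sketch. The data path, column names, and the choice of logistic regression are illustrative assumptions, not the book's exact code:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ModelEstimation").getOrCreate()

// Load the finalized feature set (hypothetical path and schema)
val data = spark.read.parquet("/mnt/data/final_features.parquet")

// Assemble the finalized feature columns into a single vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("feature1", "feature2", "feature3"))  // placeholder names
  .setOutputCol("features")
val assembled = assembler.transform(data)

// Estimate the model parameters on the cluster
val lr = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxIter(100)
val model = lr.fit(assembled)

// Inspect the estimated coefficients and intercept
println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
```

Because `fit` runs on the cluster, the same notebook can be attached to a Databricks cluster and executed at scale without code changes.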

To simplify this, we can use Databricks' Jobs feature: within the Databricks environment, navigate to Jobs and create a new job.


Then, users can select R notebooks to run, specify the clusters to run them on, and schedule the jobs. Once a job is scheduled, users can monitor its execution and collect the results.

In the section Methods for fraud detection, we prepared code for each of the three selected models. Now, we need to modify ...
