Spark ML examples

Now, let's look at how to develop machine learning applications. Naturally, we need interesting datasets to run the algorithms on; we will use a dataset suited to each algorithm shown in the next section. The examples in this book are in Scala, but I have also included IPython notebooks with the algorithm examples in Python.

Note

The code and data files are available in the GitHub repository at https://github.com/xsankar/fdps-v3. We'll keep it updated with corrections.
