Summary

In this chapter, we looked at supervised and unsupervised learning and walked through a few examples of how to run them in Spark/Scala. We considered SVM, logistic regression, decision trees, and k-means, using the UCI Iris dataset as a running example. This is in no way a complete guide, and many other libraries either already exist or are being developed as we speak, but I would bet that you can solve 99% of your immediate data analysis problems with these tools alone.

These tools give you a fast way to become productive with a new dataset. There are many other ways to look at a dataset, but before we get into more advanced topics, the next chapter discusses regression and classification, that is, how to predict continuous and discrete labels, respectively.