Conducting predictive analytics using Spark MLlib

Spark ships with a rich machine learning library called MLlib, a collection of algorithms for classification, clustering, recommendations, and so on. In this recipe, we are going to take a look at how to build a predictive model using MLlib.

Getting ready

To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. Here, I am using Scala 2.11.0.

How to do it...

For this recipe, we are going to use the classic example dataset of iris flowers; you can find out more about this at https://en.wikipedia.org/wiki/Iris_flower_data_set.

Here, based on the petal length and width and the sepal length and width, we need to classify the flowers into species. ...
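Before any model can be trained, each row of the dataset has to be turned into a numeric label plus a vector of the four measurements. The following is a minimal sketch of that parsing step in plain Scala, not the recipe's own code. It assumes a comma-separated layout of sepal length, sepal width, petal length, petal width, and species name, with species written as Iris-setosa, Iris-versicolor, and Iris-virginica; the names IrisParsing, IrisRecord, speciesToLabel, and parseLine are hypothetical and chosen only for illustration.

```scala
// Sketch only: the CSV column order and the species-to-label mapping
// below are assumptions, not taken from the recipe.
object IrisParsing {

  // Hypothetical mapping from species name to a numeric class label.
  val speciesToLabel: Map[String, Double] = Map(
    "Iris-setosa"     -> 0.0,
    "Iris-versicolor" -> 1.0,
    "Iris-virginica"  -> 2.0
  )

  // One parsed row: the class label and the four measurements.
  final case class IrisRecord(label: Double, features: Vector[Double])

  // Parses one line; returns None for malformed rows (wrong column
  // count, unknown species, or non-numeric measurements).
  def parseLine(line: String): Option[IrisRecord] = {
    val fields = line.split(",").map(_.trim)
    if (fields.length != 5) None
    else
      for {
        label    <- speciesToLabel.get(fields(4))
        features <- scala.util.Try(fields.take(4).map(_.toDouble).toVector).toOption
      } yield IrisRecord(label, features)
  }
}
```

With MLlib's RDD-based API, a record like this would typically be wrapped as LabeledPoint(label, Vectors.dense(features.toArray)) before being passed to a classifier. Returning an Option lets blank or damaged lines be dropped with flatMap instead of failing the whole job.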
