Support Vector Machine (SVM) with Spark 2.0

In this recipe, we use Spark's RDD-based SVM API, SVMWithSGD, which trains a linear SVM using stochastic gradient descent (SGD), to classify the population into two classes. We then use count and BinaryClassificationMetrics to evaluate the model's performance.
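To make the recipe's flow concrete without a Spark cluster, here is a minimal, dependency-free sketch. It is not Spark's implementation. It is a per-example SGD loop over the same L2-regularized hinge-loss objective that SVMWithSGD minimizes, using the defaults of `SVMWithSGD.train(rdd, numIterations)` (step size 1.0 decayed by the square root of the step count, regularization 0.01, no intercept, 0/1 labels). Spark itself takes (mini-)batch gradient steps over an RDD instead. Evaluation mirrors the two tools the recipe names: a count of correct predictions at the default 0.0 threshold, and the area under the ROC curve from raw margins, which is what `BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()` reports once the model's threshold is cleared. The function names and the toy data are hypothetical.

```python
import math
import random


def dot(w, x):
    """Dot product of two equal-length dense vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_svm_sgd(data, num_iterations, step_size=1.0, reg_param=0.01, seed=42):
    """Hypothetical sketch: linear SVM (no intercept) trained with per-example
    SGD on the L2-regularized hinge loss. Labels are 0.0/1.0, mapped to -1/+1."""
    rng = random.Random(seed)
    dim = len(data[0][1])
    w = [0.0] * dim
    t = 0
    for _ in range(num_iterations):
        order = list(range(len(data)))
        rng.shuffle(order)  # visit examples in a random order on each pass
        for idx in order:
            label, x = data[idx]
            t += 1
            y = 1.0 if label > 0.5 else -1.0
            eta = step_size / math.sqrt(t)  # step size decays as 1/sqrt(t)
            margin = y * dot(w, x)
            for i in range(dim):
                # Subgradient of (reg_param/2)*||w||^2 + max(0, 1 - y * w.x)
                grad = reg_param * w[i] - (y * x[i] if margin < 1.0 else 0.0)
                w[i] -= eta * grad
    return w


def auc(score_and_labels):
    """Area under the ROC curve: the fraction of (positive, negative) pairs in
    which the positive example has the higher score (ties count as 0.5)."""
    pos = [s for s, l in score_and_labels if l == 1.0]
    neg = [s for s, l in score_and_labels if l == 0.0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: class 1.0 in the positive quadrant, class 0.0 in the negative one.
data = [(1.0, [2.0, 1.0]), (1.0, [1.0, 3.0]), (1.0, [3.0, 2.0]),
        (0.0, [-2.0, -1.0]), (0.0, [-1.0, -3.0]), (0.0, [-3.0, -2.0])]
w = train_svm_sgd(data, num_iterations=20)

# Accuracy via a count of correct 0/1 predictions at the default threshold 0.0.
correct = sum(1 for label, x in data if (1.0 if dot(w, x) > 0.0 else 0.0) == label)
accuracy = correct / len(data)

# AUC from raw margins (the role of clearThreshold() plus areaUnderROC in Spark).
area = auc([(dot(w, x), label) for label, x in data])
print(f"weights={w} accuracy={accuracy} auROC={area}")
```

For brevity the sketch scores the training data; the recipe proper evaluates on held-out test data.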

In the interest of time and space, we use the sample LIBSVM-format data file that ships with Spark, but we also provide links to additional data files offered by National Taiwan University so that readers can experiment on their own. As a concept, the Support Vector Machine (SVM) is fundamentally simple; the complexity lies in the details of its implementation, whether in Spark or in any other package.
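Each line of a LIBSVM-format file holds a label followed by sparse `index:value` pairs with one-based, ascending indices. The short sketch below, assuming well-formed lines and a known feature count, parses that layout into dense lists to show what the sample file contains; `parse_libsvm_line` is a hypothetical helper. In Spark, `MLUtils.loadLibSVMFile(sc, path)` does this parsing for you and returns an `RDD[LabeledPoint]` with zero-based sparse feature vectors.

```python
def parse_libsvm_line(line, num_features):
    """Hypothetical helper: parse 'label index1:value1 index2:value2 ...'
    into (label, dense feature list). Indices in the file are one-based."""
    parts = line.split()
    label = float(parts[0])
    features = [0.0] * num_features
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # convert to a zero-based position
    return label, features


lines = ["1 1:0.5 3:2.0", "0 2:1.5"]
rows = [parse_libsvm_line(l, num_features=3) for l in lines]
print(rows)  # [(1.0, [0.5, 0.0, 2.0]), (0.0, [0.0, 1.5, 0.0])]
```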

While the mathematics behind SVM is beyond the scope of this book, readers are encouraged to read the following tutorials ...
