There's more...

You can either use publicly available data sources in libsvm format or use Spark's SVMDataGenerator object, which generates sample data for SVM (that is, data drawn from a Gaussian distribution):

object SVMDataGenerator
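
To make the idea of generated SVM data concrete, here is a minimal plain-Scala sketch that does not depend on Spark. The names Example, SyntheticSvmData, generate, and toLibSvm are hypothetical, and the sampling scheme (Gaussian hidden weights, uniform features, a small amount of Gaussian noise) is an assumption chosen for illustration, not a copy of Spark's generator:

```scala
import scala.util.Random

// Hypothetical helper type, not part of Spark: one labeled example with dense features.
final case class Example(label: Double, features: Array[Double])

object SyntheticSvmData {
  // Draw a hidden weight vector from a Gaussian, sample each feature uniformly
  // in [-1, 1), and label the point 0.0 or 1.0 by the sign of the noisy score.
  def generate(numExamples: Int, numFeatures: Int, seed: Long = 42L): Seq[Example] = {
    val rnd = new Random(seed)
    val trueWeights = Array.fill(numFeatures)(rnd.nextGaussian())
    Seq.fill(numExamples) {
      val x = Array.fill(numFeatures)(rnd.nextDouble() * 2.0 - 1.0)
      val score =
        x.zip(trueWeights).map { case (xi, wi) => xi * wi }.sum + rnd.nextGaussian() * 0.1
      Example(if (score < 0) 0.0 else 1.0, x)
    }
  }

  // Render one example as a libsvm-style line: "<label> <index>:<value> ...",
  // using 1-based feature indices.
  def toLibSvm(e: Example): String = {
    val feats = e.features.toSeq.zipWithIndex.map { case (v, i) => s"${i + 1}:$v" }
    (e.label.toString +: feats).mkString(" ")
  }
}
```

Calling SyntheticSvmData.generate(5, 3).map(SyntheticSvmData.toLibSvm) yields five text lines in the same layout as the public libsvm datasets, which is handy for small experiments before switching to a real data source.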

The idea behind SVM can be summarized as follows: rather than picking a linear discriminant (for example, one line among many candidate lines) with an objective function (for example, least squares minimization) to separate and label the left-hand variable, first find the largest separating margin (as shown in the following graph) and then draw the solid line through the middle of that margin. Another way to think about it is to ask how two lines (the dashed lines in the following graph) can be placed to separate the classes the most (that ...
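
The notion of a separating margin can be illustrated with a small plain-Scala sketch. It only compares a handful of candidate lines by their geometric margin; it is not how an SVM solver actually finds the optimum. All names (MarginSketch, Point, Line, margin, widest) are hypothetical, and labels are assumed to be +1 or -1:

```scala
object MarginSketch {
  // A labeled 2D point; labels are assumed to be +1 or -1.
  final case class Point(x1: Double, x2: Double, label: Int)

  // A candidate separating line w1*x1 + w2*x2 + b = 0.
  final case class Line(w1: Double, w2: Double, b: Double) {
    // Distance from the point to the line, positive when the point lies
    // on the side that matches its label and negative otherwise.
    def signedDistance(p: Point): Double =
      p.label * (w1 * p.x1 + w2 * p.x2 + b) / math.hypot(w1, w2)
  }

  // Geometric margin: the signed distance of the closest point.
  // A negative value means the line misclassifies at least one point.
  def margin(line: Line, points: Seq[Point]): Double =
    points.map(line.signedDistance).min

  // Among the given candidates, keep the line with the widest margin.
  def widest(candidates: Seq[Line], points: Seq[Point]): Line =
    candidates.maxBy(margin(_, points))
}
```

For the points (1, 1) and (2, 2) labeled +1 and (-1, -1) and (-2, -2) labeled -1, both Line(1, 1, 0) and Line(1, 0, 0) separate the classes, but the first leaves a margin of about 1.414 while the second leaves only 1.0, so widest picks the first.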
