You can use either publicly available data sources in libsvm format or the Spark API call SVMDataGenerator(), which generates sample data for SVM (that is, data based on a Gaussian distribution):
object SVMDataGenerator
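To make the libsvm format and the idea of Gaussian sample data concrete, here is a minimal sketch in plain Scala. It is not Spark's SVMDataGenerator implementation; the helper names toLibSvmLine and gaussianSample, the class centers, and the seed are all hypothetical choices made for illustration only.

```scala
import scala.util.Random

// Hypothetical helper (not part of Spark): formats one example as a libsvm line.
// A libsvm line looks like "<label> <index>:<value> ...", with 1-based indices.
def toLibSvmLine(label: Double, features: Array[Double]): String = {
  val cols = features.zipWithIndex.map { case (v, i) => s"${i + 1}:$v" }
  (label.toString +: cols).mkString(" ")
}

// Hypothetical generator: draws n two-class examples, where each class is a
// Gaussian cloud (unit variance) around its own assumed center (+2 or -2).
def gaussianSample(n: Int, numFeatures: Int, seed: Long): Seq[(Double, Array[Double])] = {
  val rnd = new Random(seed)
  (0 until n).map { i =>
    val label = if (i % 2 == 0) 1.0 else 0.0
    val center = if (label == 1.0) 2.0 else -2.0
    val x = Array.fill(numFeatures)(center + rnd.nextGaussian())
    (label, x)
  }
}

// Generate a handful of examples and print them as libsvm lines.
val sample = gaussianSample(n = 6, numFeatures = 2, seed = 42L)
sample.map { case (y, x) => toLibSvmLine(y, x) }.foreach(println)
```

Lines written in this format to a text file can be read back in Spark with MLUtils.loadLibSVMFile, which returns an RDD of LabeledPoint.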
The idea behind SVM can be summarized as follows: rather than using a linear discriminant (for example, selecting one line among many) and an objective function (for example, least squares minimization) to separate and label the dependent (left-hand) variable, first find the largest separating margin (as shown in the following graph) and then draw the solid line through the middle of that margin. Another way to think about it is to ask how two lines (the dashed lines in the following graph) can separate the classes the most (that ...