Chapter 12. Model Comparison

SVMs: The Support Vector Machine

In Chapter 3, we introduced the idea of decision boundaries and noted that data sets whose decision boundary isn’t linear are difficult for simple classification algorithms. In Chapter 6, we showed you how to perform logistic regression, a classification algorithm that works by constructing a linear decision boundary. In both chapters, we promised to describe a technique called the kernel trick that can be used to solve problems with nonlinear decision boundaries. Let’s deliver on that promise by introducing a new classification algorithm called the support vector machine (SVM for short), which lets you choose among several different kernels to find nonlinear decision boundaries. We’ll use an SVM to classify points from a data set with a nonlinear decision boundary. Specifically, we’ll work with the data set shown in Figure 12-1.

Looking at this data set, it should be clear that the points from Class 0 are on the periphery, whereas points from Class 1 are in the center of the plot. This sort of nonlinear decision boundary can’t be discovered using a simple classification algorithm like the logistic regression algorithm we described in Chapter 6. Let’s demonstrate that by trying to use logistic regression through the glm function. We’ll then look into the reason why logistic regression fails.

df <- read.csv('data/df.csv')
logit.fit <- glm(Label ~ X + Y, family = binomial(link = 'logit'), data = df)
logit.predictions ...
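If you don’t have the book’s data file at hand, the following is a minimal, self-contained sketch of the same experiment. It rests on several assumptions that are not from the original text: the data are simulated rather than read from data/df.csv, the circular boundary and its radius are arbitrary choices, and the names sim, sim.fit, and sim.predictions are hypothetical. It fits the same kind of glm model and compares its accuracy with the baseline of always guessing the more common class.

```r
# Simulated stand-in for the book's data set (an assumption, not the real data):
# Class 1 sits in the center of the plot, Class 0 on the periphery.
set.seed(1)
n <- 1000
sim <- data.frame(X = runif(n, -1, 1), Y = runif(n, -1, 1))
sim$Label <- ifelse(sim$X^2 + sim$Y^2 < 0.4, 1, 0)

# Fit a logistic regression, which can only draw a linear decision boundary.
sim.fit <- glm(Label ~ X + Y, family = binomial(link = 'logit'), data = sim)

# Turn fitted probabilities into class predictions with a 0.5 threshold.
sim.predictions <- ifelse(predict(sim.fit, type = 'response') > 0.5, 1, 0)

# Accuracy of the model versus always guessing the more common class.
logit.accuracy <- mean(sim.predictions == sim$Label)
baseline.accuracy <- max(mean(sim$Label == 1), mean(sim$Label == 0))

print(c(logit = logit.accuracy, baseline = baseline.accuracy))
```

On data shaped like this, you should expect the two accuracies to be roughly the same: no straight line separates a central cluster from the points surrounding it, so the linear model gains little or nothing over guessing the majority class.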
