In Chapter 3, we introduced the idea of decision boundaries and noted that nonlinear decision boundaries pose a problem for simple classification algorithms. In Chapter 6, we showed you how to perform logistic regression, a classification algorithm that works by constructing a linear decision boundary. And in both chapters, we promised to describe a technique called the kernel trick that could be used to solve problems with nonlinear decision boundaries. Let’s deliver on that promise by introducing a new classification algorithm called the support vector machine (SVM for short), which lets you choose among several kernels to find nonlinear decision boundaries.

We’ll use an SVM to classify points from a data set with a nonlinear decision boundary. Specifically, we’ll work with the data set shown in Figure 12-1.
Looking at this data set, it should be clear that the points from
Class 0 are on the periphery, whereas points from Class 1 are in the
center of the plot. This sort of nonlinear decision boundary can’t be
discovered using a simple classification algorithm like the logistic
regression algorithm we described in Chapter 6. Let’s demonstrate that
by trying to use logistic regression through the
glm function. We’ll then look into the reason
why logistic regression fails.
df <- read.csv('data/df.csv')
logit.fit <- glm(Label ~ X + Y, family = binomial(link = 'logit'), data = df)
logit.predictions ...
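If you want to see this kind of failure without the chapter’s data file, the following is a minimal sketch that builds a hypothetical data set with a similar shape: Class 1 in the center of the plot and Class 0 on the periphery. The names toy.df, toy.fit, and toy.predictions, the random seed, and the radius used to assign labels are all assumptions made for illustration; they are not the chapter’s data or code. The sketch fits the same kind of glm model and compares its accuracy to that of always guessing the more common class.

```r
# Hypothetical stand-in for the chapter's data set (NOT data/df.csv):
# points inside a circle around the origin are Class 1, the rest are Class 0.
set.seed(1)
n <- 1000
X <- runif(n, -1, 1)
Y <- runif(n, -1, 1)
Label <- ifelse(X^2 + Y^2 < 0.4, 1, 0)
toy.df <- data.frame(X = X, Y = Y, Label = Label)

# Logistic regression can only draw a straight-line boundary in X and Y.
toy.fit <- glm(Label ~ X + Y,
               family = binomial(link = 'logit'),
               data = toy.df)

# predict() returns log odds by default, so 0 corresponds to probability 0.5.
toy.predictions <- ifelse(predict(toy.fit) > 0, 1, 0)

# Compare the model against always guessing the more common class.
model.accuracy <- mean(toy.predictions == toy.df$Label)
baseline.accuracy <- max(mean(toy.df$Label == 1), mean(toy.df$Label == 0))

print(c(model = model.accuracy, baseline = baseline.accuracy))
```

On data shaped like this, you should expect the two accuracies to be close to each other, because no straight line separates a central cluster from the points that surround it.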