
In Chapter 3, we introduced the idea of decision boundaries and noted that problems with a nonlinear decision boundary are difficult for simple classification algorithms. In Chapter 6, we showed you how to perform logistic regression, a classification algorithm that works by constructing a linear decision boundary. And in both chapters, we promised to describe a technique called the kernel trick that could be used to solve problems with nonlinear decision boundaries. Let’s deliver on that promise by introducing a new classification algorithm called the support vector machine (SVM for short), which lets you choose among several kernels to find nonlinear decision boundaries. We’ll use an SVM to classify points from a data set with a nonlinear decision boundary. Specifically, we’ll work with the data set shown in Figure 12-1.
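As a rough preview of what choosing a kernel looks like in practice, here is a minimal sketch. It assumes the `e1071` package for its `svm` function and uses a synthetic data set named `toy`. Both are assumptions made for illustration; `toy` is not the data behind Figure 12-1.

```r
library(e1071)  # provides svm(); assumed to be installed

# Hypothetical toy data: Class 1 sits inside a circle, Class 0 surrounds it.
set.seed(1)
n <- 1000
toy <- data.frame(X = runif(n, -1, 1), Y = runif(n, -1, 1))
toy$Label <- factor(as.integer(toy$X^2 + toy$Y^2 < 0.35))

# Fit the same model twice, changing only the kernel.
linear.svm <- svm(Label ~ X + Y, data = toy, kernel = 'linear')
radial.svm <- svm(Label ~ X + Y, data = toy, kernel = 'radial')

# Training-set accuracy for each kernel.
linear.accuracy <- mean(predict(linear.svm, toy) == toy$Label)
radial.accuracy <- mean(predict(radial.svm, toy) == toy$Label)
print(c(linear = linear.accuracy, radial = radial.accuracy))
```

On data shaped like this, the radial kernel should score noticeably higher than the linear one, because only the radial kernel can bend the boundary around the central cluster.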

Looking at this data set, it should be clear that the points from Class 0 are on the periphery, whereas points from Class 1 are in the center of the plot. This sort of nonlinear decision boundary can’t be discovered using a simple classification algorithm like the logistic regression algorithm we described in Chapter 6. Let’s demonstrate that by trying to use logistic regression through the `glm` function. We’ll then look into the reason why logistic regression fails.

    df <- read.csv('data/df.csv')
    logit.fit <- glm(Label ~ X + Y, family = binomial(link = 'logit'), data = df)
    logit.predictions ...
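The fit above depends on `data/df.csv`, which isn’t reproduced here. The same experiment can be sketched on a synthetic stand-in, which is an assumption for illustration and not the book’s data. The names `sim`, `sim.fit`, and `sim.predictions` are hypothetical, and so is the choice to compare against a majority-class baseline.

```r
# Hypothetical stand-in for data/df.csv:
# Class 1 in the center of the plot, Class 0 on the periphery.
set.seed(1)
n <- 2000
sim <- data.frame(X = runif(n, -1, 1), Y = runif(n, -1, 1))
sim$Label <- as.integer(sim$X^2 + sim$Y^2 < 0.35)

# Same model form as in the text: a boundary that is linear in X and Y.
sim.fit <- glm(Label ~ X + Y, family = binomial(link = 'logit'), data = sim)

# Threshold the fitted probabilities at 0.5 to get class predictions.
sim.predictions <- as.integer(predict(sim.fit, type = 'response') > 0.5)

# Compare against always guessing the more common class.
accuracy <- mean(sim.predictions == sim$Label)
baseline <- max(mean(sim$Label), 1 - mean(sim$Label))
print(c(accuracy = accuracy, baseline = baseline))
```

If the accuracy comes out roughly equal to the baseline, the model has learned little beyond the class frequencies. That is what you would expect here, since no straight line can separate a central cluster from the points that surround it.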