Chapter 15. Find Patterns and Make Predictions with K Nearest Neighbors

K Nearest Neighbors (KNN) provides an interesting contrast to the other two algorithms we have seen in this section of the book, neural networks and decision trees, and it offers the sharpest contrast with traditional techniques. The basic concept is simple: when a new case is presented to the algorithm, it finds the small number (k) of training cases that are most similar to the new case and classifies the new case on the assumption that it will fall into the same category as the majority of those nearest cases. When the outcome is numeric rather than categorical, the average of the neighbors is used instead: if you are trying to predict the salary of an employee, and k = 5, the predicted salary will be the average of the salaries of the 5 employees in the training dataset most like the new case. So, although we will start by predicting survival with the Titanic dataset, you could also use this technique for predicting salary with the Bank dataset used in the neural network chapter (Chapter 13). After we’ve discussed finding “neighbors” and using KNN as a classifier, we will end the chapter with a comparison of the performance of the three classifiers we’ve seen in Part III.
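The idea is easy to express outside of SPSS as well. The following is a minimal sketch in Python, offered purely to illustrate the mechanics rather than the SPSS workflow used in this chapter; the function name knn_predict and the toy data are invented for the example. It shows both uses of the k nearest cases: a majority vote for a categorical outcome and an average for a numeric one.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5, classify=True):
    # Euclidean distance from the new case to every training case
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k nearest training cases
    nearest = np.argsort(dists)[:k]
    if classify:
        # Categorical outcome: majority vote among the k neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Numeric outcome (e.g., salary): average of the k neighbors' values
    return y_train[nearest].mean()

# Toy data: two numeric features per case, binary outcome (e.g., survived 0/1)
X = np.array([[22, 7.25], [38, 71.3], [26, 7.9],
              [35, 53.1], [28, 8.05], [54, 51.9]])
y = np.array([0, 1, 1, 1, 0, 0])
print(knn_predict(X, y, np.array([30, 50.0]), k=3))  # majority of 3 nearest -> 1

# The same neighbors can predict a numeric value instead
salaries = np.array([30000, 52000, 41000, 48000, 33000, 50000])
print(knn_predict(X, salaries, np.array([30, 50.0]), k=3, classify=False))  # 50000.0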

There is no generalization in the form of a model built from the training set. There are no coefficients, no significance testing, and no goodness-of-fit measure like R². Computer scientists specializing in machine learning often describe this by saying that KNN is a “lazy learner”: it does not create a model in the traditional sense. ...
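To make the “lazy learner” point concrete, here is another hedged sketch (again illustrative Python rather than SPSS syntax; the class name LazyKNN is invented). Fitting merely stores the training data, and all of the computation is deferred to prediction time.

```python
import numpy as np

class LazyKNN:
    """A lazy learner: fit() only memorizes the data; no model is built."""

    def fit(self, X, y):
        self.X, self.y = X, y   # no coefficients, no fitted equation
        return self

    def predict(self, x_new, k=5):
        # All of the work happens here, at prediction time
        dists = ((self.X - x_new) ** 2).sum(axis=1)
        nearest = np.argsort(dists)[:k]
        # Majority vote; assumes labels are non-negative integers
        return np.bincount(self.y[nearest]).argmax()
```

Contrast this with a regression or a neural network, where fitting estimates parameters that summarize the training data. Here nothing is summarized, which is also why the cost of each prediction grows with the size of the training set.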
