- Machine Learning for Hackers
- Preface
- 1. Using R
- 2. Data Exploration
- 3. Classification: Spam Filtering
- 4. Ranking: Priority Inbox
- 5. Regression: Predicting Page Views
- 6. Regularization: Text Regression
- 7. Optimization: Breaking Codes
- 8. PCA: Building a Market Index
- 9. MDS: Visually Exploring US Senator Similarity
- 10. kNN: Recommendation Systems
- 11. Analyzing Social Graphs
- 12. Model Comparison
- Works Cited
- Index
- About the Authors
- Colophon
- Copyright

In the last chapter, we saw how we could use simple correlational techniques to create a measure of similarity between the members of Congress based on their voting records. In this chapter, we’re going to talk about how you can use those same sort of similarity metrics to recommend items to a website’s users.

The algorithm we’ll use is called *k*-nearest
neighbors. It’s arguably the most intuitive of all the machine learning
algorithms that we present in this book. Indeed, the simplest form of
*k*-nearest neighbors is the sort of algorithm most
people would spontaneously invent if asked to make recommendations using
similarity data: they’d recommend the song that’s closest to the songs a
user already likes, but not yet in that list. That intuition is
essentially a 1-nearest neighbor algorithm. The full
*k*-nearest neighbor algorithm amounts to a
generalization of this intuition where you draw on more than one data
point before making a recommendation.

The full *k*-nearest neighbors algorithm works
much in the way some of us ask for recommendations from our friends.
First, we start with people whose taste we feel we share, and then we
ask a bunch of them to recommend something to us. If many of them
recommend the same thing, we deduce that we’ll like it as well.

How can we take that intuition and transform it into something algorithmic? Before we work on making recommendations based on real data, let’s start with something ...