After collecting data about the things people like, you need a way to determine how similar people are in their tastes. You do this by comparing each person with every other person and calculating a similarity score. There are a few ways to do this, and in this section I'll show you two systems for calculating similarity scores: Euclidean distance and Pearson correlation.
One very simple way to calculate a similarity score is to use a Euclidean distance score, which takes the items that people have ranked in common and uses them as axes for a chart. You can then plot the people on the chart and see how close together they are, as shown in Figure 2-1.
Figure 2-1. People in preference space
This figure shows the people charted in preference space. Toby has been plotted at 4.5 on the Snakes axis and at 1.0 on the Dupree axis. The closer two people are in the preference space, the more similar their preferences are. Because the chart is two-dimensional, you can only look at two rankings at a time, but the principle is the same for bigger sets of rankings.
To calculate the distance between Toby and LaSalle in the chart, take the difference on each axis, square each difference, add the squares together, then take the square root of the sum. In Python, you can use the pow(n,2) function to square a number and the sqrt function to take the square root:
>>from math ...
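As a minimal sketch of the calculation just described, the snippet below wraps the steps in a small helper. The function name euclidean_distance is hypothetical, and LaSalle's coordinates are assumed purely for illustration; only Toby's position (4.5 on Snakes, 1.0 on Dupree) comes from the text.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of axes."""
    # Difference on each axis, squared, summed, then square-rooted.
    return sqrt(sum(pow(a - b, 2) for a, b in zip(p, q)))

toby = (4.5, 1.0)      # (Snakes, Dupree), as plotted in the text
lasalle = (4.0, 2.0)   # assumed values, for illustration only

distance = euclidean_distance(toby, lasalle)
print(round(distance, 3))  # 1.118
```

Because the helper iterates over paired coordinates, it works unchanged when there are more than two axes, which is the sense in which the principle carries over to bigger sets of rankings.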