Two interesting aspects of this dataset are the nonlinearity and
the interplay of the variables. If you installed
matplotlib (http://matplotlib.sourceforge.net) in Chapter 8, you can visualize some of the
variables by generating a couple of lists from the data. (This step is not necessary to
work through the rest of the chapter.) Try adding this function to advancedclassify.py:
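from pylab import *

def plotagematches(rows):
  # Matches: x is the first age in each row (data[0]), y is the second (data[1])
  xdm,ydm=[r.data[0] for r in rows if r.match==1],\
          [r.data[1] for r in rows if r.match==1]
  # Non-matches, split into x and y the same way
  xdn,ydn=[r.data[0] for r in rows if r.match==0],\
          [r.data[1] for r in rows if r.match==0]

  plot(xdm,ydm,'go')   # matches drawn as green O's
  plot(xdn,ydn,'rx')   # non-matches drawn as red X's

  show()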
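The only real work in plotagematches is splitting the rows into four coordinate lists. The sketch below isolates that step so it can be checked without matplotlib. It assumes each row exposes a two-element data list and a match flag of 0 or 1; the Row namedtuple and the split_age_lists helper are hypothetical stand-ins, not part of advancedclassify, and the sample rows are made-up values for illustration only.

```python
from collections import namedtuple

# Hypothetical stand-in for the chapter's row objects: only the two
# attributes plotagematches relies on (data and match) are modeled.
Row = namedtuple('Row', ['data', 'match'])

def split_age_lists(rows):
    """Return (xdm, ydm, xdn, ydn): first/second ages for matches, then non-matches."""
    xdm = [r.data[0] for r in rows if r.match == 1]
    ydm = [r.data[1] for r in rows if r.match == 1]
    xdn = [r.data[0] for r in rows if r.match == 0]
    ydn = [r.data[1] for r in rows if r.match == 0]
    return xdm, ydm, xdn, ydn

# Made-up sample rows (not taken from the dataset)
rows = [Row([24, 30], 1), Row([30, 30], 1), Row([27, 51], 0), Row([48, 20], 0)]
xdm, ydm, xdn, ydn = split_age_lists(rows)
print(xdm, ydm)  # [24, 30] [30, 30]
print(xdn, ydn)  # [27, 48] [51, 20]
```

Indexing data[0] and data[1] separately is what makes the plot two-dimensional; passing the whole data list as both x and y, as in a version that omits the indices, would put every point on a single diagonal.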
After reloading the module, call plotagematches from your Python session:
>>> reload(advancedclassify)
<module 'advancedclassify' from 'advancedclassify.py'>
>>>
This will generate a scatter plot of the man's age versus the woman's age. The points will be O if the people are a match and X if they are not. You'll get a window like the one shown in Figure 9-1.
Figure 9-1. Generated age-age scatter plot
Many other factors obviously determine whether two people are a match, but this figure is based only on the simplified age-only dataset, and it shows a clear boundary: people rarely match with someone far outside their own age range. The boundary also appears to curve and become less defined as people ...