Two interesting aspects of this dataset are the nonlinearity and
the interplay of the variables. If you installed
*matplotlib* (http://matplotlib.sourceforge.net) in Chapter 8, you can visualize some of the
variables by using `advancedclassify` and generating a couple of lists from it.
(This step is not necessary to work through the rest of the chapter.)
Add the following function to *advancedclassify.py*:

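    from pylab import *

    def plotagematches(rows):
        # Ages (data[0], data[1]) for pairs that are a match
        xdm,ydm=[r.data[0] for r in rows if r.match==1],\
                [r.data[1] for r in rows if r.match==1]
        # Ages for pairs that are not a match
        xdn,ydn=[r.data[0] for r in rows if r.match==0],\
                [r.data[1] for r in rows if r.match==0]

        plot(xdm,ydm,'go')  # matches: green circles
        plot(xdn,ydn,'ro')  # non-matches: red circles

        show()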

Call this function from your Python session:

    >>> reload(advancedclassify)
    <module 'advancedclassify' from 'advancedclassify.py'>
    >>> advancedclassify.plotagematches(agesonly)

This will generate a *scatter plot* of the
man's age versus the woman's age. With the `'go'` and `'ro'` format strings,
matches are drawn as green circles and non-matches as red circles; Figure 9-1
shows them as O and X, respectively. You'll get a window like the one shown in
Figure 9-1.

Figure 9-1. Generated age-age scatter plot
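To draw matches and non-matches with different marker shapes as well as different colors, the plotting step can be separated from the list-building step. The following is a minimal sketch, not the chapter's code: `Row`, `split_by_match`, `plot_age_matches`, the `filename` parameter, and the sample values are all hypothetical, and the axis labels assume that `data[0]` is the man's age and `data[1]` the woman's.

```python
from collections import namedtuple

# Hypothetical stand-in for the chapter's row objects: each row has a
# `data` list (assumed here to be [man's age, woman's age]) and a
# `match` flag (1 for a match, 0 otherwise).
Row = namedtuple("Row", ["data", "match"])


def split_by_match(rows):
    """Return ((x, y) lists for matches, (x, y) lists for non-matches)."""
    matched = [r.data for r in rows if r.match == 1]
    unmatched = [r.data for r in rows if r.match == 0]
    xdm = [d[0] for d in matched]
    ydm = [d[1] for d in matched]
    xdn = [d[0] for d in unmatched]
    ydn = [d[1] for d in unmatched]
    return (xdm, ydm), (xdn, ydn)


def plot_age_matches(rows, filename=None):
    """Scatter plot using 'o' markers for matches and 'x' for non-matches."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    (xdm, ydm), (xdn, ydn) = split_by_match(rows)
    plt.plot(xdm, ydm, "go", label="match")      # green circles
    plt.plot(xdn, ydn, "rx", label="no match")   # red crosses
    plt.xlabel("man's age")    # assumption: data[0] is the man's age
    plt.ylabel("woman's age")  # assumption: data[1] is the woman's age
    plt.legend()
    if filename:
        plt.savefig(filename)  # write to a file instead of opening a window
    else:
        plt.show()


# Made-up sample rows, for illustration only (not taken from the dataset).
sample = [Row([30, 28], 1), Row([45, 22], 0), Row([25, 27], 1)]
print(split_by_match(sample))
# → (([30, 25], [28, 27]), ([45], [22]))
```

Calling `plot_age_matches(sample)` would open a window with the three sample points. Keeping the list-building in its own function makes it possible to check the split without a display.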

Although there are obviously many other factors that determine whether two people are a match, this figure is based on the simplified age-only dataset, and it shows an obvious boundary that indicates people do not go far outside their own age range. The boundary also appears to curve and become less defined as people ...
