Cover by Toby Segaran


Difficulties with the Data

Two interesting aspects of this dataset are the nonlinearity and the interplay of the variables. If you installed matplotlib (http://matplotlib.sourceforge.net) in Chapter 8, you can visualize some of the variables by adding a function to advancedclassify that builds a couple of lists from the rows and plots them. (This step is not necessary to work through the rest of the chapter.) Add this to advancedclassify.py:

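from pylab import *

def plotagematches(rows):
  # Coordinates (the two ages in each row) for pairs that matched
  xdm,ydm=[r.data[0] for r in rows if r.match==1],\
          [r.data[1] for r in rows if r.match==1]
  # Coordinates for pairs that did not match
  xdn,ydn=[r.data[0] for r in rows if r.match==0],\
          [r.data[1] for r in rows if r.match==0]

  plot(xdm,ydm,'go')  # matches: green O markers
  plot(xdn,ydn,'rx')  # non-matches: red X markers

  show()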

Call this function from your Python session:

>>> reload(advancedclassify)
<module 'advancedclassify' from 'advancedclassify.py'>
>>> advancedclassify.plotagematches(agesonly)

This will generate a scatter plot of the man's age versus the woman's age. Each point is drawn as an O if the two people are a match and as an X if they are not. You'll get a window like the one shown in Figure 9-1.

Figure 9-1. Generated age-age scatter plot
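The list-building step inside plotagematches can be tried without opening a plot window. The sketch below is a minimal stand-in rather than the chapter's code: the MatchRow class and the sample ages are hypothetical, made up only to mimic rows that carry a data list and a 0/1 match flag, and split_by_match is an illustrative helper name.

```python
class MatchRow:
    """Hypothetical stand-in for a row: a data list plus a 0/1 match flag."""
    def __init__(self, data, match):
        self.data = data
        self.match = match


def split_by_match(rows, flag):
    """Return (xs, ys) for the rows whose match value equals flag."""
    xs = [r.data[0] for r in rows if r.match == flag]
    ys = [r.data[1] for r in rows if r.match == flag]
    return xs, ys


# Made-up sample rows for illustration only; not taken from the real dataset.
sample = [
    MatchRow([24, 26], 1),
    MatchRow([30, 48], 0),
    MatchRow([35, 33], 1),
    MatchRow([22, 40], 0),
]

xdm, ydm = split_by_match(sample, 1)  # coordinates of matched pairs
xdn, ydn = split_by_match(sample, 0)  # coordinates of unmatched pairs
print(xdm, ydm)
print(xdn, ydn)
```

Each pair of lists can then be handed to a plotting call, one call per group, so that the two groups get different markers.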

Many other factors determine whether two people are a match, but this figure is based on the simplified age-only dataset, and it shows a clear boundary indicating that people do not go far outside their own age range. The boundary also appears to curve and become less defined as people ...
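One way to put a number on the visual impression that matches stay close in age is to compare the age gaps of the two groups. This is a rough sketch under stated assumptions: the rows are plain (age, age, match) tuples, the values are invented for illustration, and mean_age_gap is a hypothetical helper that is not part of advancedclassify.

```python
def mean_age_gap(rows, flag):
    """Average absolute age difference over rows whose match value equals flag."""
    gaps = [abs(a - b) for a, b, match in rows if match == flag]
    if not gaps:
        return None  # no rows carry this flag
    return sum(gaps) / float(len(gaps))


# Invented (age, age, match) tuples for illustration only.
toy_rows = [(24, 26, 1), (35, 33, 1), (30, 48, 0), (22, 40, 0)]

print(mean_age_gap(toy_rows, 1))  # average gap among matches
print(mean_age_gap(toy_rows, 0))  # average gap among non-matches
```

A single average says nothing about how the boundary curves, so a summary like this complements the scatter plot rather than replacing it.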
