This is one of the simplest classifiers to construct, but it's a good basis for further work. It works by finding the average of all the data in each class and constructing a point that represents the center of the class. It can then classify new points by determining to which center point they are closest.
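Written out, the two steps look like this. The notation is chosen here for illustration and is not from the text: $n_c$ is the number of training points in class $c$, and $\mathbf{x}_i$, $y_i$ are the data and class of the $i$-th point.

$$M_c = \frac{1}{n_c} \sum_{i \,:\, y_i = c} \mathbf{x}_i, \qquad \hat{y}(\mathbf{x}) = \arg\min_{c \in \{0,1\}} \lVert \mathbf{x} - M_c \rVert^2$$

Squaring the distance does not change which center is nearest, so the squared distance is the usual quantity to compare.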

To do this, you'll first need a function that calculates the *average point* in each class. In this case, the classes are just 0 and 1. Add `lineartrain` to *advancedclassify.py*:

```python
def lineartrain(rows):
    averages = {}
    counts = {}

    for row in rows:
        # Get the class of this point
        cl = row.match

        averages.setdefault(cl, [0.0] * (len(row.data)))
        counts.setdefault(cl, 0)

        # Add this point to the averages
        for i in range(len(row.data)):
            averages[cl][i] += float(row.data[i])

        # Keep track of how many points in each class
        counts[cl] += 1

    # Divide sums by counts to get the averages
    for cl, avg in averages.items():
        for i in range(len(avg)):
            avg[i] /= counts[cl]

    return averages
```
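`lineartrain` only needs rows that have a `data` attribute (the list of features) and a `match` attribute (the class). To try the computation without the chapter's dataset, here is a minimal sketch, assuming a hypothetical `Row` namedtuple as the stand-in row type and a hypothetical `lineartrain_compact` that condenses the same per-class sum-then-divide logic so the block runs on its own. The input values are made up for illustration, and the printed order assumes Python 3.7+ dictionaries.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the chapter's row objects. It carries only the
# two attributes lineartrain reads: .data (the features) and .match (the class).
Row = namedtuple("Row", ["data", "match"])


def lineartrain_compact(rows):
    """Same computation as lineartrain: per-class sums, then divide by counts."""
    sums = {}
    counts = defaultdict(int)
    for row in rows:
        # One running-sum list per class, created the first time the class appears
        total = sums.setdefault(row.match, [0.0] * len(row.data))
        for i, value in enumerate(row.data):
            total[i] += float(value)
        counts[row.match] += 1
    # Build a new dict of averages instead of dividing in place
    return {cl: [s / counts[cl] for s in total] for cl, total in sums.items()}


# Made-up (x, y) pairs for illustration only, not the book's age data
rows = [
    Row([20, 30], 1),
    Row([40, 50], 1),
    Row([50, 20], 0),
    Row([60, 30], 0),
]
print(lineartrain_compact(rows))  # {1: [30.0, 40.0], 0: [55.0, 25.0]}
```

Each class's average is just the coordinate-wise mean of its points: for class 1, (20 + 40) / 2 = 30 and (30 + 50) / 2 = 40.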

You can run this function in your Python session to get the averages:

```
>>> reload(advancedclassify)
<module 'advancedclassify' from 'advancedclassify.pyc'>
>>> avgs=advancedclassify.lineartrain(agesonly)
```

To see why this is useful, consider again the plot of the age data, shown in Figure 9-4.

Figure 9-4. Linear classifier using averages

The Xs in the figure represent the average points as calculated by `lineartrain`. The line ...
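The averages are only half of the classifier; the other half assigns a new point to whichever average is nearest. A minimal sketch, assuming averages in the `{class: [values]}` form that `lineartrain` returns; `nearestclass`, `linearside`, and the center values are hypothetical names and made-up numbers. For two classes it also checks that the nearest-average test reduces to one linear inequality: expanding the squared distances shows a point $\mathbf{x}$ is closer to $M_0$ exactly when $(\mathbf{x} - \tfrac{M_0 + M_1}{2}) \cdot (M_0 - M_1) > 0$, so the boundary between the classes is a straight line.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearestclass(point, averages):
    """Return the class whose average point is closest to `point`."""
    return min(averages, key=lambda cl: sqdist(point, averages[cl]))


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def linearside(point, m0, m1):
    """Two-class linear form of the same rule: 0 if `point` lies on m0's side
    of the perpendicular bisector between m0 and m1, otherwise 1."""
    midpoint = [(a + b) / 2 for a, b in zip(m0, m1)]
    offset = [p - c for p, c in zip(point, midpoint)]
    direction = [a - b for a, b in zip(m0, m1)]
    return 0 if dot(offset, direction) > 0 else 1


# Made-up class centers, in the shape lineartrain returns
averages = {1: [30.0, 40.0], 0: [55.0, 25.0]}
for p in ([35, 35], [60, 20]):
    print(p, nearestclass(p, averages), linearside(p, averages[0], averages[1]))
# [35, 35] 1 1
# [60, 20] 0 0
```

Comparing squared distances avoids a square root without changing which center wins, and the linear form needs only one dot product per point once the midpoint and direction are known.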
