This is one of the simplest classifiers to construct, but it's a good basis for further work. It works by finding the average of all the data in each class and constructing a point that represents the center of the class. It can then classify new points by determining to which center point they are closest.
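The closest-center rule can be sketched in a few lines. This is only an illustration under stated assumptions: the function name nearestcenter, the choice of Euclidean distance, and the example center values are all invented here and are not taken from advancedclassify.py.

```python
import math

def nearestcenter(point, centers):
  # centers maps a class label to its center point (a list of numbers)
  best_class = None
  best_dist = None
  for cl, center in centers.items():
    # Euclidean distance between the point and this class center
    dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
    if best_dist is None or dist < best_dist:
      best_class, best_dist = cl, dist
  return best_class

# Hypothetical centers for classes 0 and 1 (made-up values)
centers = {0: [26.0, 35.0], 1: [35.0, 33.0]}
guess_a = nearestcenter([25, 36], centers)
guess_b = nearestcenter([36, 32], centers)
```

Each new point is assigned to whichever class center lies nearest, so the quality of the classifier depends entirely on how well the centers summarize their classes.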
To do this, you'll first need a function that calculates the average point for each class. In this case, the classes are just 0 and 1. Add lineartrain to advancedclassify.py:
def lineartrain(rows):
  averages={}
  counts={}

  for row in rows:
    # Get the class of this point
    cl=row.match

    averages.setdefault(cl,[0.0]*(len(row.data)))
    counts.setdefault(cl,0)

    # Add this point to the averages
    for i in range(len(row.data)):
      averages[cl][i]+=float(row.data[i])

    # Keep track of how many points in each class
    counts[cl]+=1

  # Divide sums by counts to get the averages
  for cl,avg in averages.items():
    for i in range(len(avg)):
      avg[i]/=counts[cl]

  return averages
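To make the return value concrete, the sketch below runs the same per-class averaging on a tiny made-up dataset. The Row namedtuple is a hypothetical stand-in for the row objects used in this chapter (only the data and match attributes are assumed), the numbers are invented, and classaverages is a compact restatement of the averaging step so that the block runs on its own.

```python
from collections import namedtuple

# Hypothetical stand-in for the chapter's row objects
Row = namedtuple('Row', ['data', 'match'])

def classaverages(rows):
  sums = {}
  counts = {}
  for row in rows:
    # Running per-dimension totals for this row's class
    total = sums.setdefault(row.match, [0.0] * len(row.data))
    for i, value in enumerate(row.data):
      total[i] += float(value)
    counts[row.match] = counts.get(row.match, 0) + 1
  # Divide each class's sums by the number of rows in that class
  return dict((cl, [s / counts[cl] for s in total])
              for cl, total in sums.items())

# Made-up data: two points in class 0, two in class 1
rows = [Row([20, 30], 0), Row([30, 40], 0),
        Row([40, 30], 1), Row([50, 50], 1)]
avgs = classaverages(rows)
# avgs == {0: [25.0, 35.0], 1: [45.0, 40.0]}
```

The result is a dictionary keyed by class, where each value is the center point of that class with one entry per dimension of the data.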
You can run this function in your Python session to get the averages:
>>> reload(advancedclassify)
<module 'advancedclassify' from 'advancedclassify.pyc'>
>>> avgs=advancedclassify.lineartrain(agesonly)
To see why this is useful, consider again the plot of the age data, shown in Figure 9-4.
Figure 9-4. Linear classifier using averages
The Xs in the figure represent the average points as calculated by lineartrain. The line ...