Cover by Toby Segaran

Basic Linear Classification

A basic linear classifier is one of the simplest classifiers to construct, but it's a good basis for further work. It works by averaging all the data in each class to construct a point that represents the center of that class. It then classifies a new point by determining which center point it is closest to.
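The closest-center idea can be sketched in a few lines. This is only an illustration, not the book's implementation: the function name nearest_center, the use of plain Euclidean distance, and the two example centers are all assumptions made here.

```python
import math

def nearest_center(point, centers):
    """Return the class label whose center is closest to point.

    centers is assumed to map each class label to a list of coordinates.
    Closeness is measured with Euclidean distance (an assumption).
    """
    best_label = None
    best_dist = float("inf")
    for label, center in centers.items():
        dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical centers for classes 0 and 1 (made-up numbers)
centers = {0: [26.0, 35.0], 1: [35.0, 33.0]}

print(nearest_center([30.0, 30.0], centers))
print(nearest_center([25.0, 40.0], centers))
```

The sketch takes the centers as given; computing them from training data is a separate step.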

To build this classifier, you'll first need a function that calculates the average point of each class. In this case, the classes are just 0 and 1. Add lineartrain to advancedclassify.py:

def lineartrain(rows):
  """Return a dict mapping each class to its average data point."""
  averages={}
  counts={}

  for row in rows:
    # Get the class of this point
    cl=row.match

    averages.setdefault(cl,[0.0]*(len(row.data)))
    counts.setdefault(cl,0)

    # Add this point to the averages
    for i in range(len(row.data)):
      averages[cl][i]+=float(row.data[i])

    # Keep track of how many points in each class
    counts[cl]+=1

  # Divide sums by counts to get the averages
  for cl,avg in averages.items():
    for i in range(len(avg)):
      avg[i]/=counts[cl]

  return averages

You can run this function in your Python session to get the averages:

>>> reload(advancedclassify)
<module 'advancedclassify' from 'advancedclassify.pyc'>
>>> avgs=advancedclassify.lineartrain(agesonly)
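To make the return value concrete, here is a small self-contained sketch that runs the same averaging logic on four made-up rows. The Row type and the sample values are hypothetical stand-ins; the only assumption about the real rows is that each has a data list and a match attribute, as lineartrain expects. The code is written for Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for the row objects: anything with .data and .match works
Row = namedtuple("Row", ["data", "match"])

def lineartrain(rows):
    averages = {}
    counts = {}

    for row in rows:
        # Get the class of this point
        cl = row.match

        averages.setdefault(cl, [0.0] * len(row.data))
        counts.setdefault(cl, 0)

        # Add this point to the running sums
        for i in range(len(row.data)):
            averages[cl][i] += float(row.data[i])

        # Keep track of how many points are in each class
        counts[cl] += 1

    # Divide sums by counts to get the averages
    for cl, avg in averages.items():
        for i in range(len(avg)):
            avg[i] /= counts[cl]

    return averages

# Made-up sample data: two points per class
sample = [
    Row(data=[20, 30], match=0),
    Row(data=[30, 40], match=0),
    Row(data=[25, 25], match=1),
    Row(data=[35, 45], match=1),
]

avgs = lineartrain(sample)
print(avgs)
```

The result is a dictionary keyed by class, where each value is a list holding the mean of every column for that class.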

To see why this is useful, consider again the plot of the age data, shown in Figure 9-4.

Figure 9-4. Linear classifier using averages

The Xs in the figure represent the average points as calculated by lineartrain. The line ...
