Programming Collective Intelligence by Toby Segaran

Previous Approaches

In previous chapters, you've looked at different ways of dealing with word counts for textual data. For purposes of comparison, it's useful to try these first and see what sort of results you get, then compare them with the results of feature extraction. If you have the code that you wrote for those chapters, you can import those modules and try them here on your feeds. If not, don't worry—this section illustrates how these methods work on the sample data.

Bayesian Classification

Bayesian classification is, as you've seen, a supervised learning method. If you were to try to use the classifier built in Chapter 6, you would first be required to classify several examples of stories to train the classifier. The classifier would then be able to put later stories into your predefined categories. Besides the obvious downside of having to do the initial training, this approach also suffers from the limitation that the developer has to decide what all the different categories are. All the classifiers you've seen so far, such as decision trees and support-vector machines, will have this same limitation when applied to a dataset of this kind.
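To make the train-then-classify workflow and its limitation concrete, here is a minimal sketch of a naive Bayes classifier. It is not the Chapter 6 module: the class name TinyNaiveBayes, the add-one smoothing, and the example stories and categories are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical minimal classifier for illustration only."""

    def __init__(self):
        # category -> feature -> number of training items containing it
        self.feature_counts = defaultdict(Counter)
        # category -> number of training items
        self.category_counts = Counter()

    def train(self, features, category):
        self.category_counts[category] += 1
        for f in set(features):
            self.feature_counts[category][f] += 1

    def classify(self, features):
        total = sum(self.category_counts.values())
        best, best_score = None, float("-inf")
        for category, n in self.category_counts.items():
            # Log prior plus log likelihood of each present feature,
            # with add-one smoothing (an assumed choice).
            score = math.log(n / total)
            for f in set(features):
                count = self.feature_counts[category][f]
                score += math.log((count + 1) / (n + 2))
            if score > best_score:
                best, best_score = category, score
        return best


# Made-up training stories: the categories must be chosen up front.
clf = TinyNaiveBayes()
clf.train(["iraq", "war", "troops"], "world")
clf.train(["election", "war", "senate"], "world")
clf.train(["google", "search", "software"], "tech")
clf.train(["apple", "software", "iphone"], "tech")

tech_label = clf.classify(["software", "google"])
world_label = clf.classify(["war"])
# A story unlike anything in training is still forced into a known category.
unknown_label = clf.classify(["football"])
```

Even for the unseen word "football", the classifier can only answer with one of the two categories it was trained on, which is the limitation described above.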

If you'd like to try the Bayesian classifier on this dataset, you'll need to place the module you built in Chapter 6 in your working directory. You can use the articlewords dictionary as is for the feature set of each article.

Try this in your Python session:

>>> def wordmatrixfeatures(x):
...     return [wordvec[w] for w in range(len(x)) if ...
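The listing above is cut off in this excerpt, so its filter condition is not visible here. As a rough sketch of the general idea only, the following assumes a feature function that takes one row of word counts and returns the words whose count is positive. The function name nonzero_word_features, the explicit wordvec parameter, and the sample data are hypothetical and are not taken from the original listing.

```python
def nonzero_word_features(row, wordvec):
    """Return the words whose count in this row is positive.

    Assumed behaviour for illustration; row[i] is taken to be the
    count of wordvec[i] in a single article.
    """
    return [wordvec[i] for i in range(len(row)) if row[i] > 0]


wordvec = ["iraq", "google", "software"]  # made-up vocabulary
row = [0, 2, 1]                           # made-up counts for one article
features = nonzero_word_features(row, wordvec)
```

A list of words like this is the kind of feature set a classifier's training and classification calls would consume for each article.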
