O'Reilly logo

Programming Collective Intelligence by Toby Segaran

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Displaying the Results

Exactly how you view the results is a little complicated. Every feature in the features matrix has a weighting that indicates how strongly each word applies to that feature, so you can try displaying the top five or ten words in each feature to see what the most important words are in that feature. The equivalent column in the weights matrix tells you how much this particular feature applies to each of the articles, so it's also interesting to show the top three articles and see how strongly this feature applies to all of them.

Add a new function called showfeatures to newsfeatures.py:

from numpy import * def showfeatures(w,h,titles,wordvec,out='features.txt'): outfile=file(out,'w') pc,wc=shape(h) toppatterns=[[] for i in range(len(titles))] patternnames=[] # Loop over all the features for i in range(pc): slist=[] # Create a list of words and their weights for j in range(wc): slist.append((h[i,j],wordvec[j])) # Reverse sort the word list slist.sort( ) slist.reverse( ) # Print the first six elements n=[s[1] for s in slist[0:6]] outfile.write(str(n)+'\n') patternnames.append(n) # Create a list of articles for this feature flist=[] for j in range(len(titles)): # Add the article with its weight flist.append((w[j,i],titles[j])) toppatterns[j].append((w[j,i],i,titles[j])) # Reverse sort the list flist.sort( ) flist.reverse( ) # Show the top 3 articles for f in flist[0:3]: outfile.write(str(f)+'\n') outfile.write('\n') outfile.close( ) # Return the pattern names for ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required