Exactly how you view the results is a little complicated. Each row of the features matrix holds one feature, with a weight for every word that indicates how strongly that word applies to the feature, so you can display the top five or ten words in each feature to see which words matter most. The corresponding column in the weights matrix tells you how strongly that feature applies to each of the articles, so it's also interesting to show the top three articles and see how strongly the feature applies to each of them.
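To make the row and column lookups concrete, here is a minimal sketch on toy data. The matrices, titles, and words are invented for illustration, and the helper names top_words and top_articles are hypothetical; they are not part of newsfeatures.py. The sketch assumes h is laid out as features by words and w as articles by features.

```python
import numpy as np

# Toy data, invented for illustration: 3 articles, 2 features, 4 words
wordvec = ['market', 'stocks', 'election', 'vote']
titles = ['Stocks rally', 'Election night', 'Markets and votes']

# h is the features matrix: one row per feature, one column per word
h = np.array([[0.9, 0.8, 0.0, 0.1],
              [0.0, 0.1, 0.7, 0.9]])

# w is the weights matrix: one row per article, one column per feature
w = np.array([[1.2, 0.0],
              [0.1, 1.5],
              [0.6, 0.5]])

def top_words(h, wordvec, feature, n=2):
    # Indices of the n largest weights in this feature's row
    order = np.argsort(h[feature])[::-1][:n]
    return [(float(h[feature, j]), wordvec[j]) for j in order]

def top_articles(w, titles, feature, n=2):
    # Indices of the n largest weights in this feature's column
    order = np.argsort(w[:, feature])[::-1][:n]
    return [(float(w[j, feature]), titles[j]) for j in order]

for i in range(h.shape[0]):
    print(top_words(h, wordvec, i))
    print(top_articles(w, titles, i))
```

Reading along a row of h answers "which words define this feature?", while reading down a column of w answers "which articles does this feature describe?".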
Add a new function called showfeatures to newsfeatures.py:
from numpy import *

def showfeatures(w,h,titles,wordvec,out='features.txt'):
  outfile=open(out,'w')
  pc,wc=shape(h)
  toppatterns=[[] for i in range(len(titles))]
  patternnames=[]

  # Loop over all the features
  for i in range(pc):
    slist=[]
    # Create a list of words and their weights
    for j in range(wc):
      slist.append((h[i,j],wordvec[j]))
    # Reverse sort the word list
    slist.sort()
    slist.reverse()

    # Print the first six elements
    n=[s[1] for s in slist[0:6]]
    outfile.write(str(n)+'\n')
    patternnames.append(n)

    # Create a list of articles for this feature
    flist=[]
    for j in range(len(titles)):
      # Add the article with its weight
      flist.append((w[j,i],titles[j]))
      toppatterns[j].append((w[j,i],i,titles[j]))

    # Reverse sort the list
    flist.sort()
    flist.reverse()

    # Show the top 3 articles
    for f in flist[0:3]:
      outfile.write(str(f)+'\n')
    outfile.write('\n')

  outfile.close()
  # Return the pattern names for ...