Identifying the topic of an article

Counting words is a popular and simple technique that usually gives good results when you want to get a feel for the topic of a body of text. In this recipe, we will show you how to count the words in The Seattle Times article we have been working with so far to identify its topic without even reading it.
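To make the idea concrete, here is a minimal sketch of word counting that uses only Python's standard library (re and collections.Counter) instead of NLTK, so it runs without downloading any NLTK data. The function name top_words, the tiny STOP_WORDS set, and the sample sentence are hypothetical illustrations, not part of the recipe's code. The point is that once very common function words are filtered out, the most frequent remaining words tend to hint at what the text is about.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a
# fuller list (for example, NLTK's English stop words).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "that", "this", "with", "as", "by",
}


def top_words(text, n=5):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    # Lowercase, then pull out alphabetic tokens (an apostrophe is allowed
    # inside a word, e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Drop stop words and very short tokens, which say little about the topic.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # most_common() sorts by count; ties keep the order first encountered.
    return counts.most_common(n)


# A made-up sample text, used only to show the output format.
sample = (
    "The ferry crossed the sound. Ferry riders said the ferry was late, "
    "and the ferry system blamed the weather on the sound."
)
print(top_words(sample, 2))  # [('ferry', 4), ('sound', 2)]
```

Even on this toy input, the two top-ranked words point to the subject of the text, which is the intuition the recipe relies on.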

Getting ready

To execute this recipe, you will need NLTK, Python's regular expressions module (re), NumPy, and Matplotlib. No other prerequisites are required.

How to do it…

The beginning of the code for this recipe is very similar to that of the previous recipe, so we present only the relevant parts here (the nlp_countWords.py file):

# part-of-speech tagging
tagged_sentences ...
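The listing is truncated after the part-of-speech tagging step. One plausible way to use the tags for topic detection, offered here as an assumption and not as the contents of nlp_countWords.py, is to count only nouns, since nouns usually carry the subject of an article. The sketch below assumes tagged_sentences is a list of sentences, each a list of (word, tag) tuples with Penn Treebank tags, which is the shape nltk.pos_tag() returns for a tokenized sentence. The function name count_nouns and the hand-tagged toy input are hypothetical.

```python
from collections import Counter


def count_nouns(tagged_sentences, n=10):
    """Count nouns across POS-tagged sentences (hypothetical helper).

    Assumes each sentence is a list of (word, tag) tuples using Penn Treebank
    tags, the shape nltk.pos_tag() returns for one tokenized sentence.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # NN, NNS, NNP and NNPS all start with "NN".
            if tag.startswith("NN"):
                # Lowercase so "Ferry" and "ferry" are counted together.
                counts[word.lower()] += 1
    return counts.most_common(n)


# Hand-tagged toy input standing in for the output of nltk.pos_tag().
tagged_sentences = [
    [("The", "DT"), ("ferry", "NN"), ("was", "VBD"), ("late", "JJ")],
    [("Ferry", "NNP"), ("riders", "NNS"), ("blamed", "VBD"),
     ("the", "DT"), ("weather", "NN")],
]
print(count_nouns(tagged_sentences, 2))  # [('ferry', 2), ('riders', 1)]
```

Filtering on tags rather than on a stop-word list means verbs, adjectives, and function words are dropped in one step, at the cost of depending on the tagger's accuracy.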
