Chapter 11. Exploring Data with Pandas

In the previous chapter, we cleaned the Nobel Prize dataset that we scraped from Wikipedia in Chapter 6. Now it’s time to start exploring our shiny new dataset, looking for interesting patterns, stories to tell, and anything else that could form the basis for an interesting visualization.

First off, let’s try to clear our minds and take a long, hard look at the data to hand to get a broad idea of the visualizations suggested. Example 11-1 shows the form of the Nobel dataset, with categorical, temporal, and geographical data.

Example 11-1. Our cleaned Nobel Prize dataset
[{
 'category': u'Physiology or Medicine',
 'date_of_birth': u'8 October 1927',
 'date_of_death': u'24 March 2002',
 'gender': 'male',
 'link': u'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
 'name': u'C\xe9sar Milstein',
 'country': u'Argentina',
 'place_of_birth': u'Bah\xeda Blanca ,  Argentina',
 'place_of_death': u'Cambridge , England',
 'year': 1984
 },
 ...
 }]

The data in Example 11-1 suggests a number of stories we might want to investigate, among them:

  • Gender disparities among the prize winners

  • National trends (e.g., which country has most prizes in Economics)

  • Details about individual winners, such as their average age on receiving the prize or life expectancy

  • Geographical journey from place of birth to adopted country using the born_in and country fields

These investigative lines form the basis for the coming sections, which will probe the dataset by asking questions ...

Get Data Visualization with Python and JavaScript now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.