Chapter 5. (Re)Organizing the Web’s Data

Adam Laiacano

The first, and sometimes hardest, part of any data analysis is acquiring the data from which you hope to extract information. Whether you want to look at your personal spending habits, calculate your next trade in fantasy baseball, or compare a politician’s investment returns to your own, the data you need is usually there on the web with some sense of order to it, but it’s probably not in a form that’s very useful for analysis. If this is the case, you’ll need to either gather the data manually or write a script to collect it for you.
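To give a feel for what such a collection script might look like, here is a minimal sketch that pulls the rows out of an HTML table using only Python's standard library. The `TableRowParser` class and the `SAMPLE_HTML` fragment are hypothetical names made up for illustration; a real page would first have to be downloaded, and its markup is rarely this tidy.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Assumption: an invented fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Blog</th><th>Posts</th></tr>
  <tr><td>2012-01-01</td><td>adamlaiacano</td><td>2</td></tr>
  <tr><td>2012-01-01</td><td>david</td><td>4</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
parser.close()

# The first row holds the column names; the rest are observations.
header, *records = parser.rows
```

After this runs, `header` holds the column names and `records` holds one list of strings per table row, ready to be cleaned up and converted to proper types.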

The granddaddy of all data formats is the data table, with a column for each attribute and a row for each observation. You’ve seen this if you’ve ever used Microsoft Excel, relational databases, or R’s data.frame object.

Table 5-1. An example data table

Date	Blog	Posts
2012-01-01	adamlaiacano	2
2012-01-01	david	4
2012-01-01	dallas	6
2012-01-02	adamlaiacano	0
2012-01-02	david	4
2012-01-02	dallas	6
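
As a rough illustration of why this layout is convenient, the sketch below loads the rows of Table 5-1 from tab-separated text into one dictionary per observation and then totals the posts for each blog. The function names `load_rows` and `posts_per_blog` are invented for this example, and Python's standard `csv` module stands in for the spreadsheet, database, or data.frame tools mentioned above.

```python
import csv
import io
from collections import defaultdict

# The rows of Table 5-1, written out as tab-separated text.
TABLE_TEXT = (
    "Date\tBlog\tPosts\n"
    "2012-01-01\tadamlaiacano\t2\n"
    "2012-01-01\tdavid\t4\n"
    "2012-01-01\tdallas\t6\n"
    "2012-01-02\tadamlaiacano\t0\n"
    "2012-01-02\tdavid\t4\n"
    "2012-01-02\tdallas\t6\n"
)


def load_rows(text):
    """Parse tab-separated text into one dict per observation (row)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        record["Posts"] = int(record["Posts"])  # counts are numbers, not text
        rows.append(record)
    return rows


def posts_per_blog(rows):
    """Sum the Posts column for each distinct value of the Blog column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Blog"]] += row["Posts"]
    return dict(totals)


rows = load_rows(TABLE_TEXT)
totals = posts_per_blog(rows)
```

Because every row carries the same attributes, a summary such as posts per blog is a short loop over the rows; no special handling is needed for any particular date or blog.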

Most websites store their data behind the scenes in tables within relational databases, and if those tables were accessible to the computing public, this chapter of Bad Data Handbook wouldn’t need to exist. However, it’s a web designer’s job to make this information visually appealing and interpretable, which usually means they’ll only present the reader with a relevant subset of the dataset, such as a single company’s stock price over a specific date range, or recent status updates from ...


Bad Data Handbook by Q. Ethan McCallum
