Cover by Ben Fry

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

O'Reilly logo

Chapter 10. Parsing Data

Parsing converts a raw stream of data into a structure that can be manipulated in software. Lots of parsing is detective work, requiring you to spend time looking at files or data streams to figure out what's inside. The data might be available in an easily parsed format (such as an RSS feed in XML format) or in a proprietary binary format. This chapter covers some of the methods used to store data, methods for reading common data formats, and some detective procedures for dissecting data. Even if your particular data format is not covered in this chapter, the methods discussed are applicable to any data source.

Parsing may also seem to be quite disconnected from the actual process of data visualization. However, it's part of the process for a reason: chances are, you'll have to obtain data from a source that's not under your control and will spend a lot of time figuring out how to use the data that you're given. This chapter aims to give you a sense of how files are typically structured because more likely than not, the data you acquire will be poorly documented (if it's documented at all). Being able to recognize the basic file format, or even whether the data is compressed, are valuable clues to unpacking unknown information.

Generally, data boils down to lists (one-dimensional sets), matrices (two-dimensional tables, such as a spreadsheet), or trees and graphs (individual "nodes" of data and sets of "edges" that describe connections between them). Strictly ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required