Chapter 9. Acquiring Data

The first step in visualizing data is to load it into your application. Typical data sources might be a file on a disk, a stream from a network, or a digitized signal (e.g., audio or sensor readings). Unless you own the data and it’s recorded in a definable, digitizable format, things can get messy quickly. How do you process weeks of surveillance video? How do you quantitatively acquire data from an hour-long meeting that involved verbal discussion, drawings on a whiteboard, and notes taken by individual participants?

Thus, the acquisition stage covers several tasks that sometimes get complicated:

  • Unless you are generating your own data, you have to find a good source for the data you want.

  • If you don’t own the data, you have to make sure you have the right to use it.

  • You may have to go through contortions to extract the data from a web page or other source that wasn’t designed to be read by an application.

  • You have to download the data, which may present difficulties if the volume is large, especially if the data changes quickly.

I’ll show some common solutions to these problems in this chapter. Even if they don’t fit your situation, they’ll still be a starting point for finding a solution.

In some cases, you may not use a Processing program to acquire and parse your initial data set. It’s not uncommon to preprocess the data in another language, such as Perl, Python, or Ruby, and later use the (cleaned) results with Processing. Simple integration can ...
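As a rough illustration of the preprocessing idea described above, here is a minimal Python sketch that tidies messy comma-separated text into tab-separated text that a Processing sketch could read later. The function names (clean_rows, to_tsv), the column names, and the sample values are all hypothetical and invented for this example; the cleanup rules (trim whitespace, drop incomplete rows) are just one plausible choice, not a prescribed method.

```python
import csv
import io


def clean_rows(raw_text):
    """Return cleaned rows from messy CSV text.

    Strips surrounding whitespace from every field and drops rows that are
    blank or contain any empty field.
    """
    cleaned = []
    for row in csv.reader(io.StringIO(raw_text)):
        fields = [field.strip() for field in row]
        if not fields or any(field == "" for field in fields):
            continue  # skip blank lines and rows with missing values
        cleaned.append(fields)
    return cleaned


def to_tsv(rows):
    """Serialize rows as tab-separated text, one row per line."""
    return "\n".join("\t".join(fields) for fields in rows) + "\n"


# Made-up sample input; a real script would read the text from a file.
raw = "station, reading\n alpha , 12\n\nbeta,\ngamma,7\n"
tsv_text = to_tsv(clean_rows(raw))
print(tsv_text, end="")
```

Once the cleaned text is saved to a file, a Processing sketch could load it with loadStrings() and break each line apart on tab characters with split().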
