You are previewing Visualizing Data.

Visualizing Data

Cover of Visualizing Data by Ben Fry Published by O'Reilly Media, Inc.
  1. Visualizing Data
  2. Preface
    1. The Audience for This Book
    2. Background Information
    3. Overview of the Book
    4. Safari® Books Online
    5. Acknowledgments
    6. Conventions Used in This Book
    7. Using Code Examples
    8. We'd Like to Hear from You
  3. 1. The Seven Stages of Visualizing Data
    1. Why Data Display Requires Planning
      1. Too Much Information
      2. Data Collection
      3. Thinking About Data
      4. Data Never Stays the Same
      5. What Is the Question?
      6. A Combination of Many Disciplines
      7. Process
    2. An Example
      1. What Is the Question?
    3. Iteration and Combination
    4. Principles
      1. Each Project Has Unique Requirements
      2. Avoid the All-You-Can-Eat Buffet
      3. Know Your Audience
    5. Onward
  4. 2. Getting Started with Processing
    1. Sketching with Processing
      1. Hello World
      2. Hello Mouse
    2. Exporting and Distributing Your Work
      1. Saving Your Work
    3. Examples and Reference
      1. More About the size( ) Method
      2. Loading and Displaying Data
    4. Functions
      1. Libraries Add New Features
    5. Sketching and Scripting
      1. Don't Start by Trying to Build a Cathedral
    6. Ready?
  5. 3. Mapping
    1. Drawing a Map
      1. Explanation of the Processing Code
    2. Locations on a Map
    3. Data on a Map
      1. Two-Sided Data Ranges
      2. Provide More Information with a Mouse Rollover (Interact)
      3. Updating Values over Time (Acquire, Mine)
      4. Smooth Interpolation of Values over Time (Refine)
    4. Using Your Own Data
      1. Taking Data from the User
    5. Next Steps
  6. 4. Time Series
    1. Milk, Tea, and Coffee (Acquire and Parse)
    2. Cleaning the Table (Filter and Mine)
    3. A Simple Plot (Represent and Refine)
    4. Labeling the Current Data Set (Refine and Interact)
    5. Drawing Axis Labels (Refine)
      1. Year Labels
      2. Labeling Volume on the Vertical Axis
      3. Bringing It All Together and Titling Both Axes
    6. Choosing a Proper Representation (Represent and Refine)
    7. Using Rollovers to Highlight Points (Interact)
    8. Ways to Connect Points (Refine)
      1. Showing Data As an Area
      2. Further Refinements and Erasing Elements
      3. Discrete Values with a Bar Chart (Represent)
    9. Text Labels As Tabbed Panes (Interact)
      1. Adding the Necessary Variables
      2. Drawing Tabs Instead of a Single Title
      3. Handling Mouse Input
      4. Better Tab Images (Refine)
    10. Interpolation Between Data Sets (Interact)
    11. End of the Series
  7. 5. Connections and Correlations
    1. Changing Data Sources
    2. Problem Statement
    3. Preprocessing
      1. Retrieving Win/Loss Data (Acquire)
      2. Unpacking the Win/Loss files (Mine and Filter)
      3. Retrieving Team Logos (Acquire, Refine)
      4. Retrieving Salary Data (Acquire, Parse, Filter)
    4. Using the Preprocessed Data (Acquire, Parse, Filter, Mine)
      1. Team Names and Codes
      2. Team Salaries
      3. Win-Loss Standings
      4. Team Logos
      5. Finishing the Setup
    5. Displaying the Results (Represent)
    6. Returning to the Question (Refine)
      1. Highlighting the Lines
      2. A Better Typeface for Numeric Data
      3. A Word About Typography
    7. Sophisticated Sorting: Using Salary As a Tiebreaker (Mine)
    8. Moving to Multiple Days (Interact)
      1. Drawing the Dates
      2. Load Standings for the Entire Season
      3. Switching Between Dates
      4. Checking Our Progress
    9. Smoothing Out the Interaction (Refine)
    10. Deployment Considerations (Acquire, Parse, Filter)
  8. 6. Scatterplot Maps
    1. Preprocessing
      1. Data from the U.S. Census Bureau (Acquire)
      2. Dealing with the Zip Code Database File (Parse and Filter)
      3. Building the Preprocessor
    2. Loading the Data (Acquire and Parse)
    3. Drawing a Scatterplot of Zip Codes (Mine and Represent)
    4. Highlighting Points While Typing (Refine and Interact)
    5. Show the Currently Selected Point (Refine)
    6. Progressively Dimming and Brightening Points (Refine)
    7. Zooming In (Interact)
    8. Changing How Points Are Drawn When Zooming (Refine)
    9. Deployment Issues (Acquire and Refine)
    10. Next Steps
  9. 7. Trees, Hierarchies, and Recursion
    1. Using Recursion to Build a Directory Tree
      1. Caveats When Dealing with Files (Filter)
      2. Recursively Printing Tree Contents (Represent)
    2. Using a Queue to Load Asynchronously (Interact)
      1. Showing Progress (Represent)
    3. An Introduction to Treemaps
      1. A Simple Treemap Library
      2. A Simple Treemap Example
    4. Which Files Are Using the Most Space?
      1. Reading the Directory Structure (Acquire, Parse, Filter, Mine, Represent)
    5. Viewing Folder Contents (Interact)
    6. Improving the Treemap Display (Refine)
      1. Maintaining Context (Refine)
      2. Making Colors More Useful (Mine, Refine)
    7. Flying Through Files (Interact)
      1. Updating FileItem for zoom
      2. Updating FolderItem
      3. Adding a Folder Selection Dialog (Interact)
    8. Next Steps
  10. 8. Networks and Graphs
    1. Simple Graph Demo
      1. Porting from Java to Processing
      2. Interacting with Nodes
    2. A More Complicated Graph
      1. Using Text As Input (Acquire)
      2. Reading a Book (Parse)
      3. Removing Stop Words (Filter)
      4. Smarter Addition of Nodes and Edges (Mine)
      5. Viewing the Book (Represent and Refine)
      6. Saving an Image in a Vector Format
      7. Checking Our Work
    3. Approaching Network Problems
    4. Advanced Graph Example
      1. Getting Started with Java IDEs
      2. Obtaining a Web Server Logfile (Acquire)
      3. Reading Apache Logfiles (Parse)
      4. A Look at the Other Source Files
      5. Moving from Processing to Java
      6. Reading and Cleaning the Data (Acquire, Parse, Filter)
      7. Bringing It All Together (Mine and Represent)
      8. Depicting Branches and Nodes (Represent and Refine)
      9. Playing with Data (Interact)
      10. Drawing Node Names (Represent and Refine)
      11. Drawing Visitor Paths (Represent and Refine)
    5. Mining Additional Information
  11. 9. Acquiring Data
    1. Where to Find Data
      1. Data Acquisition Ethics
    2. Tools for Acquiring Data from the Internet
      1. Wget and cURL
      2. NcFTP and Links
    3. Locating Files for Use with Processing
      1. The Data Folder
      2. Uniform Resource Locator (URL)
      3. Absolute Path to a Local File
      4. Specifying Output Locations
    4. Loading Text Data
      1. Files Too Large for loadStrings( )
      2. Reading Files Progressively
      3. Reading Files Asynchronously with a Thread
      4. Parsing Large Files As They Are Acquired
    5. Dealing with Files and Folders
      1. Using the Java File Object to Locate Files
    6. Listing Files in a Folder
      1. Handling Numbered File Sequences
    7. Asynchronous Image Downloads
    8. Using openStream( ) As a Bridge to Java
    9. Dealing with Byte Arrays
    10. Advanced Web Techniques
      1. Handling Web Forms
      2. Pretending to Be a Web Browser
    11. Using a Database
      1. Getting Started with MySQL
      2. Using MySQL with Processing
      3. Other Database Options
      4. Performance Aspects of Databases in Interactive Applications
    12. Dealing with a Large Number of Files
  12. 10. Parsing Data
    1. Levels of Effort
    2. Tools for Gathering Clues
    3. Text Is Best
      1. Tab-Separated Values (TSV)
      2. Comma-Separated Values (CSV)
      3. Text with Fixed Column Widths
    4. Text Markup Languages
      1. HyperText Markup Language (HTML)
      2. Extensible Markup Language (XML)
      3. JavaScript Object Notation (JSON)
    5. Regular Expressions (regexps)
    6. Grammars and BNF Notation
    7. Compressed Data
      1. GZIP Streams (GZ)
    8. Vectors and Geometry
      1. Scalable Vector Graphics (SVG)
      2. OBJ and AutoCAD DXF
      3. PostScript (PS) and Portable Document Format (PDF)
      4. Shapefile and Well-Known Text
    9. Binary Data Formats
      1. Excel Spreadsheets (XLS)
      2. dBASE/xBase (DBF)
      3. Arbitrary Binary Formats
      4. Bit Shifting
      5. DataInputStream
    10. Advanced Detective Work
      1. Watching Network Traffic
  13. 11. Integrating Processing with Java
    1. Programming Modes
      1. Basic
      2. Continuous
      3. Java
    2. Additional Source Files (Tabs)
      1. Using .java Source Files
    3. The Preprocessor
    4. API Structure
      1. Event Handling
      2. The size( ) Method
      3. The main( ) Method
      4. The frame Object
    5. Embedding PApplet into Java Applications
      1. Two Models for Updating the Screen
      2. Embedding in a Swing Application
    6. Using Java Code in a Processing Sketch
      1. Using the Code Folder to Add .jar Files to a Sketch
      2. Packaging Code into Libraries
    7. Using Libraries
    8. Building with the Source for processing.core
  14. Bibliography
    1. Acquire and Parse
  15. Index
  16. About the Author
  17. Colophon
  18. Copyright
O'Reilly logo

Chapter 9. Acquiring Data

The first step in visualizing data is to load it into your application. Typical data sources might be a file on a disk, a stream from a network, or a digitized signal (e.g., audio or sensor readings). Unless you own the data and it's recorded in a definable, digitizable format, things can get messy quickly. How do you process weeks of surveillance video? How does one quantitatively acquire data from an hour-long meeting that involved a verbal discussion, drawings on a whiteboard, and note taking done by individual participants?

Thus, the acquisition stage covers several tasks that sometimes get complicated:

  • Unless you are generating your own data, you have to find a good source for the data you want.

  • If you don't own the data, you have to make sure you have the right to use it.

  • You may have to go through contortions to extract the data from a web page or other source that wasn't set up to make it easy for your application.

  • You have to download the data, which may present difficulties if the volume is large, especially if it's fast-changing.

I'll show some common solutions to these problems in this chapter. Even if they don't fit your situation, they'll still be a starting point for finding a solution.

In some cases, you may not use a Processing program to acquire and parse your initial data set. It's not uncommon to preprocess the data in another language, such as Perl, Python, or Ruby, and later use the (cleaned) results with Processing. Simple integration can ...

The best content for your career. Discover unlimited learning on demand for around $1/day.