
Visualizing Data

By Ben Fry. Published by O'Reilly Media, Inc.

Chapter 10. Parsing Data

Parsing converts a raw stream of data into a structure that can be manipulated in software. Much of parsing is detective work: you spend time looking at files or data streams to figure out what's inside. The data might be available in an easily parsed format (such as an RSS feed in XML format) or in a proprietary binary format. This chapter covers some of the ways data is stored, methods for reading common data formats, and some detective procedures for dissecting data. Even if your particular data format is not covered in this chapter, the methods discussed apply to any data source.

Parsing may seem quite disconnected from the actual process of data visualization. However, it's part of the process for a reason: chances are, you'll have to obtain data from a source that's not under your control, and you'll spend a lot of time figuring out how to use the data that you're given. This chapter aims to give you a sense of how files are typically structured because, more likely than not, the data you acquire will be poorly documented (if it's documented at all). Recognizing the basic file format, or even just whether the data is compressed, gives you a valuable clue for unpacking unknown information.
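As one illustration of this kind of clue gathering, the sketch below checks whether a block of bytes begins with the two-byte GZIP signature (0x1f followed by 0x8b). It is a minimal example rather than a method taken from the book: the class name `FormatSniffer` and the method `looksLikeGzip` are hypothetical, and a matching signature only suggests GZIP data without proving that the rest of the stream is valid.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Processing or the book's code.
public class FormatSniffer {

  // Returns true if the data starts with the GZIP signature bytes 0x1f 0x8b.
  // Java bytes are signed, so each one is masked with 0xff before comparing.
  static boolean looksLikeGzip(byte[] head) {
    return head.length >= 2
        && (head[0] & 0xff) == 0x1f
        && (head[1] & 0xff) == 0x8b;
  }

  public static void main(String[] args) throws IOException {
    // A line of plain tab-separated text (made-up sample data).
    byte[] plain = "year\tmilk\ttea".getBytes(StandardCharsets.UTF_8);

    // The same bytes, compressed in memory with the standard GZIP stream.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
      gz.write(plain);
    }
    byte[] packed = buffer.toByteArray();

    System.out.println("plain looks like gzip:  " + looksLikeGzip(plain));
    System.out.println("packed looks like gzip: " + looksLikeGzip(packed));
  }
}
```

In practice the same check could be applied to the first few bytes read from an unknown file before deciding how to open it.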

Generally, data boils down to lists (one-dimensional sets), matrices (two-dimensional tables, such as a spreadsheet), or trees and graphs (individual "nodes" of data and sets of "edges" that describe connections between them). Strictly ...
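The three shapes named above can be sketched in plain Java as follows. This is only an illustration under assumed names: `DataShapes`, `Edge`, and `neighbors` are hypothetical, the values are made up, and real projects may well store these structures differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of lists, matrices, and graphs; all values are made up.
public class DataShapes {

  // An edge describes a connection from one node index to another.
  static class Edge {
    final int from;
    final int to;

    Edge(int from, int to) {
      this.from = from;
      this.to = to;
    }
  }

  // Collects the indices of the nodes that the given node connects to.
  static List<Integer> neighbors(List<Edge> edges, int node) {
    List<Integer> result = new ArrayList<>();
    for (Edge e : edges) {
      if (e.from == node) {
        result.add(e.to);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // List: a one-dimensional set of values.
    float[] list = { 3.5f, 1.25f, 8.0f };

    // Matrix: a two-dimensional table, like rows and columns of a spreadsheet.
    float[][] matrix = {
      { 1.0f, 2.0f, 3.0f },
      { 4.0f, 5.0f, 6.0f }
    };

    // Graph: individual nodes plus edges that describe connections between them.
    String[] nodes = { "root", "left", "right" };
    List<Edge> edges = new ArrayList<>();
    edges.add(new Edge(0, 1));
    edges.add(new Edge(0, 2));

    System.out.println("list length: " + list.length);
    System.out.println("matrix size: " + matrix.length + " x " + matrix[0].length);
    System.out.println("neighbors of " + nodes[0] + ": " + neighbors(edges, 0));
  }
}
```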
