You are previewing Visualizing Data.
O'Reilly logo
Visualizing Data

Book Description

Enormous quantities of data go unused or underused today, simply because people can't visualize the quantities and relationships in it. Using a downloadable programming environment developed by the author, Visualizing Data demonstrates methods for representing data accurately on the Web and elsewhere, complete with user interaction, animation, and more. How do the 3.1 billion A, C, G and T letters of the human genome compare to those of a chimp or a mouse? What do the paths that millions of visitors take through a web site look like? With Visualizing Data, you learn how to answer complex questions like these with thoroughly interactive displays. We're not talking about cookie-cutter charts and graphs. This book teaches you how to design entire interfaces around large, complex data sets with the help of a powerful new design and prototyping tool called "Processing". Used by many researchers and companies to convey specific data in a clear and understandable manner, the Processing beta is available free. With this tool and Visualizing Data as a guide, you'll learn basic visualization principles, how to choose the right kind of display for your purposes, and how to provide interactive features that will bring users to your site over and over. This book teaches you:

  • The seven stages of visualizing data -- acquire, parse, filter, mine, represent, refine, and interact

  • How all data problems begin with a question and end with a narrative construct that provides a clear answer without extraneous details

  • Several example projects with the code to make them work

  • Positive and negative points of each representation discussed. The focus is on customization so that each one best suits what you want to convey about your data set

The book does not provide ready-made "visualizations" that can be plugged into any data set. Instead, with chapters divided by types of data rather than types of display, you'll learn how each visualization conveys the unique properties of the data it represents -- why the data was collected, what's interesting about it, and what stories it can tell. Visualizing Data teaches you how to answer questions, not simply display information.

Table of Contents

  1. Visualizing Data
  2. Preface
    1. The Audience for This Book
    2. Background Information
    3. Overview of the Book
    4. SafariĀ® Books Online
    5. Acknowledgments
    6. Conventions Used in This Book
    7. Using Code Examples
    8. We'd Like to Hear from You
  3. 1. The Seven Stages of Visualizing Data
    1. Why Data Display Requires Planning
      1. Too Much Information
      2. Data Collection
      3. Thinking About Data
      4. Data Never Stays the Same
      5. What Is the Question?
      6. A Combination of Many Disciplines
      7. Process
    2. An Example
      1. What Is the Question?
        1. Acquire
        2. Parse
        3. Filter
        4. Mine
        5. Represent
        6. Refine
        7. Interact
    3. Iteration and Combination
    4. Principles
      1. Each Project Has Unique Requirements
      2. Avoid the All-You-Can-Eat Buffet
      3. Know Your Audience
    5. Onward
  4. 2. Getting Started with Processing
    1. Sketching with Processing
      1. Hello World
      2. Hello Mouse
    2. Exporting and Distributing Your Work
      1. Saving Your Work
    3. Examples and Reference
      1. More About the size( ) Method
      2. Loading and Displaying Data
    4. Functions
      1. Libraries Add New Features
    5. Sketching and Scripting
      1. Don't Start by Trying to Build a Cathedral
    6. Ready?
  5. 3. Mapping
    1. Drawing a Map
      1. Explanation of the Processing Code
    2. Locations on a Map
    3. Data on a Map
      1. Two-Sided Data Ranges
      2. Provide More Information with a Mouse Rollover (Interact)
      3. Updating Values over Time (Acquire, Mine)
      4. Smooth Interpolation of Values over Time (Refine)
    4. Using Your Own Data
      1. Taking Data from the User
    5. Next Steps
  6. 4. Time Series
    1. Milk, Tea, and Coffee (Acquire and Parse)
    2. Cleaning the Table (Filter and Mine)
    3. A Simple Plot (Represent and Refine)
    4. Labeling the Current Data Set (Refine and Interact)
    5. Drawing Axis Labels (Refine)
      1. Year Labels
      2. Labeling Volume on the Vertical Axis
      3. Bringing It All Together and Titling Both Axes
    6. Choosing a Proper Representation (Represent and Refine)
    7. Using Rollovers to Highlight Points (Interact)
    8. Ways to Connect Points (Refine)
      1. Showing Data As an Area
      2. Further Refinements and Erasing Elements
      3. Discrete Values with a Bar Chart (Represent)
    9. Text Labels As Tabbed Panes (Interact)
      1. Adding the Necessary Variables
      2. Drawing Tabs Instead of a Single Title
      3. Handling Mouse Input
      4. Better Tab Images (Refine)
    10. Interpolation Between Data Sets (Interact)
    11. End of the Series
  7. 5. Connections and Correlations
    1. Changing Data Sources
    2. Problem Statement
    3. Preprocessing
      1. Retrieving Win/Loss Data (Acquire)
        1. Data source for baseball statistics
      2. Unpacking the Win/Loss files (Mine and Filter)
        1. Introducing regular expressions
      3. Retrieving Team Logos (Acquire, Refine)
      4. Retrieving Salary Data (Acquire, Parse, Filter)
    4. Using the Preprocessed Data (Acquire, Parse, Filter, Mine)
      1. Team Names and Codes
      2. Team Salaries
      3. Win-Loss Standings
      4. Team Logos
      5. Finishing the Setup
    5. Displaying the Results (Represent)
    6. Returning to the Question (Refine)
      1. Highlighting the Lines
      2. A Better Typeface for Numeric Data
      3. A Word About Typography
    7. Sophisticated Sorting: Using Salary As a Tiebreaker (Mine)
    8. Moving to Multiple Days (Interact)
      1. Drawing the Dates
      2. Load Standings for the Entire Season
      3. Switching Between Dates
      4. Checking Our Progress
    9. Smoothing Out the Interaction (Refine)
    10. Deployment Considerations (Acquire, Parse, Filter)
  8. 6. Scatterplot Maps
    1. Preprocessing
      1. Data from the U.S. Census Bureau (Acquire)
      2. Dealing with the Zip Code Database File (Parse and Filter)
      3. Building the Preprocessor
        1. What about a binary data file or a database?
    2. Loading the Data (Acquire and Parse)
    3. Drawing a Scatterplot of Zip Codes (Mine and Represent)
    4. Highlighting Points While Typing (Refine and Interact)
    5. Show the Currently Selected Point (Refine)
    6. Progressively Dimming and Brightening Points (Refine)
    7. Zooming In (Interact)
    8. Changing How Points Are Drawn When Zooming (Refine)
    9. Deployment Issues (Acquire and Refine)
    10. Next Steps
  9. 7. Trees, Hierarchies, and Recursion
    1. Using Recursion to Build a Directory Tree
      1. Caveats When Dealing with Files (Filter)
      2. Recursively Printing Tree Contents (Represent)
    2. Using a Queue to Load Asynchronously (Interact)
      1. Showing Progress (Represent)
    3. An Introduction to Treemaps
      1. A Simple Treemap Library
      2. A Simple Treemap Example
    4. Which Files Are Using the Most Space?
      1. Reading the Directory Structure (Acquire, Parse, Filter, Mine, Represent)
    5. Viewing Folder Contents (Interact)
    6. Improving the Treemap Display (Refine)
      1. Maintaining Context (Refine)
      2. Making Colors More Useful (Mine, Refine)
    7. Flying Through Files (Interact)
      1. Updating FileItem for zoom
      2. Updating FolderItem
      3. Adding a Folder Selection Dialog (Interact)
    8. Next Steps
  10. 8. Networks and Graphs
    1. Simple Graph Demo
      1. Porting from Java to Processing
      2. Interacting with Nodes
    2. A More Complicated Graph
      1. Using Text As Input (Acquire)
      2. Reading a Book (Parse)
      3. Removing Stop Words (Filter)
      4. Smarter Addition of Nodes and Edges (Mine)
      5. Viewing the Book (Represent and Refine)
      6. Saving an Image in a Vector Format
      7. Checking Our Work
    3. Approaching Network Problems
    4. Advanced Graph Example
      1. Getting Started with Java IDEs
        1. Step-by-step instructions if you're new to Eclipse
      2. Obtaining a Web Server Logfile (Acquire)
      3. Reading Apache Logfiles (Parse)
      4. A Look at the Other Source Files
      5. Moving from Processing to Java
        1. Helpful additions in Java 1.5 (J2SE 5.0) and later
      6. Reading and Cleaning the Data (Acquire, Parse, Filter)
        1. Filtering site addresses and aliases
        2. Filtering for useful page information
      7. Bringing It All Together (Mine and Represent)
        1. Mining unused nodes: Maintaining performance and readability
      8. Depicting Branches and Nodes (Represent and Refine)
      9. Playing with Data (Interact)
      10. Drawing Node Names (Represent and Refine)
      11. Drawing Visitor Paths (Represent and Refine)
    5. Mining Additional Information
  11. 9. Acquiring Data
    1. Where to Find Data
      1. Data Acquisition Ethics
    2. Tools for Acquiring Data from the Internet
      1. Wget and cURL
      2. NcFTP and Links
    3. Locating Files for Use with Processing
      1. The Data Folder
      2. Uniform Resource Locator (URL)
      3. Absolute Path to a Local File
      4. Specifying Output Locations
    4. Loading Text Data
      1. Files Too Large for loadStrings( )
      2. Reading Files Progressively
      3. Reading Files Asynchronously with a Thread
      4. Parsing Large Files As They Are Acquired
    5. Dealing with Files and Folders
      1. Using the Java File Object to Locate Files
    6. Listing Files in a Folder
        1. Listing files with a filter class
        2. Sorting file lists
      1. Handling Numbered File Sequences
    7. Asynchronous Image Downloads
    8. Using openStream( ) As a Bridge to Java
    9. Dealing with Byte Arrays
    10. Advanced Web Techniques
      1. Handling Web Forms
      2. Pretending to Be a Web Browser
    11. Using a Database
      1. Getting Started with MySQL
      2. Using MySQL with Processing
      3. Other Database Options
      4. Performance Aspects of Databases in Interactive Applications
    12. Dealing with a Large Number of Files
  12. 10. Parsing Data
    1. Levels of Effort
    2. Tools for Gathering Clues
    3. Text Is Best
      1. Tab-Separated Values (TSV)
      2. Comma-Separated Values (CSV)
      3. Text with Fixed Column Widths
    4. Text Markup Languages
      1. HyperText Markup Language (HTML)
        1. Embedding Tidy into a sketch
        2. Is a parser necessary?
        3. Using Swing's built-in HTML parser
        4. Parsing and manipulating tables from HTML files
        5. Other HTML parser libraries
        6. Writing a custom HTML parser
      2. Extensible Markup Language (XML)
        1. Cleaning up XML
        2. Example: Using the Processing XML library to read geocoding data
        3. Other methods for parsing XML
      3. JavaScript Object Notation (JSON)
    5. Regular Expressions (regexps)
    6. Grammars and BNF Notation
    7. Compressed Data
      1. GZIP Streams (GZ)
        1. PKZip files (ZIP)
        2. Other compression formats
    8. Vectors and Geometry
      1. Scalable Vector Graphics (SVG)
      2. OBJ and AutoCAD DXF
      3. PostScript (PS) and Portable Document Format (PDF)
      4. Shapefile and Well-Known Text
    9. Binary Data Formats
      1. Excel Spreadsheets (XLS)
      2. dBASE/xBase (DBF)
      3. Arbitrary Binary Formats
      4. Bit Shifting
      5. DataInputStream
    10. Advanced Detective Work
      1. Watching Network Traffic
  13. 11. Integrating Processing with Java
    1. Programming Modes
      1. Basic
      2. Continuous
      3. Java
    2. Additional Source Files (Tabs)
      1. Using .java Source Files
    3. The Preprocessor
    4. API Structure
      1. Event Handling
      2. The size( ) Method
      3. The main( ) Method
      4. The frame Object
    5. Embedding PApplet into Java Applications
      1. Two Models for Updating the Screen
      2. Embedding in a Swing Application
    6. Using Java Code in a Processing Sketch
      1. Using the Code Folder to Add .jar Files to a Sketch
      2. Packaging Code into Libraries
    7. Using Libraries
    8. Building with the Source for processing.core
  14. Bibliography
    1. Acquire and Parse
  15. Index
  16. About the Author
  17. Colophon
  18. Copyright