Graphical Data Analysis with R

Book Description

See How Graphics Reveal Information

Graphical Data Analysis with R shows you what information you can gain from graphical displays. The book focuses on why you draw graphics to display data and which graphics to draw (and uses R to do so). All the datasets are available in R or one of its packages and the R code is available at rosuda.org/GDA.

Graphical data analysis is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modelling output, and presenting results. This book guides you in choosing graphics and understanding what information you can glean from them. It can be used as a primary text in a graphical data analysis course or as a supplement in a statistics course. Colour graphics are used throughout.

Table of Contents

  1. Preliminaries
  2. Preface
    1. Acknowledgements
  3. Chapter 1 Setting the Scene
    1. 1.1 Graphics in action
    2. 1.2 Introduction
    3. 1.3 What is Graphical Data Analysis (GDA)?
      1. The Iris dataset
      2. Student Admissions at UC Berkeley dataset
      3. Pima Indians diabetes dataset
      4. GDA in context
    4. 1.4 Using this book, the R code in it, and the book’s webpage
    5. Main points
    6. Exercises
      1. Figure 1.1
      2. Figure 1.2
      3. Figure 1.3
      4. Figure 1.4
      5. Figure 1.5
      6. Figure 1.6
      7. Figure 1.7
      8. Figure 1.8
      9. Figure 1.9
  4. Chapter 2 Brief Review of the Literature and Background Materials
    1. Summary
    2. 2.1 Literature review
    3. 2.2 Interactive graphics
    4. 2.3 Other graphics software
    5. 2.4 Websites
    6. 2.5 Datasets
    7. 2.6 Statistical texts
  5. Chapter 3 Examining Continuous Variables
    1. Summary
    2. 3.1 Introduction
    3. 3.2 What features might continuous variables have?
    4. 3.3 Looking for features
      1. Galton's heights
      2. Some more heights—Pearson
      3. Scottish hill races (best times)
      4. How are the variables in the Boston dataset distributed?
      5. Hidalgo stamps thickness
      6. How long is a movie?
    5. 3.4 Comparing distributions by subgroups
    6. 3.5 What plots are there for individual continuous variables?
    7. 3.6 Plot options
    8. 3.7 Modelling and testing for continuous variables
    9. Main points
    10. Exercises
      1. Figure 3.1
      2. Figure 3.2
      3. Figure 3.3
      4. Figure 3.4
      5. Figure 3.5
      6. Figure 3.6
      7. Figure 3.7
      8. Figure 3.8
      9. Figure 3.9
      10. Figure 3.10
      11. Figure 3.11
      12. Figure 3.12
      13. Figure 3.13
      14. Figure 3.14
  6. Chapter 4 Displaying Categorical Data
    1. Summary
    2. 4.1 Introduction
    3. 4.2 What features might categorical variables have?
    4. 4.3 Nominal data—no fixed category order
      1. Meta analyses—how big was each study?
      2. Anorexia
      3. Who sailed on the Titanic?
      4. Opinion polls
    5. 4.4 Ordinal data—fixed category order
      1. Surveys
      2. And more surveys
    7. 4.5 Discrete data—counts and integers
      1. Deaths by horsekicks
      2. Goals in soccer
      3. Benford’s Law
    7. 4.6 Formats, factors, estimates, and barcharts
      1. Shape of the dataset
      2. Coding of variables
      3. Estimates shown as bars
    8. 4.7 Modelling and testing for categorical variables
    9. Main points
    10. Exercises
      1. Figure 4.1
      2. Figure 4.2
      3. Figure 4.3
      4. Figure 4.4
      5. Figure 4.5
      6. Figure 4.6
      7. Figure 4.7
      8. Figure 4.8
      9. Figure 4.9
      10. Figure 4.10
      11. Figure 4.11
      12. Figure 4.12
  7. Chapter 5 Looking for Structure: Dependency Relationships and Associations
    1. Summary
    2. 5.1 Introduction
    3. 5.2 What features might be visible in scatterplots?
    4. 5.3 Looking at pairs of continuous variables
      1. The evils of drink?
      2. Old Faithful
      3. Movie ratings
    5. 5.4 Adding models: lines and smooths
      1. Cars and mpg
      2. Pearson heights
    6. 5.5 Comparing groups within scatterplots
    7. 5.6 Scatterplot matrices for looking at many pairs of variables
      1. Crime in the U.S.
      2. Swiss banknotes
      3. Functions for drawing sploms
    8. 5.7 Scatterplot options
    9. 5.8 Modelling and testing for relationships between variables
    10. Main points
    11. Exercises
      1. Figure 5.1
      2. Figure 5.2
      3. Figure 5.3
      4. Figure 5.4
      5. Figure 5.5
      6. Figure 5.6
      7. Figure 5.7
      8. Figure 5.8
      9. Figure 5.9
      10. Figure 5.10
      11. Figure 5.11
      12. Figure 5.12
      13. Figure 5.13
      1. Table 5.1
  8. Chapter 6 Investigating Multivariate Continuous Data
    1. Summary
    2. 6.1 Introduction
    3. 6.2 What is a parallel coordinate plot (pcp)?
      1. Functions for drawing pcp's
    4. 6.3 Features you can see with parallel coordinate plots
    5. 6.4 Interpreting clustering results
    6. 6.5 Parallel coordinate plots and time series
    7. 6.6 Parallel coordinate plots for indices
    8. 6.7 Options for parallel coordinate plots
      1. Alignment
      2. Scaling
      3. Outliers
      4. Variable order
      5. Formatting
    9. 6.8 Modelling and testing for multivariate continuous data
    10. 6.9 Parallel coordinate plots and comparing model results
    11. Main points
    12. Exercises
      1. Figure 6.1
      2. Figure 6.2
      3. Figure 6.3
      4. Figure 6.4
      5. Figure 6.5
      6. Figure 6.6
      7. Figure 6.7
      8. Figure 6.8
      9. Figure 6.9
      10. Figure 6.10
      11. Figure 6.11
      12. Figure 6.12
      13. Figure 6.13
      14. Figure 6.14
      15. Figure 6.15
      16. Figure 6.16
      17. Figure 6.17
      18. Figure 6.18
      19. Figure 6.19
      1. Table 6.1
  9. Chapter 7 Studying Multivariate Categorical Data
    1. Summary
    2. 7.1 Introduction
    3. 7.2 Data on the sinking of the Titanic
    4. 7.3 What is a mosaicplot?
    5. 7.4 Different mosaicplots for different questions of interest
    6. 7.5 Which mosaicplot is the right one?
    7. 7.6 Additional options
    8. 7.7 Modelling and testing for multivariate categorical data
    9. Main points
    10. Exercises
      1. Figure 7.1
      2. Figure 7.2
      3. Figure 7.3
      4. Figure 7.4
      5. Figure 7.5
      6. Figure 7.6
      7. Figure 7.7
      8. Figure 7.8
      9. Figure 7.9
      10. Figure 7.10
      11. Figure 7.11
      12. Figure 7.12
  10. Chapter 8 Getting an Overview
    1. Summary
    2. 8.1 Introduction
    3. 8.2 Many individual displays
    4. 8.3 Multivariate overviews
      1. Scatterplot matrices
      2. Parallel coordinate plots
      3. Heatmaps
      4. Glyphs
    5. 8.4 Multivariate overviews for categorical variables
    6. 8.5 Graphics by group
      1. Trellis graphics
      2. Group plots
    7. 8.6 Modelling and testing for overviews
    8. Main points
    9. Exercises
      1. Figure 8.1
      2. Figure 8.2
      3. Figure 8.3
      4. Figure 8.4
      5. Figure 8.5
      6. Figure 8.6
      7. Figure 8.7
      8. Figure 8.8
      9. Figure 8.9
      10. Figure 8.10
      11. Figure 8.11
      12. Figure 8.12
      13. Figure 8.13
  11. Chapter 9 Graphics and Data Quality: How Good Are the Data?
    1. Summary
    2. 9.1 Introduction
    3. 9.2 Missing values
      1. Visualising patterns of missing values
      2. Missings dependent on values of other variables (MAR)
      3. Reasons for missings and dealing with missings
    4. 9.3 Outliers
      1. What is an outlier?
      2. Examples of outliers
      3. Univariate outliers
      4. Multivariate outliers
      5. Categorical outliers
      6. Dealing with outliers
      7. A possible strategy for outliers
    5. 9.4 Modelling and testing for data quality
    6. Main points
    7. Exercises
      1. Figure 9.1
      2. Figure 9.2
      3. Figure 9.3
      4. Figure 9.4
      5. Figure 9.5
      6. Figure 9.6
      7. Figure 9.7
      8. Figure 9.8
      9. Figure 9.9
      10. Figure 9.10
      11. Figure 9.11
      12. Figure 9.12
  12. Chapter 10 Comparisons, Comparisons, Comparisons
    1. Summary
    2. 10.1 Introduction
    3. 10.2 Making comparisons
      1. Types of comparison
      2. Comparing like with like
    4. 10.3 Making visual comparisons
      1. Comparing to a standard
      2. Comparing new data with old data
      3. Comparing subgroups
      4. Comparing time series (Playfair's import/export data)
    5. 10.4 Comparing group effects graphically
    6. 10.5 Comparing rates visually
    7. 10.6 Graphics for comparing many subsets
    8. 10.7 Graphics principles for comparisons
    9. 10.8 Modelling and testing for comparisons
    10. Main points
    11. Exercises
      1. Figure 10.1
      2. Figure 10.2
      3. Figure 10.3
      4. Figure 10.4
      5. Figure 10.5
      6. Figure 10.6
      7. Figure 10.7
      8. Figure 10.8
      9. Figure 10.9
      10. Figure 10.10
      11. Figure 10.11
      12. Figure 10.12
      13. Figure 10.13
      14. Figure 10.14
  13. Chapter 11 Graphics for Time Series
    1. Summary
    2. 11.1 Introduction
    3. 11.2 Graphics for a single time series
    4. 11.3 Multiple series
      1. Related series for the same population
      2. Same series for different subgroups
      3. Series with different scales
      4. One plot versus many
    5. 11.4 Special features of time series
      1. Data definitions
      2. Length of time series
      3. Regular and irregular time series
      4. Time series of different kinds of variables
      5. Forecasting
      6. Seeing patterns
    6. 11.5 Alternative graphics for time series
    7. 11.6 R classes and packages for time series
    8. 11.7 Modelling and testing time series
    9. Main points
    10. Exercises
      1. Figure 11.1
      2. Figure 11.2
      3. Figure 11.3
      4. Figure 11.4
      5. Figure 11.5
      6. Figure 11.6
      7. Figure 11.7
      8. Figure 11.8
      9. Figure 11.9
      10. Figure 11.10
  14. Chapter 12 Ensemble Graphics and Case Studies
    1. Summary
    2. 12.1 Introduction
    3. 12.2 What is an ensemble of graphics?
    4. 12.3 Combining different views—a case study example
    5. 12.4 Case studies
      1. Moral statistics of France
      2. Airbags and car accidents
      3. Athletes’ blood measurements
      4. Marijuana arrests
      5. Crohn’s disease
      6. Footballers in the four major European leagues
      7. Decathlon
      8. Intermission
      1. Figure 12.1
      2. Figure 12.2
      3. Figure 12.3
      4. Figure 12.4
      5. Figure 12.5
  15. Chapter 13 Some Notes on Graphics with R
    1. Summary
    2. 13.1 Graphics systems in R
    3. 13.2 Loading datasets and packages for graphical analysis
    4. 13.3 Graphics conventions in statistics
    5. 13.4 What is a graphic anyway?
    6. 13.5 Options for all graphics
      1. Window size and shape
      2. Scales
      3. Text
      4. Colour and appearance
    7. 13.6 Some R graphics advice and coding tips
      1. To get a new graphics window
      2. Resizing windows
      3. Default plots
      4. Points in scatterplots or in other point plots
      5. Printing graphics
      6. Multiple windows
      7. Drawing several independent plots in one window
      8. Naming objects
      9. Reordering categories for a barchart (and ordering in general)
      10. Reshaping datasets and graphics
      11. Missing values
      12. Using the code and finding out about function options
    8. 13.7 Other graphics
    9. 13.8 Large datasets
    10. 13.9 Perfecting graphics
      1. Figure 13.1
      2. Figure 13.2
  16. Chapter 14 Summary
    1. 14.1 Data analysis and graphics
    2. 14.2 Key features of GDA
    3. 14.3 Strengths and weaknesses of GDA
    4. 14.4 Recommendations for GDA
  17. References