Chapter 14. Exploring and Visualizing

Once you’ve imported your data and cleaned and transformed it into a suitable state, you get to start asking questions like “what does it all mean?” The two main tools at your disposal are summary statistics and plots. (Modeling comes later, because you need to understand your data before you can model it properly.) R is well served by a comprehensive set of functions for calculating statistics, and a choice of three different graphics systems.

Chapter Goals

After reading this chapter, you should:

  • Be able to calculate a range of summary statistics on numeric data
  • Be able to draw standard plots in R’s three plotting systems
  • Be able to manipulate those plots in simple ways

Summary Statistics

We’ve already come across many of the functions for calculating summary statistics, so this section is partly a recap. Most are fairly obvious in their naming and their usage; for example, mean and median calculate their respective measures of location. There isn’t a function for the mode, but it can be calculated from the results of the table function, which gives counts of each element. (If you haven’t already, have a go at Exercise 13-3 now.)
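As a minimal sketch of the table-based approach to the mode, assuming a small made-up numeric vector: find_modes is a hypothetical helper name, not a base R function. It returns every value that ties for the highest count.

```r
# Illustrative data, invented for this sketch (not from the book's dataset)
x <- c(3, 1, 2, 3, 2, 3)

mean(x)    # arithmetic mean
median(x)  # middle value of the sorted data

# Hypothetical helper: the mode is the value (or values) with the highest count
find_modes <- function(v) {
  counts <- table(v)  # counts of each distinct value
  # The names of a table are character strings, so convert back to numbers;
  # this assumes v is a numeric vector
  as.numeric(names(counts)[counts == max(counts)])
}

find_modes(x)
## [1] 3
```

Because ties are possible, the helper can return more than one value; for example, find_modes(c(1, 1, 2, 2)) gives both 1 and 2.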

In the following examples, the obama_vs_mccain dataset contains the fractions of people voting for Obama and McCain in the 2008 US presidential elections, along with some contextual background information on demographics:

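# Load the dataset from the learningr package (the package must be installed)
data(obama_vs_mccain, package = "learningr")
# Extract the Obama column: the percentage of votes for Obama in each row
obama <- obama_vs_mccain$Obama
mean(obama)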
## [1] 51.29 ...
