Identifying missing data

The easiest way to deal with missing values, especially with MCAR (missing completely at random) data, is simply to remove every observation that contains a missing value. To exclude every row of a matrix or data.frame object that has at least one missing value, we can use the complete.cases function from the stats package to identify those rows.
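
As a minimal sketch of what complete.cases returns and how its result can be used to drop incomplete rows, consider a small made-up data frame. The object names (df, ok, df_complete) and the values are assumptions for illustration only and are not part of the hflights data:

```r
# Toy data frame; names and values are made up for illustration
df <- data.frame(
    x = c(1, NA, 3, 4),
    y = c("a", "b", NA, "d"),
    stringsAsFactors = FALSE
)

# Logical vector: TRUE for rows without any NA, FALSE otherwise
ok <- complete.cases(df)

# Keep only the complete rows by logical subsetting
df_complete <- df[ok, ]
```

Calling na.omit(df) from the stats package drops the same rows in a single step and records the removed row indices in the na.action attribute of the result.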

For a quick start, let's see how many rows have at least one missing value:

> library(hflights)
> table(complete.cases(hflights))
 FALSE   TRUE 
  3622 223874

This is around 1.6 percent of the quarter million rows:

> prop.table(table(complete.cases(hflights))) * 100
    FALSE      TRUE 
 1.592116 98.407884

Let's see what the distribution of NA looks like within different columns:

> sort(sapply(hflights, function(x) ...
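
The command above is truncated in this excerpt and is left as is. Independently of it, one common way to count missing values per column is to combine is.na with colSums. The following sketch uses a made-up data frame rather than hflights, so the object names (df, na_counts) and the numbers are assumptions for illustration, and it is not necessarily the approach the book takes:

```r
# Toy data frame; names and values are made up for illustration
df <- data.frame(
    a = c(1, NA, 3, NA),
    b = c(NA, 2, 3, 4),
    c = c(1, 2, 3, 4)
)

# is.na(df) gives a logical matrix; colSums counts the TRUE values
# per column, and sort orders the columns by their NA count
na_counts <- sort(colSums(is.na(df)))
```

Columns with a count of zero contain no missing values, so the sorted result quickly shows which variables are responsible for the incomplete rows.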
