Identifying missing data
The easiest way of dealing with missing values, especially with MCAR data, is simply removing all the observations with any missing values. If we want to exclude every row of a matrix or data.frame object which has at least one missing value, we can use the complete.cases function from the stats package to identify those.
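To make the idea concrete, here is a minimal sketch on a small, made-up data frame (the df object and its columns are invented for illustration and are not part of hflights). It shows how the logical vector returned by complete.cases can be used to drop incomplete rows, and that na.omit keeps the same rows in one call:

```r
# Toy data frame, invented for illustration (not the hflights data)
df <- data.frame(
  id    = 1:5,
  delay = c(10, NA, 3, 7, NA),
  dest  = c("IAH", "DFW", NA, "HOU", "AUS"),
  stringsAsFactors = FALSE
)

# TRUE for rows without any NA, FALSE for rows with at least one NA
ok <- complete.cases(df)

# Keep only the complete rows
df_complete <- df[ok, ]

# na.omit() drops the same incomplete rows in a single call
df_omitted <- na.omit(df)
```

Keeping the logical vector around, rather than calling na.omit directly, can be handy when we also want to inspect the rows that would be dropped, for example with df[!ok, ].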
For a quick start, let's see how many rows have at least one missing value:
> library(hflights)
> table(complete.cases(hflights))
 FALSE   TRUE 
  3622 223874 
This is around 1.6 percent of the nearly quarter million rows:
> prop.table(table(complete.cases(hflights))) * 100
    FALSE      TRUE 
 1.592116 98.407884 
Let's see what the distribution of NA looks like within different columns:
> sort(sapply(hflights, function(x) ...
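As a rough sketch of the general idea, and not necessarily the call used above, per-column NA counts can be computed by summing is.na over each column. The toy data frame and its column names below are made up for illustration:

```r
# Toy data frame, invented for illustration (not the hflights data)
toy <- data.frame(
  dep = c(1, NA, 3, NA),
  arr = c("x", "y", "z", NA),
  id  = 1:4,
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE values (the NAs) in each column
na_counts <- colSums(is.na(toy))

# Order the columns from fewest to most missing values
na_sorted <- sort(na_counts)
```

Here na_sorted is a named numeric vector, so the column names stay attached to their counts after sorting.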