Detecting and removing missing values

Missing values are values that should have been recorded but, for some reason, were not. R represents them with NA. They are different from values without meaning, which R represents with NaN (not a number).
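To make the distinction concrete, here is a minimal sketch using only base R. The vector name x and the use of 0/0 to produce a NaN are illustrative choices, not taken from the recipe. Note that is.na() reports TRUE for NaN as well, while is.nan() flags only the undefined result.

```r
# One ordinary value, one missing value, one undefined result (0/0 gives NaN)
x <- c(1, NA, 0/0)

na_flags  <- is.na(x)   # TRUE for both NA and NaN
nan_flags <- is.nan(x)  # TRUE only for NaN

print(na_flags)   # FALSE  TRUE  TRUE
print(nan_flags)  # FALSE FALSE  TRUE
```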

Most of us first ran into missing values in a situation like the following:

> x <- c(1,2,3,NA,4)
> mean(x)
[1] NA

"Oh come on, I know you can do it. Just ignore that useless NA" was probably your reaction, or at least it was mine.
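For this particular case, mean() can be told to skip missing values through its na.rm argument. The sketch below reuses the same vector; it shows one quick workaround, not necessarily the approach the recipe goes on to develop.

```r
x <- c(1, 2, 3, NA, 4)

# na.rm = TRUE drops NA values before the mean is computed
mean_without_na <- mean(x, na.rm = TRUE)
print(mean_without_na)  # 2.5
```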

Fortunately, R comes with built-in functions for detecting and handling missing values.
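As a rough illustration of detection, the base R functions below check whether a vector contains missing values, count them, and locate them. Which functions the recipe itself relies on is not shown in this excerpt, so treat the selection as an assumption.

```r
x <- c(1, 2, 3, NA, 4)

has_missing <- anyNA(x)          # is there at least one NA?
n_missing   <- sum(is.na(x))     # how many values are missing?
where       <- which(is.na(x))   # positions of the missing values

print(has_missing)  # TRUE
print(n_missing)    # 1
print(where)        # 4
```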

In this recipe and the following one, we will see two opposite approaches to missing value handling:

  • Removing missing values
  • Simulating missing values by interpolation
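The two approaches can be sketched side by side on the vector used earlier. Removal uses logical indexing with is.na(). For the second approach, linear interpolation over the element positions with base R's approx() is assumed here purely for illustration; the following recipe may well use a different method or package.

```r
x <- c(1, 2, 3, NA, 4)
known <- !is.na(x)

# Approach 1: remove the missing values
x_removed <- x[known]
print(x_removed)  # 1 2 3 4

# Approach 2 (assumed method): fill each gap by linear interpolation
# between the neighbouring known values, using positions as the x axis
x_filled <- approx(x = which(known), y = x[known], xout = seq_along(x))$y
print(x_filled)   # 1.0 2.0 3.0 3.5 4.0
```

Removal shortens the vector, whereas interpolation keeps its length but replaces the gap with an estimate rather than an observed value.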

I have to warn you ...
