Detecting missing values

Missing values reduce the representativeness of the sample and can distort inferences about the population. This recipe focuses on detecting missing values within the Titanic dataset.

Getting ready

You need to have completed the previous recipes by converting the Pclass and Survived attributes to a factor type.

In R, a missing value is denoted by the symbol NA (not available), and an impossible value, such as the result of 0/0, by NaN (not a number).
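
The difference between the two symbols matters when testing for them. The following is a minimal sketch on a small made-up vector (x is a hypothetical name, not part of the Titanic data): is.na flags both NA and NaN, while is.nan flags only NaN.

```r
# Hypothetical example vector, not taken from the Titanic dataset
x <- c(1, NA, 0 / 0)     # 0 / 0 evaluates to NaN

na_flags  <- is.na(x)    # TRUE for NA and for NaN
nan_flags <- is.nan(x)   # TRUE for NaN only

print(na_flags)
print(nan_flags)
```

Because is.na also covers NaN, it is usually the safer choice when the goal is simply to find every unusable value in an attribute.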

How to do it...

Perform the following steps to detect missing values:

  1. The is.na function returns a logical vector that marks which elements of an attribute are NA. Here, we apply it to the Age attribute first:
    > is.na(train.data$Age)
    
  2. The is.na function indicates the missing value of the Age attribute. ...
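
The logical vector returned by is.na is hard to read for a long attribute, so it is often summarised. The following is one possible sketch, not necessarily how the recipe continues. It uses a small hypothetical data frame, toy.data, as a stand-in for train.data so that it runs on its own; the names missing_age, n_missing, share_missing, and which_missing are likewise illustrative.

```r
# Hypothetical stand-in for train.data; the values are made up for illustration
toy.data <- data.frame(
  Age    = c(22, NA, 26, NA, 35),
  Pclass = factor(c(3, 1, 3, 1, 2))
)

missing_age   <- is.na(toy.data$Age)   # TRUE where Age is NA
n_missing     <- sum(missing_age)      # each TRUE counts as 1
share_missing <- mean(missing_age)     # proportion of missing entries
which_missing <- which(missing_age)    # row positions of the missing entries

print(n_missing)
print(share_missing)
print(which_missing)
```

Replacing toy.data with train.data should give the same summaries for the real Age attribute, assuming the data was loaded as in the earlier recipes.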
