Detecting missing values
Missing values reduce the representativeness of a sample and can distort inferences about the population. This recipe focuses on detecting missing values within the Titanic dataset.
Getting ready
You need to have completed the previous recipes by converting the Pclass and Survived attributes to a factor type.
In R, a missing value is denoted by the symbol NA (not available), and an impossible value by NaN (not a number).
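The difference between NA and NaN can be checked directly at the console. The following is a minimal sketch rather than part of the recipe; the vector x is a made-up example, and 0/0 is used only as a convenient way to produce NaN.

```r
# Hypothetical vector: one ordinary value, one missing value, one impossible value
x <- c(1, NA, 0/0)

# is.na flags both NA and NaN as missing
is.na(x)    # FALSE  TRUE  TRUE

# is.nan flags only NaN
is.nan(x)   # FALSE FALSE  TRUE
```

Because is.na also returns TRUE for NaN, a count based on is.na includes both kinds of value; use is.nan when the two need to be told apart.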
How to do it...
Perform the following steps to detect missing values:
- The is.na function indicates which elements of an attribute contain the NA value. Here, we apply it to the Age attribute first:
  > is.na(train.data$Age)
- The is.na function indicates the missing value of the Age attribute. ...
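As an illustration of how the logical vector returned by is.na can be summarized, here is a minimal sketch. It assumes a small made-up data frame, toy.data, standing in for train.data; the column values are invented and the steps shown are not necessarily the ones the recipe goes on to use.

```r
# Hypothetical stand-in for train.data; the values are made up for illustration
toy.data <- data.frame(
  Age  = c(22, NA, 38, NA, 35),
  Fare = c(7.25, 71.28, NA, 8.05, 53.10)
)

# Logical vector: TRUE where Age is missing
age.missing <- is.na(toy.data$Age)

# Positions of the missing entries
which(age.missing)         # 2 4

# Number and proportion of missing entries
sum(age.missing)           # 2
mean(age.missing)          # 0.4

# Number of missing entries in every column at once
colSums(is.na(toy.data))
```

Summing a logical vector counts its TRUE values, which is why sum and mean give the number and share of missing entries. With the real dataset, the same calls would be applied to train.data$Age.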