Imputing missing values
After counting the missing values in each attribute, we need to impute them, since missing values might significantly affect the conclusions that can be drawn from the data.
Getting ready
This recipe requires train.data to be loaded in the R session and the previous recipe to be completed, so that Pclass and Survived have already been converted to factor type.
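As a minimal sketch of this preparation, here is a small, hypothetical stand-in for train.data (the real Titanic data has many more rows and columns, and these values are made up for illustration). It shows the factor conversion from the previous recipe and one way to count the missing values in each attribute with colSums(is.na(...)):

```r
# Hypothetical miniature stand-in for train.data; values are illustrative only
train.data <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Pclass   = c(3, 1, 3, 2, 1),
  Age      = c(22, NA, 26, NA, 35),
  Embarked = c("S", "C", NA, "S", "Q"),
  stringsAsFactors = TRUE
)

# Convert Pclass and Survived to factors, as in the previous recipe
train.data$Pclass   <- factor(train.data$Pclass)
train.data$Survived <- factor(train.data$Survived)

# Count the missing values in each attribute (column)
na.counts <- colSums(is.na(train.data))
print(na.counts)
```

Columns with a nonzero count (Age and Embarked in this toy data) are the candidates for imputation.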
How to do it...
Perform the following steps to impute the missing values:
- First, list the distribution of Port of Embarkation. Here, we add the useNA = "always" argument to show the number of NA values contained within train.data:
  > table(train.data$Embarked, useNA = "always")
     C    Q    S <NA>
   168   77  644    2
- Assign the two missing values to a more probable port (that is, ...
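The two steps can be sketched as follows, assuming that the "more probable" port is simply the most frequent observed level (the mode). In train.data, S is the most frequent port (644 records), and only 2 of the 891 Embarked values are missing, so mode imputation barely changes the distribution. The helper impute.mode is hypothetical, and the embarked vector is made-up illustrative data rather than the real train.data$Embarked column:

```r
# Hypothetical helper: replace NA entries of a factor with its most frequent level
impute.mode <- function(x) {
  counts <- table(x)                                # table() excludes NA by default
  most.frequent <- names(counts)[which.max(counts)] # first level with the highest count
  x[is.na(x)] <- most.frequent                      # must be an existing factor level
  x
}

# Illustrative factor with two missing values (not the real train.data$Embarked)
embarked <- factor(c("S", "C", NA, "S", "Q", NA, "S"))

# Step 1: show the distribution, including the NA count
print(table(embarked, useNA = "always"))

# Step 2: assign the missing values to the most frequent port
embarked <- impute.mode(embarked)
print(table(embarked, useNA = "always"))
```

On the real data, the same call would be train.data$Embarked <- impute.mode(train.data$Embarked). Mode imputation is only one option; it is reasonable here because so few values are missing.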
Source: R: Recipes for Analysis, Visualization and Machine Learning (O’Reilly).