Imputing missing values

After detecting the number of missing values in each attribute, we need to impute them, since missing values can significantly affect the conclusions drawn from the data.
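As a quick refresher, the per-attribute NA count can be obtained with base R alone. The minimal sketch below uses a small hypothetical data frame, toy.data, as a stand-in for train.data; its columns and values are invented purely for illustration.

```r
# Hypothetical stand-in for train.data; values are invented for illustration
toy.data <- data.frame(
  Age      = c(22, NA, 26, NA, 35),
  Embarked = c("S", "C", NA, "S", "Q"),
  Fare     = c(7.25, 71.28, 7.92, 53.10, 8.05)
)

# Count the NA values in each column (attribute) of the data frame
na.counts <- sapply(toy.data, function(x) sum(is.na(x)))
print(na.counts)
```

colSums(is.na(toy.data)) gives the same counts in a single call.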

Getting ready

This recipe requires train.data to be loaded in the R session and the previous recipe to be completed, so that Pclass and Survived have already been converted to the factor type.
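For reference, a minimal sketch of the factor conversion this recipe assumes has already been done. It uses a hypothetical toy.data data frame in place of train.data, with Survived and Pclass stored as integers as they typically are after reading a CSV file; the values are invented.

```r
# Hypothetical stand-in for train.data, with Survived and Pclass read as integers
toy.data <- data.frame(
  Survived = c(0L, 1L, 1L, 0L),
  Pclass   = c(3L, 1L, 3L, 2L)
)

# Convert both columns to factors so R treats them as categorical variables
toy.data$Survived <- factor(toy.data$Survived)
toy.data$Pclass   <- factor(toy.data$Pclass)

str(toy.data)
```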

How to do it...

Perform the following steps to impute the missing values:

  1. First, list the distribution of the port of embarkation (Embarked). Here, we add the useNA = "always" argument so that table also reports the number of NA values in train.data$Embarked:
    > table(train.data$Embarked, useNA = "always")
    
       C    Q    S <NA> 
     168   77  644    2 
    
  2. Assign the two missing values to a more probable port (that is, ...
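The sketch below repeats step 1 on a hypothetical Embarked vector and then fills the missing entries with the most frequent observed port. Mode imputation is an assumption chosen here as one common option for a categorical variable; it is not necessarily the exact method this recipe goes on to use, and the vector embarked and its values are invented for illustration.

```r
# Hypothetical Embarked column with two missing values (values invented)
embarked <- c("S", "C", NA, "S", "Q", "S", NA)

# Step 1 equivalent: distribution of ports, including the NA count
print(table(embarked, useNA = "always"))

# Assumed approach: replace NA with the most frequent non-missing port
counts <- table(embarked)                 # table() excludes NA by default
most.frequent <- names(counts)[which.max(counts)]
embarked[is.na(embarked)] <- most.frequent

# The NA column should now show a count of 0
print(table(embarked, useNA = "always"))
```

If Embarked is stored as a factor rather than a character vector, the same assignment works as long as the chosen value is already one of the factor's levels.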
