Machine Learning with R Cookbook by Chiu Yu-Wei

Imputing missing values

After counting the missing values in each attribute, we need to impute them, since missing values can significantly affect the conclusions drawn from the data.

Getting ready

This recipe requires train.data to be loaded in the R session and the previous recipe to be completed, so that Pclass and Survived have already been converted to factors.
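
As a reminder of what that prerequisite looks like, here is a minimal sketch of a factor conversion. It uses a hypothetical toy data frame, toy.data, invented for illustration; in the recipe the same assignments would apply to train.data, and the previous recipe may have performed the conversion differently.

```r
# toy.data is a hypothetical stand-in for train.data (values are made up)
toy.data <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 3, 2)
)

# Convert the numeric codes to factors so R treats them as categories
toy.data$Survived <- factor(toy.data$Survived)
toy.data$Pclass   <- factor(toy.data$Pclass)

# Inspect the resulting column types
str(toy.data)
```

Calling str() is a quick way to confirm that both columns are reported as Factor before moving on.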

How to do it...

Perform the following steps to impute the missing values:

  1. First, tabulate the distribution of the port of embarkation (the Embarked attribute). Here, we add the useNA = "always" argument so that the count of NA values in train.data is shown as well:
    > table(train.data$Embarked, useNA = "always")
    
       C    Q    S <NA> 
     168   77  644    2 
    
  2. Assign the two missing values to a more probable port (that is, ...
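
The general idea of counting missing values and then filling them with the most frequent category can be sketched as follows. This is an illustration of plain mode imputation on a hypothetical toy vector (embarked and mode.level are names invented here), not necessarily the exact approach the recipe goes on to use.

```r
# Hypothetical toy vector, not the Titanic data
embarked <- factor(c("S", "C", NA, "S", "Q", "S", NA))

# Count each level, including missing values
print(table(embarked, useNA = "always"))

# Find the most frequent non-missing level (table() drops NA by default)
mode.level <- names(which.max(table(embarked)))

# Replace the missing entries with that level
embarked[is.na(embarked)] <- mode.level

# Verify that no NA values remain
print(table(embarked, useNA = "always"))
```

Mode imputation is reasonable when only a handful of values are missing and one category clearly dominates; with many missing values, it can distort the distribution.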
