Replacing missing values with the mean

When you disregard cases that have a missing value for any variable, you lose the useful information that the nonmissing values in those cases convey. You may sometimes want to impute reasonable values (those that will not skew the results of analyses very much) for the missing values.
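To make the information loss concrete, here is a minimal sketch on a made-up data frame. The name toy, its columns, and its values are hypothetical and are not taken from missing-data.csv. It counts how many rows survive when every incomplete case is dropped with na.omit().

```r
# Hypothetical toy data; not the contents of missing-data.csv
toy <- data.frame(
  Income = c(100, NA, 300, 400),
  Phone  = c("A", "B", NA, "D"),
  stringsAsFactors = FALSE
)

# Drop every row that has at least one NA in any column
complete <- na.omit(toy)

nrow(toy)       # rows before dropping
nrow(complete)  # rows left after dropping incomplete cases

# Known Income values thrown away along with the incomplete rows
lost_income <- sum(!is.na(toy$Income)) - sum(!is.na(complete$Income))
lost_income
```

In this toy case, row 3 has a known Income but a missing Phone, so dropping the row also discards a perfectly good Income value.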

Getting ready

Download the missing-data.csv file and store it in your R environment's working directory.

How to do it...

Read data and replace missing values:

> dat <- read.csv("missing-data.csv", na.strings = "")
> dat$Income.imp.mean <- ifelse(is.na(dat$Income), mean(dat$Income, na.rm=TRUE), dat$Income)

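After this, the new Income.imp.mean column contains the mean of the nonmissing Income values wherever Income was NA, and the original value everywhere else. The Income column itself is left unchanged.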
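If you do not have the CSV file at hand, the same step can be tried on a small in-memory data frame. This is only a sketch: dat_toy and its Income values are hypothetical stand-ins for the real data.

```r
# Hypothetical Income values standing in for dat$Income
dat_toy <- data.frame(Income = c(100, NA, 300, NA, 500))

# Mean of the nonmissing values only (100, 300, 500)
income_mean <- mean(dat_toy$Income, na.rm = TRUE)

# Same pattern as in the recipe: keep known values, fill NA with the mean
dat_toy$Income.imp.mean <- ifelse(is.na(dat_toy$Income),
                                  income_mean,
                                  dat_toy$Income)

dat_toy                          # original and imputed columns side by side
anyNA(dat_toy$Income.imp.mean)   # checks whether any NA remains after imputation
mean(dat_toy$Income.imp.mean)    # compare with income_mean
```

Because every gap is filled with the mean of the observed values, the mean of the imputed column matches the mean computed before imputation.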

How it works...

The preceding ifelse() function returns the imputed ...
