The EM algorithm for the imputation of missing values

The EM algorithm is extensively used for the imputation of missing values. Implementations include (van Buuren and Groothuis-Oudshoorn 2011), (Schafer 1997), (Templ, Alfons, and Filzmoser 2011), (Raghunathan et al. 2001), and (Gelman and Hill 2011). In the following we want to show how an EM algorithm works generally for these kind of problems.

First we take a data set to impute. We select again the sleep data:

library("MASS")
library("robustbase")
library("VIM")
data("sleep")
str(sleep)
## 'data.frame':    62 obs. of  10 variables:
##  $ BodyWgt : num  6654 1 3.38 0.92 2547 ...
##  $ BrainWgt: num  5712 6.6 44.5 5.7 4603 ...
##  $ NonD    : num  NA 6.3 NA NA 2.1 9.1 15.8 5.2 10.9 8.3 ...
## $ Dream : num ...

Get Simulation for Data Science with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.