Replacing missing data with another value

The following dataset contains three valid values, 2, 3, and 4, whose mean is 3. It also contains two NAs, which we replace with that mean. The following R code achieves this:

> x<-c(NA,2,3,4,NA) 
> y<-na.omit(x) 
> m<-mean(y) 
> m 
[1] 3 
> x[is.na(x)]<-m 
> x 
[1] 3 2 3 4 3 

For Python, see the following program:

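import numpy as np
import pandas as pd
# One column with a single missing value (NaN)
df = pd.DataFrame({'A' : [2,np.nan,3,4]})
print(df)
# Replace each NaN with its column mean; mean() skips NaN by default
df.fillna(df.mean(), inplace=True)
print(df)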

The output, before and after the replacement, is:

     A 
0  2.0 
1  NaN 
2  3.0 
3  4.0 
     A 
0  2.0 
1  3.0 
2  3.0 
3  4.0
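
To tie the two versions together, here is a minimal sketch that applies the same idea to the values from the R example, using a pandas Series in place of the R vector. The variable names s, m, and filled are chosen here for illustration and do not come from the original program.

```python
import numpy as np
import pandas as pd

# Same values as the R vector x: two missing entries around 2, 3, 4.
s = pd.Series([np.nan, 2, 3, 4, np.nan])

# Series.mean() skips NaN by default (skipna=True), so no separate
# na.omit-style step is needed before taking the mean.
m = s.mean()

# fillna returns a new Series; the original s is left unchanged.
filled = s.fillna(m)

print(m)                # 3.0
print(filled.tolist())  # [3.0, 2.0, 3.0, 4.0, 3.0]
```

Because fillna is called without inplace=True in this sketch, s still holds its two NaN values afterwards, which can be convenient when the original data should be kept for comparison.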
