Missing values NA

Missing values in dataframes are a real source of irritation because they affect the way that model-fitting functions operate and they can greatly reduce the power of the modelling that we would like to do.

Some functions do not work with their default settings when there are missing values in the data, and mean is a classic example of this:

x<-c(1:8,NA)

mean(x)

[1] NA

In order to calculate the mean of the non-missing values, you need to specify that the NA are to be removed, using the na.rm=TRUE argument:

mean(x,na.rm=TRUE)

[1] 4.5
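The same argument is accepted by several other base R summary functions. The following is a minimal sketch, reusing the vector x from above, of how one might count the missing values first and then summarise the rest; the variable names n_missing, total, med and largest are illustrative only.

```r
# Same vector as in the text: eight values and one NA
x <- c(1:8, NA)

# is.na(x) is a logical vector; summing it counts the TRUE entries
n_missing <- sum(is.na(x))

# sum, median and max also accept na.rm = TRUE
total   <- sum(x, na.rm = TRUE)
med     <- median(x, na.rm = TRUE)
largest <- max(x, na.rm = TRUE)

print(c(n_missing = n_missing, total = total, median = med, max = largest))
```

Counting the missing values before removing them is a useful habit, because na.rm=TRUE silently reduces the sample size on which the summary is based.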

To locate missing values within a vector, use the function is.na(x) rather than a comparison such as x == NA or x != "NA", because comparing an NA element with anything yields NA rather than TRUE or FALSE. Here is an example where we want to find the locations (7 and 8) of the missing values within a vector called vmv:

vmv<-c(1:6,NA,NA,9:12)

vmv

[1] 1 2 3 4 5 6 NA NA 9 10 11 12

One way to build an index of the missing values in a vector is to use the seq function,

seq(along=vmv)[is.na(vmv)]

[1] 7 8

but the result is achieved more simply using which like this:

which(is.na(vmv))

[1] 7 8
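To see why is.na is preferred over a direct comparison, the sketch below contrasts the two on the same vector. It assumes only base R; the names eq_result, missing_flags, positions and n_missing are hypothetical and chosen for illustration.

```r
vmv <- c(1:6, NA, NA, 9:12)

# Comparing with NA yields NA for every element, so it locates nothing
eq_result <- vmv == NA
print(all(is.na(eq_result)))

# is.na returns a proper TRUE/FALSE vector, one entry per element
missing_flags <- is.na(vmv)

# anyNA gives a quick yes/no answer; sum counts the missing values
print(anyNA(vmv))
n_missing <- sum(missing_flags)

# which converts the logical vector into positions
positions <- which(missing_flags)
print(positions)
```

The logical vector from is.na can be reused directly as a subscript, so it is often worth storing rather than recomputing.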

If the missing values are genuine counts of zero, you might want to change each NA to 0. Use the is.na function to generate subscripts for this:

vmv[is.na(vmv)]<- 0

vmv

[1] 1 2 3 4 5 6 0 0 9 10 11 12

or use the ifelse function, which returns a new vector and leaves vmv itself unchanged, like this:

vmv<-c(1:6,NA,NA,9:12)

ifelse(is.na(vmv),0,vmv)

[1] 1 2 3 4 5 6 0 0 9 10 11 12
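The two approaches differ in whether the original vector is altered. The sketch below illustrates this, and also shows the base R function replace as a third option; the names filled and filled2 are illustrative assumptions, not part of the original example.

```r
vmv <- c(1:6, NA, NA, 9:12)

# ifelse builds a new vector; vmv still contains its two NAs afterwards
filled <- ifelse(is.na(vmv), 0, vmv)
print(anyNA(vmv))
print(anyNA(filled))

# replace also returns a modified copy rather than editing in place
filled2 <- replace(vmv, is.na(vmv), 0)
print(all(filled == filled2))

# Subscripted assignment, by contrast, overwrites vmv itself
vmv[is.na(vmv)] <- 0
print(anyNA(vmv))
```

Keeping the original vector intact, as ifelse and replace do, makes it possible to check later how many values were imputed.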
