Dealing with missing data

First, let's look at the codes that different languages use to represent missing values:

Language   Missing code   Explanation or example
R          NA             NA stands for "Not Available"
Python     nan            import numpy as np; missingCode = np.nan
Julia      missing        missing + 5 evaluates to missing
Octave     NaN            MATLAB also uses NaN

Table 3.7: Missing codes for R, Python, Julia, and Octave
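To see how Python's missing code behaves, here is a minimal sketch. It assumes NaN is used as the missing marker and relies on the standard-library math module rather than NumPy or SciPy, only to keep the example dependency-free. It shows that NaN propagates through arithmetic, much like Julia's `missing + 5`, and that NaN cannot be detected with `==`.

```python
import math

# Python's usual missing code is the IEEE 754 floating-point value NaN.
missing_code = math.nan

# Arithmetic with NaN propagates the missing value.
result = missing_code + 5
print(result)                         # nan

# NaN never compares equal to anything, not even itself,
# so `==` cannot be used to test for it.
print(missing_code == missing_code)   # False

# Use math.isnan() for scalars (numpy.isnan() works element-wise on arrays).
print(math.isnan(result))             # True
```

Because `x == math.nan` is always `False`, explicit tests such as `math.isnan()` are the reliable way to find missing values.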

For R, the missing code is NA. Several functions can remove missing observations; the following example uses na.exclude(), which drops the 145 NA values from the 1,000-element vector na_example and leaves 855 observations:

> library(dslabs)   # provides the na_example vector
> head(na_example, 20)
[1]  2  1  3  2  1  3  1  4  3  2  2 NA  2  2  1  4 NA  1  1  2
> length(na_example)
[1] 1000
> x <- na.exclude(na_example)
> length(x)
[1] 855
> head(x, 20)
[1] 2 1 3 2 1 3 1 4 3 2 2 2 2 1 4 1 1 2 1 2
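For comparison, the same filtering step can be sketched in Python. This minimal example assumes NaN plays the role of R's NA and uses only the standard library. It reuses the first 20 values of na_example printed above, and `drop_missing` is a hypothetical helper name, not a library function.

```python
import math

# First 20 values of na_example as printed above, with R's NA written as NaN.
head20 = [2, 1, 3, 2, 1, 3, 1, 4, 3, 2, 2, math.nan,
          2, 2, 1, 4, math.nan, 1, 1, 2]

def drop_missing(values):
    """Hypothetical helper: return a new list without NaN entries,
    a rough analogue of R's na.exclude() on a plain vector."""
    return [v for v in values if not math.isnan(v)]

x = drop_missing(head20)
print(len(head20), len(x))   # 20 18
print(x)
```

The 18 surviving values match the first 18 values of `head(x, 20)` in the R output. Unlike this plain filter, na.exclude() also records which positions were dropped in an `na.action` attribute. With NumPy arrays the common idiom is `arr[~np.isnan(arr)]`, and pandas provides `Series.dropna()`.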

In the previous ...
