Working with missing data

In this section, we will discuss missing, NaN, or null values, in pandas data structures. It is a very common situation to arrive with missing data in an object. One such case that creates missing data is reindexing:

>>> df8 = pd.DataFrame(np.arange(12).reshape(4,3),  
                       columns=['a', 'b', 'c'])
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
>>> df9 = df8.reindex(columns = ['a', 'b', 'c', 'd'])
   a   b   c   d
0  0   1   2 NaN
1  3   4   5 NaN
2  6   7   8 NaN
4  9  10  11 NaN
>>> df10 = df8.reindex([3, 2, 'a', 0])
    a   b   c
3   9  10  11
2   6   7   8
a NaN NaN NaN
0   0   1   2

To manipulate missing values, we can use the isnull() or notnull() functions to detect the missing values in a Series object, as well as in a DataFrame object:

>>> df10.isnull()
       a      b      c
3 False False ...

Get Python: Real-World Data Science now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.