Working with missing data

Data is "missing" in pandas when it has a value of NaN (also written as np.nan, the form that comes from NumPy). A NaN value indicates that, in a particular Series, no value is specified for a given index label.
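
As a minimal sketch of what this looks like in practice, the snippet below builds a small Series with one missing value and detects it. The data and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: no value is specified for label "b".
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# isna() returns a boolean Series that is True where a value is missing.
missing_mask = s.isna()
n_missing = int(missing_mask.sum())

# NaN never compares equal to anything, including itself, so an equality
# test against np.nan does not find missing values.
equal_to_nan = bool((s == np.nan).any())
```

Because an equality comparison against np.nan is always False, isna() (or its alias isnull()) is the usual way to test for missing values.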

In pandas, there are a number of reasons why a value can be NaN:

  • A join of two sets of data does not have matched values
  • Data that you retrieved from an external source is incomplete
  • The value is not known at a given point in time and will be filled in later
  • There is a data collection error retrieving a value, but the event must still be recorded in the index
  • Reindexing of data has resulted in an index that does not have a value
  • The shape of data has changed and there are now additional rows or columns, which ...
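
Two of the causes listed above, reindexing and a join with unmatched values, can be sketched as follows. The Series, DataFrames, column names, and keys are all hypothetical; this is an illustration under those assumptions, not an example taken from the book.

```python
import pandas as pd

# Hypothetical Series with three labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Reindexing: label "d" does not exist in s, so its value becomes NaN.
# With the default integer dtype, the result is upcast to float to hold NaN.
reindexed = s.reindex(["a", "b", "d"])

# Join: key "x" exists only on the left and key "z" only on the right,
# so an outer merge fills the unmatched cells with NaN.
left = pd.DataFrame({"key": ["x", "y"], "left_val": [1, 2]})
right = pd.DataFrame({"key": ["y", "z"], "right_val": [3, 4]})
merged = left.merge(right, on="key", how="outer")

# Count the missing values that each operation introduced.
reindexed_missing = int(reindexed.isna().sum())
merged_missing = merged.isna().sum()
```

In both cases no NaN was present in the input; the missing values appear only because the resulting index or key set contains entries with no corresponding source value.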
