O'Reilly logo

Python: Advanced Predictive Analytics by Joseph Babcock, Ashish Kumar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Handling missing values

Checking for missing values and handling them properly is an important step in the data preparation process, if they are left untreated they can:

  • Lead to the behavior between the variables not being analyzed correctly
  • Lead to incorrect interpretation and inference from the data

To see how; move up a few pages to see how the describe method is explained. Look at the output table; why are the counts for many of the variables different from each other? There are 1310 rows in the dataset, as we saw earlier in the section. Why is it then that the count is 1046 for age, 1309 for pclass, and 121 for body. This is because the dataset doesn't have a value for 264 (1310-1046) entries in the age column, 1 (1310-1309) entry in the pclass ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required