Best practices for data handling

Data cleaning and manipulation form the foundation of any analytics project. To carry out this important step efficiently, follow these best practices:

  • After importing the dataset, verify that it has been read correctly: all variables and rows should be present, and each variable should be in its correct or required format. Sometimes, because of a limitation in the data or in the IDE, some variables are read incorrectly and must be converted to the correct format.
  • For example, if a variable holds a numerical ID (say, 10 digits long), it will often be read and displayed in scientific notation. However, this would be wrong as it is an ...
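
The check described above can be sketched in pandas as follows. This is a minimal illustration, not the book's own code: the column names `customer_id` and `amount` and the inline sample data are assumptions made up for the example. It reads the same data twice, once letting pandas infer the types and once forcing the ID column to be read as text, and then compares the results.

```python
import io

import pandas as pd

# Hypothetical sample data: customer_id is a 10-digit identifier,
# and the third row has a missing ID.
csv_text = (
    "customer_id,amount\n"
    "1234567890,10.5\n"
    "0987654321,20.0\n"
    ",5.0\n"
)

# Default read: pandas infers a numeric type for the ID column.
# Because one ID is missing, the column becomes float64, so the IDs
# may be displayed in scientific notation and leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit read: force the ID column to be read as text so that
# every digit is preserved exactly as it appears in the file.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})

# Confirm that all rows and columns arrived, then inspect the types.
print(inferred.shape)
print(inferred.dtypes)
print(typed.dtypes)
print(typed["customer_id"].head())
```

Comparing `dtypes` right after import is a quick way to spot columns that were read in the wrong format. Identifiers are usually safer as text, since they are labels and not quantities to be computed with.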
