Data wrangling

If you have worked with data of any sort, you will recall that it usually needs to be preprocessed before it can be used as part of a bigger analysis. This preprocessing is called data wrangling.

Let's see what the typical flow in this process looks like:

  • Data acquisition
  • Data structure analysis
  • Information extraction
  • Unwanted data removal
  • Data transformation
  • Data standardization
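
As a rough illustration, the steps listed above can be sketched in a few lines of Python using only the standard library. Everything here is an assumption made for the example rather than something taken from the book: the sample records in RAW, the column names, the helper names to_iso and wrangle, and the choice to read slash-separated dates as day-first.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw input standing in for an acquired data source.
RAW = """id,name,signup_date,country,notes
1, Alice ,2021-03-05,us,vip
2,Bob,05/04/2021,US,
3,,2021-06-17,in,test row
"""

def to_iso(text):
    """Return the date as YYYY-MM-DD; slash dates are assumed to be day-first."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def wrangle(raw_text):
    # Data acquisition: read the raw text (here, an in-memory string).
    reader = csv.DictReader(io.StringIO(raw_text))

    # Data structure analysis: inspect which columns are present.
    columns = reader.fieldnames

    cleaned = []
    for row in reader:
        # Information extraction: keep only the fields of interest.
        record = {key: row[key] for key in ("id", "name", "signup_date", "country")}

        # Unwanted data removal: drop rows that have no name.
        if not record["name"].strip():
            continue

        # Data transformation: convert types and trim stray whitespace.
        record["id"] = int(record["id"])
        record["name"] = record["name"].strip()

        # Data standardization: uniform country codes and date format.
        record["country"] = record["country"].upper()
        record["signup_date"] = to_iso(record["signup_date"])

        cleaned.append(record)
    return columns, cleaned

columns, cleaned = wrangle(RAW)
for record in cleaned:
    print(record)
```

In a real pipeline each step would normally be a separate stage, and the acquisition step would read from files, databases, or streams rather than from a string; the single loop here is only meant to show how the steps relate to one another.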

Let's try to understand these in detail.
