Lex Pedersen thinks this is interesting:

“Data munging” is an unusual term used to describe the part of a data science project involving the transformation of a data set into a form more suitable for machine learning algorithms. Data munging constitutes one of the primary ingredients of the “data pipeline,” the series of processing steps required to take raw data and transform it for use in a production system. The task involves cleansing, converting, manipulating, parsing, filtering, and mapping data in a “raw” form into a more refined form. Data munging is a very important step in the machine learning process that often takes up to 80% of the time and cost involved.
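
As a rough illustration of the steps named above (cleansing, parsing, filtering, converting, and mapping), the following minimal Python sketch uses only the standard library. The records, the field names, the `PLAN_CODES` mapping, and the `munge` function are all hypothetical, invented here for illustration; they do not come from the quoted text, and a real pipeline would likely use different rules.

```python
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"name": "  Alice ", "age": "34", "signup": "2021-03-05", "plan": "pro"},
    {"name": "Bob", "age": "", "signup": "2021-04-17", "plan": "free"},
    {"name": "carol", "age": "29", "signup": "not a date", "plan": "PRO"},
    {"name": "Dave", "age": "41", "signup": "2020-11-30", "plan": "Free"},
]

# Assumed mapping from a categorical label to a numeric code.
PLAN_CODES = {"free": 0, "pro": 1}


def munge(rows):
    """Return cleaned records, skipping rows that cannot be repaired."""
    cleaned = []
    for row in rows:
        # Cleanse: trim whitespace and normalize letter case.
        name = row["name"].strip().title()
        plan = row["plan"].strip().lower()

        # Filter: drop rows with an unknown plan label.
        if plan not in PLAN_CODES:
            continue

        # Parse and convert: turn strings into typed values,
        # dropping rows whose age or date cannot be parsed.
        try:
            age = int(row["age"])
            signup = datetime.strptime(row["signup"], "%Y-%m-%d")
        except ValueError:
            continue

        # Map: replace the plan label with its numeric code.
        cleaned.append({
            "name": name,
            "age": age,
            "signup_year": signup.year,
            "plan_code": PLAN_CODES[plan],
        })
    return cleaned


if __name__ == "__main__":
    for record in munge(RAW_ROWS):
        print(record)
```

In this sketch, rows with a missing age or an unparseable date are simply dropped. That is only one possible policy; imputing a default value or flagging the row for review may suit other data sets better.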