Cleansing, Validating, and Fixing Data

You already have a set of tools for manipulating data. This chapter shows how to apply those concepts to cleanse data and to handle invalid data, either by discarding it or by fixing it.

We will cover the following topics in this chapter:

  • Standardizing information and improving the quality of data
  • Introducing some steps useful for data cleansing
  • Dealing with non-exact matches
  • Validating data
  • Treating invalid data by splitting and merging streams
