Treating invalid data by splitting and merging streams

When you transform data, you will often detect inaccuracies or errors. Sometimes the issues are not severe enough to justify discarding the rows. You may be able to infer what the data was supposed to be, or you may have default values to substitute for the invalid ones. Let's see some examples:

  • You have a field defined as a string that represents a person's date of birth. Besides valid dates, the field contains other strings such as N/A, -, and ???. Any attempt to run a calculation with these values would lead to an error.
  • You have two dates representing the start date and end date of the execution ...
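To make the idea of splitting and merging concrete, here is a minimal sketch of the date-of-birth case, written in plain Python rather than as a PDI transformation. The field name birth_date, the yyyy-MM-dd input format, and the default date of 1900-01-01 are all assumptions made for illustration. Rows are split into a valid stream and an invalid stream, the invalid stream is repaired with the default value, and both streams are then merged back together.

```python
from datetime import date, datetime

# Assumptions for this sketch: input format and placeholder default.
DATE_FORMAT = "%Y-%m-%d"
DEFAULT_BIRTH_DATE = date(1900, 1, 1)


def parse_birth_date(text):
    """Return a date if text is a valid date in DATE_FORMAT, else None."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except (ValueError, AttributeError):
        # ValueError: not a date (N/A, -, ???); AttributeError: text is None
        return None


def split_rows(rows):
    """Split rows into (valid, invalid) streams based on birth_date."""
    valid, invalid = [], []
    for row in rows:
        parsed = parse_birth_date(row["birth_date"])
        if parsed is None:
            invalid.append(row)
        else:
            valid.append({**row, "birth_date": parsed})
    return valid, invalid


def fix_invalid(rows):
    """Replace the invalid birth_date with the default value."""
    return [{**row, "birth_date": DEFAULT_BIRTH_DATE} for row in rows]


def merge(valid, fixed):
    """Merge both streams; valid rows come first, then the repaired ones."""
    return valid + fixed


rows = [
    {"name": "Ann", "birth_date": "1985-03-12"},
    {"name": "Bob", "birth_date": "N/A"},
    {"name": "Cy", "birth_date": "???"},
]
valid, invalid = split_rows(rows)
result = merge(valid, fix_invalid(invalid))
```

After the merge, every row carries a real date in birth_date, so later calculations no longer fail. Note that in this sketch the merged stream does not preserve the original row order: the valid rows come first, followed by the repaired ones.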
