Treating invalid data by splitting and merging streams

Real-world data is not perfect; it has errors. We already saw that errors in data can cause our transformations to crash, and we learned how to detect and report errors while avoiding undesirable situations. The main problem is that in doing so, we discard data that may be important. Sometimes the errors are not so severe, and we may be able to fix them so that we don't lose data. Let's see some examples:

  • You have a field defined as a string, and that field represents the date of birth of a person. Besides valid dates, its values include other strings, for example N/A, -, ???, and so on. Any attempt to run a calculation with these values would ...
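
The split, fix, and merge idea behind this kind of example can be sketched outside of PDI in a few lines of plain Python. This is only an illustration under stated assumptions, not how a PDI transformation is built: the sample rows, the birthdate field name, the yyyy-MM-dd date format, and the helper names parse_date and split_fix_merge are all hypothetical. Rows whose date parses go to one stream, the rest go to another where the bad value is replaced with a null, and the two streams are then joined again so that no row is discarded.

```python
from datetime import datetime

# Hypothetical sample rows; field names and placeholder values are assumptions.
rows = [
    {"name": "Ann", "birthdate": "1980-05-17"},
    {"name": "Bob", "birthdate": "N/A"},
    {"name": "Cid", "birthdate": "???"},
    {"name": "Dee", "birthdate": "1975-11-02"},
]


def parse_date(text):
    """Return a date for a yyyy-MM-dd string, or None if it does not parse."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def split_fix_merge(rows):
    """Split rows by date validity, null out bad dates, and merge the streams."""
    valid, invalid = [], []
    for row in rows:
        parsed = parse_date(row["birthdate"])
        if parsed is not None:
            # Valid stream: keep the row with a real date value.
            valid.append({**row, "birthdate": parsed})
        else:
            # Invalid stream: keep the row, but replace the bad string with None.
            invalid.append({**row, "birthdate": None})
    # Merge: every input row survives, valid rows first, then the fixed ones.
    return valid + invalid


if __name__ == "__main__":
    for row in split_fix_merge(rows):
        print(row)
```

Note that the merged output does not preserve the original row order; if order matters, the rows would need a sequence field to sort on after merging.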
