Validating data with PDI

Validating data is about ensuring that incoming data contains expected values. There are several kinds of constraints that we may need to impose on our data. The following are just some examples:

  • A field must contain only digits
  • A date field must be formatted as MM-dd-yyyy
  • A field must be either YES or NO
  • The value of a field must exist in a reference table
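As a rough illustration of what these constraints mean, each one can be written as a small predicate. The sketch below is plain Python rather than a PDI transformation. The function names and the REFERENCE_CODES set (standing in for a reference table) are hypothetical, and the date check assumes a four-digit year (MM-dd-yyyy).

```python
import re
from datetime import datetime

# Hypothetical stand-in for a reference table; in practice this would be
# loaded from a database or a lookup file.
REFERENCE_CODES = {"US", "AR", "BR"}


def is_digits(value):
    """True if the field is non-empty and contains only the digits 0-9."""
    return re.fullmatch(r"[0-9]+", value) is not None


def is_valid_date(value):
    """True if the field is a real calendar date written as MM-dd-yyyy."""
    # Check the exact shape first, because strptime also accepts
    # values without zero padding such as "1-5-2020".
    if re.fullmatch(r"[0-9]{2}-[0-9]{2}-[0-9]{4}", value) is None:
        return False
    try:
        datetime.strptime(value, "%m-%d-%Y")
    except ValueError:
        # Right shape, but not a real date (for example, 02-30-2020).
        return False
    return True


def is_yes_no(value):
    """True if the field is exactly YES or NO (case-sensitive)."""
    return value in ("YES", "NO")


def in_reference(value):
    """True if the value exists in the reference data."""
    return value in REFERENCE_CODES
```

Whether a check such as YES/NO should be case-sensitive, or whether empty values count as valid, is a design decision that depends on the data source; the sketch takes the strict reading in both cases.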

If a field doesn't respect these rules or constraints, we have to decide how to handle it. Some options are as follows:

  • Reporting the error to the log
  • Inserting the inconsistency into a dedicated table
  • Writing the line with the error in a file for further revision
  • Discarding just the row of data containing the error
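These options are not mutually exclusive. The following minimal sketch, again in plain Python rather than PDI, applies all four to rows that fail a YES/NO check. The function name split_rows, the field name active, and the layout of the reject line are assumptions made for illustration; the rejected list stands in for a dedicated table.

```python
import logging

logger = logging.getLogger("validation")


def split_rows(rows, reject_file):
    """Separate valid rows from rows whose 'active' field is not YES or NO.

    rows        -- iterable of dicts, one per input line
    reject_file -- writable text file object that receives the bad lines
    """
    valid = []
    rejected = []  # stands in for a dedicated table of inconsistencies
    for line_number, row in enumerate(rows, start=1):
        value = row.get("active")
        if value in ("YES", "NO"):
            valid.append(row)
            continue
        # Report the error to the log.
        logger.warning("Line %d: invalid value %r in field 'active'",
                       line_number, value)
        # Record the inconsistency for later inspection.
        rejected.append({"line": line_number, "row": row})
        # Write the offending line to a file for further revision.
        reject_file.write(f"{line_number};{value}\n")
        # The row is not added to 'valid', so only this row is discarded.
    return valid, rejected
```

Calling split_rows with an open file, for example inside `with open("rejects.txt", "w") as f:`, returns the clean rows for the rest of the process while leaving a trace of every rejected row.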

The following section shows a simple example that ...
