Cleansing by doing a fuzzy search

With a fuzzy search, we don't look for exact matches but for similar values. PDI allows you to perform fuzzy searches with the special step Fuzzy match. With this step, you can find approximate matches to a string using matching algorithms.
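Levenshtein edit distance is one well-known algorithm of this kind. It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, so a small distance means two strings are similar. The following Python sketch only illustrates the idea and is not PDI's own implementation. The function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b (row i-1 of the classic DP table)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]

print(levenshtein("Califronia", "California"))  # 2: the swapped "ro"/"or" costs two substitutions
print(levenshtein("Calorado", "Colorado"))      # 1: one substitution
```

Only two rows of the table are kept in memory at a time, because each row depends only on the one before it.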

In order to see how to use this step, let's go back to our example. Suppose that we have a list of valid states along with their codes, as follows:

State;Abbreviation
Alabama;AL
Alaska;AK
Arizona;AZ
...
West Virginia;WV
Wisconsin;WI
Wyoming;WY

We also have a stream of data in which one of the fields represents a state. The problem is that not all of its values are correct.

The following could be a list of incoming values:

Califronia
Calorado
Washington
Masachusetts
...
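Outside PDI, the same cleansing idea can be sketched in a few lines. For each incoming value, look up the most similar name in the list of valid states and accept it only if it is similar enough. The sketch below uses Python's standard `difflib.get_close_matches`. Its similarity ratio (0 to 1) is based on a gestalt pattern-matching approach, not on any PDI-specific algorithm. The helper `fix_state`, the `VALID_STATES` subset, and the 0.8 cutoff are assumptions made for illustration.

```python
import difflib
from typing import Optional

# Hypothetical subset of the valid state list, for illustration only;
# a real lookup list would contain every state.
VALID_STATES = [
    "Alabama", "Alaska", "Arizona", "California", "Colorado",
    "Massachusetts", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]

def fix_state(value: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the most similar valid state name, or None when no
    candidate reaches the similarity cutoff (an arbitrary choice here;
    difflib's own default is 0.6)."""
    matches = difflib.get_close_matches(value, VALID_STATES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Clean the incoming values from the example above
for incoming in ["Califronia", "Calorado", "Washington", "Masachusetts"]:
    print(incoming, "->", fix_state(incoming))
```

With this subset, each misspelled value maps to California, Colorado, and Massachusetts respectively, and Washington maps to itself. Returning `None` instead of forcing a match lets values that are too dissimilar be routed elsewhere for manual review.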
