Deduplicating non-exact matches

Suppose that we have the same data as before and want to create a list of the states that appear in our dataset. Among the values, we have Hawaii, Hawai, and Howaii. We don't want all three values on our final list; we only want a single state: Hawaii. If we try to deduplicate the data with the Unique rows step, we will still have three values, because the three spellings are not exact matches. The solution is to fix the values first with a fuzzy search algorithm, and only then do the deduplication. This doesn't differ much from the previous solution:

  1. Open the transformation you just created and save it under a different name.
  2. Run a preview of the Fuzzy match step. In the preview window, click on the title of the match column to ...
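
To make the fix-then-deduplicate idea concrete outside of the tool, here is a minimal Python sketch. It is not how the Fuzzy match step works internally; it uses the standard library's difflib as a stand-in for the fuzzy search. The reference list of states, the function names, and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list of valid state names (an assumption for this sketch).
REFERENCE_STATES = ["Hawaii", "Idaho", "Iowa", "Ohio"]


def normalize_state(value, reference=REFERENCE_STATES, cutoff=0.8):
    """Return the closest reference name, or the value unchanged if none is close enough."""
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def deduplicate_states(values):
    """Fix each value with the fuzzy lookup first, then keep one copy of each result."""
    unique = []
    for value in values:
        fixed = normalize_state(value.strip())
        if fixed not in unique:
            unique.append(fixed)
    return unique


raw = ["Hawaii", "Hawai", "Howaii", "Idaho"]
print(deduplicate_states(raw))  # ['Hawaii', 'Idaho']
```

The order matters here just as it does in the transformation: deduplicating the raw values first would leave all three spellings in place, because none of them is an exact duplicate of another.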
