O'Reilly logo

Professional SQL Server™ 2005 Integration Services by Mike Murphy, Haidong Ji, Jason Gerard, Erik Veerman, Andy Leonard, Kathi Kellenberger, Douglas Hinson, Darren Green, Allan Mitchell, Brian Knight

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

6.9. Fuzzy Lookup

If you've done some work in the world of extract, transfer, and load processes (ETL), you've run into the proverbial crossroads of handling bad data. The test data is staged, but all attempts to retrieve a foreign key from a dimension table result in no matches for a number of rows. This is the crossroads of bad data. At this point, there are a finite set of options. You could create a set of Astro-Physics-based lookup functions using SQL Sound-Ex, full-text searching, or distance-based word calculation formulas. This strategy is time-consuming to create and test, complicated to implement, and dependent on language lexicon, and it isn't always consistent or reusable (not to mention that everyone from now on will be scared to alter the code for fear of breaking it). You could just give up and divert the row for manual processing by subject matter experts (that's a way to make some new friends). You could just add the new data to the lookup tables and retrieve the new keys. If you just add the data, the foreign key retrieval issue gets solved, but you could be adding an entry into the dimension table that will skew data-mining results downstream. This is what I like to call a lazy-add. This is a descriptive, not a technical, term. A lazy-add would import a misspelled job title like "prasedent" into the dimension table when there is already an entry of "president." It was added, but it was lazy.

The Fuzzy Lookup and Fuzzy Grouping transformations add one more road ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required