Chapter 10: Advanced Data Cleansing in SSIS
- Using the Derived Column Transformation for advanced data cleansing
- Applying the Fuzzy Lookup and Fuzzy Grouping transformations and understanding how they work
- Introducing Data Quality Services
- Introducing Master Data Services
You can find the wrox.com code downloads for this chapter at http://www.wrox.com/go/prossis2014 on the Download Code tab.
In this chapter, you will learn the ins and outs of data cleansing in SSIS, from the basics to the advanced. In a broad sense, one of SSIS’s main purposes is to cleanse data — that is, transform data from a source to a destination and perform operations on it along the way. In that sense, someone could correctly say that every transformation in SSIS is about data cleansing. For example, consider the following transformations:
- The Data Conversion adjusts data types.
- The Sort orders data and can optionally remove rows with duplicate sort values.
- The Merge Join correlates data from two sources.
- The Derived Column applies expression logic to data.
- The Data Mining Query predicts values and exceptions.
- The Script applies .NET logic to data.
- The Term Extraction and Term Lookup perform text mining.
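As a loose analogy outside SSIS, three of the operations listed above (type conversion, a derived expression, and sorting with duplicate removal) can be sketched in plain Python. The function name `cleanse`, the column names, and the sample rows are hypothetical, and the code does not use or reproduce any SSIS API; it only illustrates the kind of work these transformations perform on each row.

```python
from typing import Any


def cleanse(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert types, derive a cleaned column, then sort and drop duplicate keys."""
    converted = []
    for row in rows:
        converted.append({
            # Data Conversion analogue: cast text values to numeric types.
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            # Derived Column analogue: trim and upper-case with an expression.
            "state": row["state"].strip().upper(),
        })

    # Sort analogue: order by the key, then keep one row per key value.
    # Keeping the first row is an assumption of this sketch.
    converted.sort(key=lambda r: r["customer_id"])
    result = []
    seen = set()
    for row in converted:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            result.append(row)
    return result


# Hypothetical sample data: everything arrives as text, with one duplicate key.
rows = [
    {"customer_id": "2", "amount": "10.50", "state": " fl "},
    {"customer_id": "1", "amount": "7", "state": "ga"},
    {"customer_id": "2", "amount": "10.50", "state": "FL"},
]
cleaned = cleanse(rows)
```

In a package, each of these steps would be a separate component in the Data Flow rather than one function, which is what makes it reasonable to view nearly every transformation as a form of cleansing.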
In a stricter sense, data cleansing is about identifying incomplete, incorrect, or irrelevant data and then updating, modifying, or removing the “dirty” data. From this perspective, SSIS has four primary data cleansing transformations, which are reviewed in this chapter: