13.4. DATA QUALITY TOOLS

Based on our discussions in this chapter so far, you should by now be convinced of the seriousness of data quality in the data warehouse. Companies have begun to recognize dirty data as one of the most challenging problems in a data warehouse.

You would, therefore, imagine that companies must be investing heavily in data cleanup operations. But according to experts, data cleansing is still not a very high priority for companies. This attitude is changing as useful data quality tools arrive on the market. You may choose to apply these tools to the source systems, in the staging area before the load images are created, or to the load images themselves.

13.4.1. Categories of Data Cleansing Tools

Generally, data cleansing tools assist the project team in two ways. Data error discovery tools work on the source data to identify inaccuracies and inconsistencies. Data correction tools help fix the corrupt data. These correction tools use a series of algorithms to parse, transform, match, consolidate, and correct the data.
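To make the two categories concrete, here is a minimal sketch in Python. It is not taken from any commercial tool; the records, field names, validation rules, and function names are all assumptions made up for illustration. The first function only discovers errors. The other two perform simple correction steps (parse, transform, match, consolidate).

```python
import re
from collections import defaultdict

# Hypothetical source records; all names and values are invented for illustration.
RECORDS = [
    {"id": 1, "name": "  john  SMITH ", "state": "nj", "zip": "08817"},
    {"id": 2, "name": "John Smith", "state": "NJ", "zip": "08817"},
    {"id": 3, "name": "Mary Jones", "state": "New Jersey", "zip": "8817"},
    {"id": 4, "name": "", "state": "NY", "zip": "10001"},
]

# Assumed reference data for this sketch only.
VALID_STATES = {"NJ", "NY"}
STATE_ALIASES = {"NEW JERSEY": "NJ", "NEW YORK": "NY"}


def discover_errors(records):
    """Error discovery: report problems without changing the data."""
    problems = []
    for rec in records:
        if not rec["name"].strip():
            problems.append((rec["id"], "name", "missing"))
        if rec["state"] not in VALID_STATES:
            problems.append((rec["id"], "state", "not a valid code"))
        if not re.fullmatch(r"\d{5}", rec["zip"]):
            problems.append((rec["id"], "zip", "not five digits"))
    return problems


def correct_record(rec):
    """Correction: parse and transform one record into a standard form."""
    # Collapse extra whitespace and standardize capitalization.
    name = " ".join(rec["name"].split()).title()
    # Map spelled-out state names to their two-letter codes.
    state = rec["state"].strip().upper()
    state = STATE_ALIASES.get(state, state)
    # Assumption: a short numeric zip code lost its leading zeros.
    zip_code = rec["zip"].strip()
    if zip_code.isdigit() and len(zip_code) < 5:
        zip_code = zip_code.zfill(5)
    return {"id": rec["id"], "name": name, "state": state, "zip": zip_code}


def consolidate(records):
    """Correction: match records on (name, zip) and keep the first of each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["zip"])].append(rec)
    return [group[0] for group in groups.values()]


if __name__ == "__main__":
    print(discover_errors(RECORDS))
    cleaned = consolidate([correct_record(r) for r in RECORDS])
    print(cleaned)
    # Errors that correction could not fix are still reported.
    print(discover_errors(cleaned))
```

In this sketch, records 1 and 2 turn out to be the same customer once they are standardized, so they are consolidated into one. Record 4 still has a missing name after correction, because no rule can invent it. A real tool would use far more elaborate matching than the exact comparison on name and zip code assumed here.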

Although data error discovery and data correction are two distinct parts of the data cleansing process, most of the tools on the market do a bit of both. The same tool that identifies errors can usually also clean up and correct the polluted data. In the following sections, we will examine the features of these two aspects of data cleansing as found in the available tools. ...
