13.5. DATA QUALITY INITIATIVE

In spite of the enormous importance of data quality, many companies still question whether to pay special attention to it and cleanse their data. In many instances, missing attribute values cannot be recreated. In quite a number of cases, the data values are so convoluted that the data cannot really be cleansed. Other questions follow. Should the data be cleansed? If so, how much of it can really be cleansed? Which parts of the data deserve higher priority for cleansing? The indifference and resistance to data cleansing stem from a few valid factors:

  • Data cleansing is tedious and time-consuming. It demands a combination of vendor tools, in-house code, and arduous manual verification and examination. Many companies are unable to sustain the effort, and it is not the kind of work many IT professionals enjoy.

  • The metadata on many source systems may be incomplete or nonexistent. Without that documentation, it is difficult or even impossible to probe into dirty data.

  • The users who are asked to ensure data quality have many other business responsibilities. Among these, data quality probably receives the least attention.

  • Sometimes, the data cleansing activity appears so gigantic and overwhelming that companies are afraid to launch a data cleansing initiative.

Once your enterprise decides to institute a data cleansing initiative, ...

