8.1. Data Quality

Data quality simply means how dependable the data is, and you obviously want your organization's data to be as dependable as possible. This section covers normalization, column definitions, stored procedures, and triggers, all of which have a great impact on the dependability of data.

8.1.1. Normalization

Normalization is a term you've probably heard before but may not fully understand. Textbooks and references often make matters worse by leaning on jargon and mathematical notation, so this section explains plainly what normalization addresses and how it works. The primary problems it addresses concern data quality: data inconsistencies and anomalies lead to unreliable data. To eliminate, or at least minimize, these problems, a proven process of reorganization is needed. Edgar F. Codd, the pioneer of the relational database model, first described this process and named it normalization: a method in which non-simple domains are decomposed until every element is an atomic value, one that cannot be decomposed further.

The best way to understand these issues is with an example. First, a data set is presented. Next, the anomalies that can occur in that set are described. Finally, the data set is reorganized using proven normalization rules.

8.1.1.1. Example Data Set

Suppose the following data set is used by the FCC to keep track of amateur radio ...
