Ensuring Consistency within a Data Warehouse

Inconsistent data can quickly void any confidence that end users have garnered when working with any data system. Because one of the primary reasons for designing a star schema is to provide faster query response time when performing OLAP queries, it is pivotal that the results are accurate and consistent. To ensure this level of accuracy and consistency, data warehouse developers typically institute two basic design rules during the load process.

The first rule is to ensure that you are properly handling null or invalid values. This is usually accomplished by replacing these values with values that are more meaningful to an end user. The decision as to what these values are should not be determined by a single person or group. This is an effort that should be decided upon by the organization as a whole because the data warehouse will likely be used by a broad audience. This results in a consistent interpretation across the entire organization. A common practice is to insert rows into dimensions that satisfy as many inconsistent or invalid data scenarios. The values are usually replaced with values such as Unknown, NA, Internal, and so on.

The second rule relates to the fact table. After data has been inserted into the fact table, it should not be updated or deleted. This could lead to varying results from day to day. If this is a requirement, you should consider creating a separate table to maintain these transient transactions. You ...

Get Microsoft SQL Server 2012 Bible now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.