6.7. Summary

Unless the data warehouse is being updated in real time, aggregate tables are processed most efficiently when their source is the base schema. This provides a simple mechanism for changed data identification, eliminates redundant processing, and helps ensure conformance.

The incremental load process for an aggregate dimension table is quite similar to that of a base dimension table. Because it is sourced from the base schema, the process is greatly simplified. Complications arise only when the aggregate contains a type 1 attribute that is not dependent on another of its attributes. In this situation an incremental load will require that changes to the base schema be audited. Alternatively, a drop-and-rebuild approach may be desirable.

The incremental processing of aggregate facts is quite similar to the processing of base facts. Complications arise when the load process is executed multiple times for a given key value within the time dimension. If this occurs, the process must be prepared to handle previously encountered key combinations through an update. This can be avoided by simply truncating and reloading the current period.

While they appear to be a simplified version of an aggregate star, pre-joined aggregates are loaded quite differently. For an incremental load, multiple passes are required for each dimension represented in the aggregate, applying any type 1 changes that occur. A final pass inserts new records and type 2 changes. A drop-and-rebuild can be ...

Get Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.