Chapter 5. ETL Part 1: Incorporating Aggregates

Before aggregates can be used, they must be loaded with summary data. Aggregate processing is therefore a major component of the process that loads the data warehouse, often referred to as the ETL process or the load process.

This is the first of two chapters that explore the loading of aggregates. It describes the process of loading the base schema of the data warehouse and the considerations that arise when aggregates are added to the mix. The next chapter explores in detail the specific processes that load aggregate tables.

This chapter begins with an overview of the ETL process for the base schema. A critical part of any data warehouse implementation, the load process may involve several tools and must solve some difficult problems.

Next, this chapter describes the processes involved in loading the base schema tables. While tool-specific implementations will vary widely, every load must meet certain basic requirements. These requirements will serve as the basis for high-level process flows, illustrating the steps required to load base dimension and fact tables.
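As a rough illustration of those two base-table flows, the sketch below loads a dimension by assigning surrogate keys to new natural keys, then loads fact rows by swapping each natural key for its surrogate key. It is a minimal in-memory sketch under stated assumptions, not a tool-specific implementation: the `Dimension` class, `load_facts`, and column names such as `sku` and `quantity` are hypothetical, and changed dimension attributes are simply overwritten (a type 1 response).

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical in-memory dimension table, indexed by natural key."""
    rows: dict = field(default_factory=dict)  # natural key -> row, including surrogate "key"
    next_key: int = 1                         # next surrogate key to hand out

    def load(self, source_rows, natural_key):
        """Add new members with fresh surrogate keys; overwrite attributes of existing ones."""
        for src in source_rows:
            nk = src[natural_key]
            if nk in self.rows:
                # Assumed type 1 handling: overwrite changed attributes, keep the surrogate key.
                self.rows[nk].update(src)
            else:
                self.rows[nk] = {**src, "key": self.next_key}
                self.next_key += 1

    def lookup(self, nk):
        """Return the surrogate key for a natural key (KeyError if the member is unknown)."""
        return self.rows[nk]["key"]


def load_facts(fact_table, source_rows, dims, measures=("quantity",)):
    """Replace natural keys with surrogate keys, then append the rows to the fact table.

    dims maps a source column name (hypothetical) to the Dimension that resolves it.
    """
    for src in source_rows:
        fact = {m: src[m] for m in measures}
        for col, dim in dims.items():
            fact[f"{col}_key"] = dim.lookup(src[col])
        fact_table.append(fact)
```

Dimensions are loaded first because each fact row needs a surrogate key from every dimension it references; a fact row whose natural key has no matching dimension member raises a `KeyError` here, where a real load would need an explicit policy for it.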

Once the base schema load has been described, this chapter looks at how aggregates are incorporated into the process. The presence of aggregates requires that the load process manage their availability, taking them off-line during processes that update base tables. Aggregate processing may take one of two forms: a complete rebuild or an incremental refresh.
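The sketch below contrasts the two forms of aggregate processing and shows where the availability switch sits in a load cycle. It is a minimal sketch, assuming a single grouping column, an additive measure, and a base fact table that only ever receives inserts; the `Aggregate` class, its `online` flag, and `load_cycle` are hypothetical names for illustration.

```python
from collections import defaultdict


class Aggregate:
    """Hypothetical aggregate table: an additive measure summed by one grouping column."""

    def __init__(self, group_by, measure):
        self.group_by = group_by
        self.measure = measure
        self.totals = defaultdict(int)
        self.online = True  # availability flag that queries would check before using the aggregate

    def rebuild(self, base_facts):
        """Complete rebuild: discard the summary and recompute it from every base row."""
        self.totals = defaultdict(int)
        for row in base_facts:
            self.totals[row[self.group_by]] += row[self.measure]

    def refresh(self, new_facts):
        """Incremental refresh: fold in only the rows added by this load.

        Assumes base rows are only inserted, never updated or deleted.
        """
        for row in new_facts:
            self.totals[row[self.group_by]] += row[self.measure]


def load_cycle(base_facts, new_facts, aggregates, incremental=True):
    """Take aggregates off-line, load the base table, update aggregates, bring them back on-line."""
    for agg in aggregates:
        agg.online = False  # a summary out of step with the base table must not answer queries
    base_facts.extend(new_facts)  # the base fact table load
    for agg in aggregates:
        if incremental:
            agg.refresh(new_facts)
        else:
            agg.rebuild(base_facts)
        agg.online = True  # only reached if the update succeeded
```

The incremental refresh touches only the new rows, so it is usually cheaper, but it relies on the insert-only assumption above; a complete rebuild costs a full pass over the base table and makes no such assumption.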

Using ...
