5.2. Loading the Base Star Schema

The process of loading the base schema requires that fact tables and dimension tables be loaded. Fact tables bear foreign keys to the dimension tables and are therefore dependent entities. This would suggest that dimension tables be loaded first.

When there is a single source application, it is possible to take one extract or query and process all data in a single load program. The program would scrutinize each record, insert or update each dimension as required, and construct a fact record. This is sometimes favored because of the assumption that it permits a single pass through the incoming data set.

However, several factors may argue for multiple processes. A single process will analyze dimensional information redundantly, once for each fact rather than once for each distinct dimension record. Many dimensions may involve multiple data sources, perhaps best served by creating a staging area for the source data.

The most important reason to develop a separate load process for each table is maintainability. A single load that updates multiple tables can be very difficult to maintain. And as the data warehouse grows in size, maintenance becomes a more important issue. A change to the rule by which a dimension value is decoded would require development and QA on a process that loads that table, along with several other dimension tables and a fact table. As the scope of the warehouse increases, some of these tables may also be referenced by additional ...

Get Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.