9.1. Aggregates and the Archive Strategy

Historic data is often removed from a star schema in order to maintain tables of a manageable size. Concurrent with this archival process, corresponding data must be removed from aggregates. This data may be archived itself, or simply discarded. In addition, it may be moved to a separate aggregate table, to serve as an on-line summarization of archived detail.

9.1.1. The Data Warehouse Archive Strategy

The limits of data warehouse size are continuously pushed upward by ongoing improvement in RDBMS technology and a downward trend in the cost of storage hardware. As the upper limits increase, so too does the size at which a business will find the optimal tradeoff between availability of analytic data and the necessary investment to support it. Today, for some data warehouse subject areas, the amount of data generated may not be deemed significant enough to require the purging of older, historic data.

But as the practical limits of database size increase, businesses find new uses for the increased capacity of their data warehouses. High-volume transaction models, once managed at a summary level, can now be stored at the detail level. Streams of data that might once have been considered too large, such as the stream of clicks generated by visits to a website, can be maintained on-line. Thus, even with the increase in the capacity of a well-performing, cost-effective data warehouse, it becomes necessary to take steps to remove older data.

Get Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.