Chapter 7. Designing and Developing the ETL System

Measure twice; cut once.

Some people like to plan, specify, and document systems; most don't. We've observed that Extract, Transformation, and Load (ETL) system development draws folks in the latter category. We've found so few people who write adequate design specifications for their ETL systems that we've practically stopped asking to see our clients' planning documents. Either they're unaware of the impending complexity, or they don't have the planning tools.

In this chapter, we begin by discussing the steps you need to take before you start development of your ETL system: Round up your requirements and write an ETL plan. Next, we briefly introduce SQL Server Integration Services (SSIS) and present some of the basic concepts and vocabulary of that product.

Most of this chapter is structured around the 34 subsystems of a well designed ETL system. We describe each of the 34 subsystems and recommend alternative approaches for implementing them within SSIS. Throughout, we will refer to the Adventure Works Cycles case study.

As Figure 7-1 illustrates and common sense dictates, the ETL portion of the project is part of the data track. Remember, however, that these boxes aren't to scale. The ETL effort is the most time-consuming step in the data track and often in the entire project.

In this chapter, you learn:

  • How to plan for the ETL system, including the components of a solid ETL design specification, and how to create this document.

  • What ...

Get The Microsoft® Data Warehouse Toolkit: With SQL Server 2008 R2 and the Microsoft® Business Intelligence Toolset, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.