Chapter 3. Data Extraction

A natural first discussion point for ETL is the extraction, the E in ETL. This chapter applies the concepts of the data extraction with using SSIS. As discussed in Chapter 1, ETL applies to a broad spectrum of applications beyond just data warehousing and ETL loading. Therefore, the discussion of this topic will include both generalized extraction concepts and data warehouse–specific concepts.

Data extraction is the process of moving data off of a source system, potentially to a staging environment or into the transformation phase of the ETL. Figure 3-1 shows the extraction process separated out on the left. An extraction process may pull data from a variety of sources, including files or database systems, as this figure highlights.

A few common objectives of data extraction include the following:

  • Consistency in how data is extracted across source systems

  • Performance of the extraction

  • Minimal impact on the source to avoid contention with critical source processes

  • Flexibility to handle source system changes

  • The ability to target only new or changed records

This chapter is structured into the following three sections:

  • SSIS data flow source adapters—Beyond just a review of the data flow sources, this chapter looks at performance and configuration options.

  • Incremental extraction—Part of applying data extraction in SSIS involves seeing how to design SSIS packages that perform incremental extractions—extracting only changed records from a source.

  • Data lineage—Many ...

Get Expert SQL Server™ 2005 Integration Services now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.