Chapter 7. Joining Data

In the simplest ETL scenarios, you use an SSIS pipeline to extract data from a single source table and populate the corresponding destination table. In practice, though, you usually won't see such trivial scenarios: The more common ETL scenarios will require you to access two or more Data Sources simultaneously and merge their results together into a single destination structure. For instance, you may have a normalized source system that uses three or more tables to represent the product catalog, whereas the destination represents the same information using a single de-normalized table (perhaps as part of a data warehouse schema). In this case you would need to join the multiple source tables together in order to present a unified structure to the destination table.

In the relational world, such requirements are easily met by employing a relational join operation. However, in the ETL world you may not be so fortunate that the tables to be joined live in the same physical database, same brand of database, same server, or in the worst cases even same physical location (all of which typically render the relational join method useless). In fact, one common scenario is where data from a legacy source system is staged in flat text files, which then need to be joined to dimensional data residing in a SQL Server data warehouse.

So the ETL system needs to be able to join data in a similar way to relational systems, but should not be constrained to having the source ...

Get Professional SQL Server® 2008 Integration Services now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.