Chapter 20. Joining Data with the Lookup Transform

Are you looking for a way to join data from a new source into your Data Flow pipeline? The Lookup Transform in SQL Server Integration Services (SSIS) allows you to perform the equivalent of an inner and outer hash join. The only difference is that the operations occur outside the realm of the database engine.

This transform is used in many different situations but would typically be found in an ETL process that populates a data warehouse. For example, you may want to populate a table by joining data from two separate source systems on different database platforms. The component can join only two data sets at a time, so in order to join three or more data sets you would need to string multiple Lookup Transforms together.

The Lookup Transform is a synchronous transform; therefore, it does not block the pipeline's flow of data. As new data enters the transform, rows that have been joined are leaving through one of the possible outputs. The caveat to this is that in certain caching modes the component will initially block the package's execution for a period of time while it charges its internal caches.

Sometimes rows will not join successfully. For example, you may have a product that has no purchase history and its identifier in the product table would have no matches in the sales table. SSIS supports this by having multiple outputs on the Lookup Transform; in the simplest (default/legacy) configuration you would have one output for ...

Get Knight's 24-Hour Trainer: Microsoft® SQL Server® 2008 Integration Services now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.