You are previewing SQL Server 2012 Data Integration Recipes: Solutions for Integration Services and Other ETL Tools.
O'Reilly logo
SQL Server 2012 Data Integration Recipes: Solutions for Integration Services and Other ETL Tools

Book Description

SQL Server 2012 Data Integration Recipes provides focused and practical solutions to real world problems of data integration. Need to import data into SQL Server from an outside source? Need to export data and send it to another system? SQL Server 2012 Data Integration Recipes has your back. You'll find solutions for importing from Microsoft Office data stores such as Excel and Access, from text files such as CSV files, from XML, from other database brands such as Oracle and MySQL, and even from other SQL Server databases. You'll learn techniques for managing metadata, transforming data to meet the needs of the target system, handling exceptions and errors, and much more.

What DBA or developer isn't faced with the need to move data back and forth? Author Adam Aspin brings 10 years of extensive ETL experience involving SQL Server, and especially satellite products such as Data Transformation Services and SQL Server Integration Services. Extensive coverage is given to Integration Services, Microsoft's flagship tool for data integration in SQL Server environments. Coverage is also given to the broader range of tools such as OPENDATASOURCE, linked servers, OPENROWSET, Migration Assistant for Access, BCP Import, and BULK INSERT just to name a few. If you're looking for a resource to cover data integration and ETL across the gamut of Microsoft's SQL Server toolset, SQL Server 2012 Data Integration Recipes is the one book that will meet your needs.

  • Provides practical and proven solutions towards creating resilient ETL environments

  • Clearly answers the tough questions which professionals ask

  • Goes beyond the tools to a thorough discussion of the underlying techniques

  • Covers the gamut of data integration, beyond just SSIS

  • Includes example databases and files to allow readers to test the recipes

What you'll learn

  • Import and export to and from CSV files, XML files, and other text-based sources.

  • Move data between SQL databases, including SQL Server and others such as Oracle Database and MySQL.

  • Discover and manage metadata held in various database systems.

  • Remove duplicates and consolidate from multiple sources.

  • Transform data to meet the needs of target systems.

  • Profile source data as part of the discovery process.

  • Log and manage errors and exceptions during an ETL process.

  • Improve efficiency by detecting and processing only changed data.

Who this book is for

SQL Server 2012 Data Integration Recipes is written for developers wishing to find fast and reliable solutions for importing and exporting to and from SQL Server. The book appeals to DBAs as well, who are often tasked with implementing ETL processes. Developers and DBAs moving to SQL Server from other platforms will find the succinct, example-based approach ideal for quickly applying their general ETL knowledge to the specific tools provided as part of a SQL Server environment.

Table of Contents

  1. Title Page
  2. Dedication
  3. Contents at a Glance
  4. Contents
  5. About the Author
  6. About the Technical Reviewers
  7. Acknowledgments
  8. Introduction
  9. Chapter 1: Sourcing Data from MS Office Applications
    1. 1-1. Ensuring Connectivity to Access and Excel
    2. 1-2. Importing Data from Excel
    3. 1-3. Modifying Excel Data During a Load
    4. 1-4. Specifying the Excel Data to Load During an Ad-Hoc Import
    5. 1-5. Planning for Future Use of a Linked Server
    6. 1-6. Reading Data Automatically from an Excel Worksheet
    7. 1-7. Loading Excel Data as Part of a Structured ETL Process
    8. 1-8. Importing Excel 2007/2010 Data Using SSIS 2005
    9. 1-9. Handling Source Data Issues When Importing Excel Worksheets Using SSIS
    10. 1-10. Pushing Access Data into SQL Server
    11. 1-11. Importing Multiple Access Tables
    12. 1-12. Ad Hoc Imports of Access Data
    13. 1-13. Obtaining Access Data Without Regular Imports
    14. 1-14. Importing Access Data as Part of a Regular ETL Process
    15. 1-15. Convert a Complex Access Database to SQL Server
    16. 1-16. Resolving Complex Data Migration Problems During an Access to SQL Server Upgrade
    17. Summary
  10. Chapter 2: Flat File Data Sources
    1. 2-1. Importing Data From a Text File
    2. 2-2. Importing a Delimited Text File
    3. 2-3. Automatically Determining Data Types
    4. 2-4. Importing Fixed-Width Text Files
    5. 2-5. Importing Text Files Using T-SQL
    6. 2-6. Mapping a Source File
    7. 2-7. Importing Data Using T-SQL in Anticipation of Using a Linked Server
    8. 2-8. Accessing a Text File As a Linked Server Table
    9. 2-9. Importing Flat Files from the Command Line
    10. 2-10. Importing Large Text Files Using T-SQL and Putting the Emphasis on Speed
    11. 2-11. Creating a Format File for Complex Flat File Loads Using Bulk Insert or BCP
    12. 2-12. Performing a BULK INSERT with a Format File
    13. 2-13. Loading a Text File Fast Using T-SQL
    14. 2-14. Executing BULK INSERT from SSIS
    15. 2-15. Handling Complex Flat File Formats with a Row Prefix in SSIS
    16. 2-16. Pre-Parsing and Staging File Subsets in SSIS
    17. 2-17. Handling Irregular Numbers of Columns in the Source File Using SQL Server 2012
    18. 2-18. Handling Embedded Qualifiers in SQL Server 2012
    19. 2-19. Handling Irregular Numbers of Columns in the Source File in SQL Server 2005 and 2008
    20. 2-20. Determining the Number of Columns in a Source File
    21. 2-21. Preparing CSV Files for Import
    22. Summary
  11. Chapter 3: XML Data Sources
    1. 3-1. Loading XML Files for Storage in SQL Server
    2. 3-2. Loading XML Data into Rows and Columns
    3. 3-3. Shredding an XML File into an SQL Server Table
    4. 3-4. Importing XML Data as Part of an ETL Process
    5. 3-6. Importing Multiple XML Files into SQL Server
    6. 3-7. Flattening an XML File to Prepare It for Import
    7. 3-8. Importing XML Data from Very Large Files, Putting a Priority on Speed
    8. 3-9. Loading Multiple Tables at Once from a Single XML Source File
    9. 3-10. Loading and Shredding Relational Tables from an XML Source File
    10. 3-11. Overcoming Bulk Loading Challenges for XML Files
    11. 3-12. Running SQLXML Bulk Loader As Part of a Regular ETL Process
    12. 3-13. Validating an XML Document Against a Schema File As Part of a Scripted Solution
    13. 3-14. Validating an XML Document Against a Schema File in SSIS
    14. Summary
  12. Chapter 4: SQL Databases
    1. Preamble: Installing and Configuring OLEDB and ODBC Providers
    2. 4-1. Configuring Your Server to Connect to Oracle
    3. 4-2. Importing Data from Oracle As a Regular Process
    4. 4-3. Accelerating Oracle Data Import
    5. 4-4. Importing Oracle Data on an “Ad Hoc” Basis
    6. 4-5. Migrating Multiple Oracle Tables and Views
    7. 4-6. Loading DB2 Data on a Regular Basis
    8. 4-7. Importing DB2 Data Without SSIS
    9. 4-8. Sourcing Data from MySQL on a Regular Basis
    10. 4-9. Importing MySQL Data on an “Ad Hoc” Basis
    11. 4-10. Importing MySQL Data Using SSIS 2005 and 2008
    12. 4-11. Migrating Complete Tables from MySQL
    13. 4-12. Loading Data from Sybase Adaptive Server Enterprise (ASE)
    14. 4-14. Importing Sybase ASE Data on an “Ad Hoc” Basis
    15. 4-15. Importing Sybase ASE Data on a Regular Basis
    16. 4-16. Loading Teradata Data
    17. 4-17. Sourcing Data from PostgreSQL
    18. Summary
  13. Chapter 5: SQL Server Sources
    1. 5-1. Loading Ad Hoc Data from Other SQL Server Instances
    2. 5-2. Reading Data from Another SQL Server Instance over a Permanent Connection
    3. 5-3. Loading Large Data Sets Using T-SQL
    4. 5-4. Load Data Exported from SQL Server from the Command Line
    5. 5-5. Loading SQL Server Data from a Native SQL Server File
    6. 5-6. Transferring Data Between SQL Server Databases Regularly
    7. 5-7. Porting a Tiny Amount of Data Between SQL Server Databases
    8. 5-8. Copying and Pasting Between Tables
    9. 5-9. Loading Data as Fast as Possible in an ETL Process
    10. 5-10. Importing and Exporting All the Tables in a Database
    11. 5-11. Loading Data from Different Versions of SQL Server
    12. 5-12. Copying Entire Databases
    13. 5-13. Transferring a Complex Subset of Data Between Databases
    14. 5-14. Loading Data into SQL Server Azure Interactively
    15. 5-15. Loading Data into SQL Server Azure as Part of a Regular ETL Process
    16. 5-16. Loading Data into SQL Server Azure from the Command Line
    17. 5-17. Loading Ad Hoc Data into SQL Server Azure
    18. 5-18. Transferring Tables Between Databases
    19. Summary
  14. Chapter 6: Miscellaneous Data Sources
    1. 6-1. Importing Data from SQL Server Analysis Services
    2. 6-2. Importing Data from an Analysis Services Cube on a Regular Basis
    3. 6-3. Querying an OLAP Source on an Ad Hoc Basis
    4. 6-4. Loading Images and Documents into an SQL Server Table
    5. 6-5. Importing Multiple Files into an SQL Server Table
    6. 6-6. Importing Files into SQL Server on a Regular Basis
    7. 6-7. Importing Files with Their Attributes into SQL Server
    8. 6-8. Loading Visual FoxPro Files
    9. 6-9. Importing Data from dBase
    10. 6-10. Loading Data from Web Services
    11. 6-11. Importing Windows Management Instrumentation Data
    12. 6-12. Importing Data over ODBC
    13. 6-13. Linking to 32-bit data sources from a 64-bit SQL Server
    14. Summary
  15. Chapter 7: Exporting Data from SQL Server
    1. 7-1. Exporting Data Occasionally in Various Formats
    2. 7-2. Exporting Data As a Delimited Text File
    3. 7-3. Exporting Data to a Fixed-Width Text File
    4. 7-4. Exporting Text Files from the Command Line
    5. 7-5. Exporting Data Using BCP Without Recourse to the Command Line
    6. 7-6. Exporting Data As Text Files from T-SQL
    7. 7-7. Exporting Data to Another SQL Server Database
    8. 7-8. Exporting Text Files Regularly from T-SQL
    9. 7-9. Exporting and Compressing Multiple Tables
    10. 7-10. Exporting a Tiny Dataset As XML
    11. 7-11. Exporting a Larger Dataset As XML
    12. 7-12. Exporting Large Datasets As XML
    13. 7-13. Creating an XML Schema
    14. 7-14. Shaping XML Export Data
    15. 7-15. Exporting XML Data on a Regular Basis
    16. 7-16. Routinely Exporting Small XML Datasets
    17. 7-17. Exporting Data to Excel Using T-SQL
    18. 7-18. Exporting Data to Access Using T-SQL
    19. 7-19. Exporting Data Securely to Excel from T-SQL
    20. 7-20. Exporting Data Securely to Access from T-SQL
    21. 7-21. Exporting Data to Excel Using SSIS
    22. 7-22. Exporting Data to Access Using SSIS
    23. 7-23. Pulling Data from Office Applications
    24. 7-24. Exporting Files Stored in SQL Server Using T-SQL
    25. 7-25. Exporting Files Stored in SQL Server on a Regular Basis
    26. 7-26. Exporting Data from SSAS Using T-SQL on an Occasional Basis
    27. 7-27. Exporting Data from SSAS Using T-SQL on a More Regular Basis
    28. 7-28. Exporting an SSAS Dimension Using SSIS
    29. 7-29. Exporting the Result of an MDX Query in SSIS
    30. 7-30. Exporting Data to Other Relational Databases Using T-SQL
    31. 7-31. Exporting Data to Other Relational Databases Using SSIS
    32. 7-32. Exporting Data from SQL Server Azure
    33. Summary
  16. Chapter 8: Metadata
    1. 8-1. Listing Available Tables from a Linked Server
    2. 8-2. Listing the Columns Available When Using a Linked Server
    3. 8-3. Discovering Flat File Metadata
    4. 8-4. Returning Simple SQL Server Metadata
    5. 8-5. Gathering Tailored SQL Server Metadata
    6. 8-6. Analyzing SQL Server Table Metadata
    7. 8-7. Analyzing SQL Server Column Metadata
    8. 8-8. Displaying Microsoft Access Metadata
    9. 8-9. Reading MySQL Metadata
    10. 8-10. Displaying Oracle Metadata from SSIS
    11. 8-11. Querying Oracle Metadata Using T-SQL
    12. 8-12. Understanding the Oracle Data Dictionary
    13. 8-13. Displaying DB2 Metadata
    14. 8-14. Obtaining SQL Server Metadata Using .NET
    15. Summary
  17. Chapter 9: Data Transformation
    1. 9-1. Converting Data Types
    2. 9-2. Removing Duplicates from Data
    3. 9-3. Deduplicating Large Recordsets
    4. 9-4. Deduplicating Data in an ETL Data Flow
    5. 9-5. Subsetting Column Data Using T-SQL
    6. 9-6. Generating Fixed-Length Column Data Subsets Using SSIS
    7. 9-7. Generating Variable-Length Column Data Subsets
    8. 9-8. Concatenating Source Data Using T-SQL
    9. 9-9. Concatenating Data Using SSIS
    10. 9-10. Duplicating Columns
    11. 9-11. Converting Strings to Uppercase or Lowercase
    12. 9-12. Converting Strings to Title Case
    13. 9-13. PIVOTing Data in T-SQL
    14. 9-14. Pivoting Data in SSIS with SQL Server 2012
    15. 9-15. Pivoting Data in SSIS with SQL Server 2005 and 2008
    16. 9-16. Consolidating Multiple Identical Data Sources in T-SQL
    17. 9-17. Consolidating Multiple Identical Data Sources in SSIS
    18. 9-18. Normalizing Data Inside a Single Table Using T-SQL
    19. 9-19. Normalizing Data Inside a Single Table Using SSIS
    20. 9-20. Normalizing Data into Multiple Relational Tables Using T-SQL
    21. 9-21. Normalizing Data into Multiple Relational Tables Using SSIS
    22. 9-22. Denormalizing Data by Referencing Lookup Tables in T-SQL
    23. 9-23. Denormalizing Data by Referencing Lookup Tables in SSIS
    24. 9-24. Processing Type 1 Slowly Changing Dimensions (SCDs) Using T-SQL
    25. 9-25. Handling Type 2 Slowly Changing Dimensions in T-SQL
    26. 9-26. Handling Type 2 Slowly Changing Dimensions with SSIS
    27. 9-27. Handling Type 3 Slowly Changing Dimensions Using T-SQL
    28. 9-28. Handling Type 4 Slowly Changing Dimensions Using T-SQL
    29. 9-29. Handling Type 4 Slowly Changing Dimensions with SSIS
    30. 9-30. Cleansing Data As Part of an ETL Process
    31. Summary
  18. Chapter 10: Data Profiling
    1. 10-1. Profiling Data Attributes
    2. 10-2. Profiling Domain and Value Distribution
    3. 10-3. Profiling External Data
    4. 10-4. Profiling External Data Faster
    5. 10-5. Running and Logging a Complete DataProfile
    6. 10-6. Profiling Data Quickly in SSIS
    7. 10-7. Creating Custom Data Profiles with SSIS
    8. 10-8. Using the SSIS Data Profiling Task on non-SQL Server Data Sources
    9. 10-9. Reading Profile Data
    10. 10.10. Storing SSIS Profile Data in a Database
    11. 10-11. Tailoring Specific Source Data Profiles in SSIS
    12. 10-12. Domain Analysis in SSIS
    13. 10-13. Performing Multiple Domain Analyses
    14. 10-14. Pattern Profiling in a Data Flow
    15. 10-15. Pattern Profiling Using T-SQL
    16. 10-16. Profiling Data Types
    17. 10-17. Controlling Data Flow via Profile Metadata
    18. Summary
  19. Chapter 11: Delta Data Management
    1. Preamble: Why Bother with Delta Data?
    2. Delta Data Approaches
    3. 11-1. Loading Delta Data as Part of a Structured ETL Process
    4. 11-2. Loading Data Changes Using a Linked Server
    5. 11-3. Loading Data Changes From a Small Source Table as Part of a Structured ETL Process
    6. 11-4. Detecting and Loading Delta Data Only
    7. 11-5. Performing Delta Data Upserts with Other SQL Databases
    8. 11-6. Handling Data Changes Without Writing to the Source Server
    9. 11-7. Detecting Data Changes with Limited Source Database Access
    10. 11-8. Detecting and Loading Delta Data Using T-SQL and a Linked Server When MERGE Is Not Practical
    11. 11-9. Detecting, Logging, and Loading Delta Data
    12. 11-10. Detecting Differences in Rowcounts, Metadata, and Column Data
    13. Summary
  20. Chapter 12: Change Tracking and Change Data Capture
    1. 12-1. Detecting Source Table Changes with Little Overhead and No Custom Framework
    2. 12-2. Pulling Changes into a Destination Table with Change Tracking
    3. 12-3. Using Change Tracking as Part of a Structured ETL Process
    4. 12-4. Detecting Changes to Source Data Using the SQL Server Transaction Log
    5. 12-5. Applying Change Data Capture with SSIS
    6. 12-6. Using Change Data Capture with Oracle Source Data
    7. Summary
  21. Chapter 13: Organising And Optimizing Data Loads
    1. 13-1. Loading Multiple Files
    2. 13-2. Selecting Multiple Text Files to Import
    3. 13-3. Loading Multiple Files Using Complex Selection Criteria
    4. 13-4. Ordering and Filtering File Loads
    5. 13-5. Loading Multiple Flat Files in Parallel
    6. 13-6. Loading Source Files with Load Balancing
    7. 13-7. Loading Data to Parallel Destinations
    8. 13-8. Using a Single Data File As a Multiple Data Source for Parallel Destination Loads
    9. 13-9. Reading and Writing Data from a Database Source in Parallel
    10. 13-10. Inserting Records in Parallel and in Bulk
    11. 13-11. Creating Self-Optimizing Parallel Bulk Inserts
    12. 13-12. Loading Files in Controlled Batches
    13. 13-13. Executing SQL Statements and Procedures in Parallel Using SSIS
    14. 13-14. Executing SQL Statements and Procedures in Parallel Without SSIS
    15. 13-15. Executing SQL Statements and Procedures in Parallel Using SQL Server Agent
    16. Summary
  22. Chapter 14: ETL Process Acceleration
    1. 14-1. Accelerating SSIS Lookups
    2. 14-2. Disabling and Rebuilding Nonclustered Indexes in a Destination Table
    3. 14-3. Persisting Destination Database Index Metadata
    4. 14-4. Scripting and Executing DROP Statements for Destination Database Indexes
    5. 14-5. Scripting and Executing CREATE Statements for Destination Database Indexes
    6. 14-6. Storing Metadata, and Then Scripting and Executing DROP and CREATE Statements for Destination Database XML Indexes
    7. 14-7. Finding Missing Indexes
    8. 14-8. Managing Check Constraints
    9. 14-9. Managing Foreign Key Constraints
    10. 14-10. Optimizing Bulk Loads
    11. Summary
  23. Chapter 15: Logging and Auditing
    1. 15-1. Logging Events from T-SQL
    2. 15-2. Logging Data from SSIS
    3. 15-3. Customizing SSIS Logging
    4. 15-4. Saving and Applying Complex SSIS Logging Details
    5. 15-5. Extending SSIS Logging to an SQL Server Destination
    6. 15-6. Logging Information from an SSIS Script Task
    7. 15-7. Logging from T-SQL to the SSIS Log Table
    8. 15-8. Handling Errors in T-SQL
    9. 15-9. Handling Errors in SSIS
    10. 15-10. Creating a Centralized Logging Framework
    11. 15-11. Logging to a Centralized Framework When Using SSIS Containers
    12. 15-12. Logging to a Centralized Framework When Using SSIS Script Tasks and Components
    13. 15-13. Logging to a Text or XML File from T-SQL
    14. 15-14. Logging Counters in T-SQL
    15. 15-15. Logging Counters from SSIS
    16. 15-16. Creating an SSIS Catalog
    17. 15-17. Reading Logged Events and Counters from the SSIS Catalog
    18. 15-18. Analyzing Events and Counters In-Depth via the SSIS Catalog
    19. 15-19. Creating a Process Control Framework
    20. 15-20. Linking the SSIS Catalog to T-SQL Logging
    21. 15-21. Baselining ETL Processes
    22. 15-22. Auditing an ETL Process
    23. 15-23. Logging Audit Data
    24. Summary
  24. Appendix A: Data Types
    1. SQL Server Data Types
    2. SSIS Data Types
    3. Default Data Mapping in the Import/Export Wizard
    4. MSSQL9 to MSSQL8
    5. MSSQL to DB2
    6. MSSQL to IBMDB2
    7. MSSQL to Jet4
    8. MSSQL to SSIS11
    9. OracleClient to MSSQL
    10. OracleClient to MSSQL11
    11. OracleClient to SSIS11
    12. Oracle to MSSQL
    13. Oracle to MSSQL11
    14. Oracle to SSIS11
    15. SQLClient9 to MSSQL8
    16. SQLClient to DB2
    17. SQLClient to IBMDB2
    18. SQLClient to MSSQL11
    19. SQLClient to Oracle
    20. SQLClient to SSIS
    21. SSIS11 to DB2
    22. SSIS11 to IBMDB2
    23. SSIS11 to MSSQL
    24. DB2 to MSSQL
    25. SSIS to Jet
    26. DB2 to MSSQL11
    27. SSIS to Oracle
    28. DB2 to SSIS11
    29. IBMDB2 to MSSQL
    30. IBMDB2 to MSSQL11
    31. IBMDB2 to SSIS11
    32. Jet to MSSQL8
    33. Jet to SSIS
    34. ACE to SSIS
    35. Excel to SQL Server and SSIS Data Mapping
    36. Access to SQL Server and SSIS Data Mapping
    37. Oracle to SQL Server and SSIS Data Mapping
    38. Oracle to SQL Server Replication Data Type Mapping
    39. MySQL Data Types
    40. Sybase to SQL Server Data Type Conversion
  25. Appendix B: Sample Databases and Scripts
    1. Sample Databases and Files
    2. Directory Structure for the Sample Files
  26. Index