Connecting the Data

Book Description

Business data integration is a complex problem that must be solved when organizations change or enhance their internal structures. The goal of this book is to present a simple yet thorough resource that describes the challenges of business data integration and the solutions to these challenges, such as schema integration, illustrated through an Operational Data Store (ODS) case study.

This book contains three sections spanning ten chapters. Section I, Foundational Concepts, provides the necessary basic concepts and discusses schema integration. Section II, Preparation and Design, introduces the case study and reverse engineers each of the data sources to create a set of data dictionary reports, which provide the metadata needed to apply the schema integration process. Section III, Physical Implementation, presents scripts to populate each of the source databases and spreadsheets and uses reports to create Extract, Transform, and Load (ETL) specifications. The ten chapters within these three sections are:

  • Chapter 1 - Introduction and Roadmap

  • Chapter 2 - What is an Operational Data Store (ODS)?

  • Chapter 3 - What is Schema Integration?

  • Chapter 4 - The Role of the ODS within DW Architectures

  • Chapter 5 - Reverse Engineering the Four Source Schema

  • Chapter 6 - Designing the Interim Schema

  • Chapter 7 - Preparing the ETL Specifications

  • Chapter 8 - Designing the Physical ODS Database Model

  • Chapter 9 - Designing Our ETL Processes with SSIS

  • Chapter 10 - Data Quality Profiling

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Contents
  5. Chapter 1: Introduction and Roadmap
    1. What exactly is an ODS?
    2. A Guided Tour of the Book
    3. Introducing our Case Study Architecture
    4. Summary
  6. Chapter 2: What is an Operational Data Store?
    1. The High Level Operational Data Store Architecture
    2. The ODS Data Staging Layer
    3. ODS Data Profiling
    4. ODS Data Cleansing
    5. ODS Data Integration
    6. Making ODS Data Available to Other Systems
    7. Summary
  7. Chapter 3: What is Schema Integration?
    1. Binary Schema Integration – Technique 1
    2. Binary Schema Integration – Technique 2
    3. Binary Schema Integration – Technique 3
    4. Summary
  8. Chapter 4: Roles of the ODS within Data Warehouse Architectures
    1. As a Supplier for a Data Warehouse
    2. As a Supplier for Multiple Data Marts
    3. As a Member in a Distributed Architecture
    4. As a Source for Operational Reporting
    5. As a Source for Data Mining Applications
    6. As an Integrator of CMDB Data
    7. As a Data Quality Platform
    8. Summary
  9. Chapter 5: Reverse Engineering the Four Schema
    1. Reverse Engineering the Munich Database
    2. Reverse Engineering the London Database
    3. Reverse Engineering the Torino Database
    4. Reverse Engineering the Paris Database
    5. Summary
  10. Chapter 6: Designing the Interim Schema
    1. Reviewing the Data Models
    2. Defining the Integration Sequence
    3. Integration Sequence 1 – Analysis & Conflict Resolution for Customer
    4. Integration Sequence 2 – Analysis & Conflict Resolution for Product
    5. Integration Sequence 3 – Analysis & Conflict Resolution
    6. Integration Sequence 4 – Analysis & Conflict Resolution
    7. Integration Sequence 5 – Analysis & Conflict Resolution for Inventory Address
    8. Integration Sequence 6 – Analysis & Conflict Resolution for Order Line
    9. The Final Conceptual Model
    10. Summary
  11. Chapter 7: Preparing the ETL Specifications
    1. The Customer ETL Specifications
    2. The Product ETL Specifications
    3. The Inventory ETL Specifications
    4. The Order Header ETL Specifications
    5. The Order Line ETL Specifications
    6. Process Hierarchy Diagram
    7. Process Dependency Diagram
    8. Summary
  12. Chapter 8: Designing the Physical ODS Database Model
    1. Customer Physical Model and DDL
    2. Product Physical Model and DDL
    3. Inventory Physical Model and DDL
    4. Order Header Physical Model and DDL
    5. Order Line Physical Model and DDL
    6. Summary
  13. Chapter 9: Creating our ETL Processes with SSIS
    1. Implementation Strategy
    2. Loading the Staging Tables
    3. Loading the Reference Tables
    4. Loading Integrated Schema 1 (IS1)
    5. Loading Integrated Schema 2 (IS2)
    6. Loading Integrated Schema 3 (IS3)
    7. Deploying the Completed Packages
    8. Summary
  14. Chapter 10: Data Quality Profiling
    1. Data Profiling
    2. Fuzzy Matching
    3. Creating a Data Quality Report Model
    4. Creating a Data Quality Report
    5. Summary
  15. Index