Principles of Data Integration

Book Description

How do you approach answering queries when your data is stored in multiple databases that were designed independently by different people? This is the first comprehensive book on data integration, written by three of the most respected experts in the field.

This book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts. Data integration is the problem of answering queries that span multiple data sources (e.g., databases, web pages). Data integration problems surface in multiple contexts, including enterprise information integration, query processing on the Web, coordination between government agencies, and collaboration between scientists. In some cases, data integration is the key bottleneck to making progress in a field.

The authors provide a working knowledge of data integration concepts and techniques, giving you the tools you need to develop a complete and concise package of algorithms and applications.



* Offers a range of data integration solutions, enabling you to focus on what is most relevant to the problem at hand.

* Enables you to build your own algorithms and implement your own data integration applications.

* Companion website with numerous project-based exercises, solutions, and slides. Links to commercially available software that allows readers to build their own algorithms and implement their own data integration applications. Facebook page for reader input during and after publication.

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Preface
  7. 1. Introduction
    1. 1.1 What Is Data Integration?
    2. 1.2 Why Is It Hard?
    3. 1.3 Data Integration Architectures
    4. 1.4 Outline of the Book
    5. Bibliographic Notes
  8. Part I: Foundational Data Integration Techniques
    1. 2. Manipulating Query Expressions
      1. 2.1 Review of Database Concepts
      2. 2.2 Query Unfolding
      3. 2.3 Query Containment and Equivalence
      4. 2.4 Answering Queries Using Views
      5. Bibliographic Notes
    2. 3. Describing Data Sources
      1. 3.1 Overview and Desiderata
      2. 3.2 Schema Mapping Languages
      3. 3.3 Access-Pattern Limitations
      4. 3.4 Integrity Constraints on the Mediated Schema
      5. 3.5 Answer Completeness
      6. 3.6 Data-Level Heterogeneity
      7. Bibliographic Notes
    3. 4. String Matching
      1. 4.1 Problem Description
      2. 4.2 Similarity Measures
      3. 4.3 Scaling Up String Matching
      4. Bibliographic Notes
    4. 5. Schema Matching and Mapping
      1. 5.1 Problem Definition
      2. 5.2 Challenges of Schema Matching and Mapping
      3. 5.3 Overview of Matching and Mapping Systems
      4. 5.4 Matchers
      5. 5.5 Combining Match Predictions
      6. 5.6 Enforcing Domain Integrity Constraints
      7. 5.7 Match Selector
      8. 5.8 Reusing Previous Matches
      9. 5.9 Many-to-Many Matches
      10. 5.10 From Matches to Mappings
      11. Bibliographic Notes
    5. 6. General Schema Manipulation Operators
      1. 6.1 Model Management Operators
      2. 6.2 Merge
      3. 6.3 ModelGen
      4. 6.4 Invert
      5. 6.5 Toward Model Management Systems
      6. Bibliographic Notes
    6. 7. Data Matching
      1. 7.1 Problem Definition
      2. 7.2 Rule-Based Matching
      3. 7.3 Learning-Based Matching
      4. 7.4 Matching by Clustering
      5. 7.5 Probabilistic Approaches to Data Matching
      6. 7.6 Collective Matching
      7. 7.7 Scaling Up Data Matching
      8. Bibliographic Notes
    7. 8. Query Processing
      1. 8.1 Background: DBMS Query Processing
      2. 8.2 Background: Distributed Query Processing
      3. 8.3 Query Processing for Data Integration
      4. 8.4 Generating Initial Query Plans
      5. 8.5 Query Execution for Internet Data
      6. 8.6 Overview of Adaptive Query Processing
      7. 8.7 Event-Driven Adaptivity
      8. 8.8 Performance-Driven Adaptivity
      9. Bibliographic Notes
    8. 9. Wrappers
      1. 9.1 Introduction
      2. 9.2 Manual Wrapper Construction
      3. 9.3 Learning-Based Wrapper Construction
      4. 9.4 Wrapper Learning without Schema
      5. 9.5 Interactive Wrapper Construction
      6. Bibliographic Notes
    9. 10. Data Warehousing and Caching
      1. 10.1 Data Warehousing
      2. 10.2 Data Exchange: Declarative Warehousing
      3. 10.3 Caching and Partial Materialization
      4. 10.4 Direct Analysis of Local, External Data
      5. Bibliographic Notes
  9. Part II: Integration with Extended Data Representations
    1. 11. XML
      1. 11.1 Data Model
      2. 11.2 XML Structural and Schema Definitions
      3. 11.3 Query Language
      4. 11.4 Query Processing for XML
      5. 11.5 Schema Mapping for XML
      6. Bibliographic Notes
    2. 12. Ontologies and Knowledge Representation
      1. 12.1 Example: Using KR in Data Integration
      2. 12.2 Description Logics
      3. 12.3 The Semantic Web
      4. Bibliographic Notes
    3. 13. Incorporating Uncertainty into Data Integration
      1. 13.1 Representing Uncertainty
      2. 13.2 Modeling Uncertain Schema Mappings
      3. 13.3 Uncertainty and Data Provenance
      4. Bibliographic Notes
    4. 14. Data Provenance
      1. 14.1 The Two Views of Provenance
      2. 14.2 Applications of Data Provenance
      3. 14.3 Provenance Semirings
      4. 14.4 Storing Provenance
      5. Bibliographic Notes
  10. Part III: Novel Integration Architectures
    1. 15. Data Integration on the Web
      1. 15.1 What Can We Do with Web Data?
      2. 15.2 The Deep Web
      3. 15.3 Topical Portals
      4. 15.4 Lightweight Combination of Web Data
      5. 15.5 Pay-as-You-Go Data Management
      6. Bibliographic Notes
    2. 16. Keyword Search
      1. 16.1 Keyword Search over Structured Data
      2. 16.2 Computing Ranked Results
      3. 16.3 Keyword Search for Data Integration
      4. Bibliographic Notes
    3. 17. Peer-to-Peer Integration
      1. 17.1 Peers and Mappings
      2. 17.2 Semantics of Mappings
      3. 17.3 Complexity of Query Answering in PDMS
      4. 17.4 Query Reformulation Algorithm
      5. 17.5 Composing Mappings
      6. 17.6 Peer Data Management with Looser Mappings
      7. Bibliographic Notes
    4. 18. Integration in Support of Collaboration
      1. 18.1 What Makes Collaboration Different
      2. 18.2 Processing Corrections and Feedback
      3. 18.3 Collaborative Annotation and Presentation
      4. 18.4 Dynamic Data: Collaborative Data Sharing
      5. Bibliographic Notes
    5. 19. The Future of Data Integration
      1. 19.1 Uncertainty, Provenance, and Cleaning
      2. 19.2 Crowdsourcing and “Human Computing”
      3. 19.3 Building Large-Scale Structured Web Databases
      4. 19.4 Lightweight Integration
      5. 19.5 Visualizing Integrated Data
      6. 19.6 Integrating Social Media
      7. 19.7 Cluster- and Cloud-Based Parallel Processing and Caching
  11. Bibliography
  12. Index