Managing Data in Motion

Book Description

Managing Data in Motion describes techniques that have been developed for significantly reducing the complexity of managing system interfaces and enabling scalable architectures. Author April Reeve brings over two decades of experience to present a vendor-neutral approach to moving data between computing environments and systems. Readers will learn the techniques, technologies, and best practices for managing the passage of data between computer systems and integrating disparate data in an enterprise environment.

The average enterprise's computing environment comprises hundreds to thousands of computer systems that have been built, purchased, and acquired over time. The data from these various systems needs to be integrated for reporting and analysis, shared for business transaction processing, and converted from one format to another when old systems are replaced and new systems are acquired.

The management of the "data in motion" in organizations is rapidly becoming one of the biggest concerns for business and IT management. Data warehousing and conversion, real-time data integration, and cloud and "big data" applications are just a few of the challenges facing organizations and businesses today. Managing Data in Motion tackles these and other topics in a style easily understood by business and IT managers as well as programmers and architects.

  • Presents a vendor-neutral overview of the different technologies and techniques for moving data between computer systems including the emerging solutions for unstructured as well as structured data types
  • Explains, in non-technical terms, the architecture and components required to perform data integration
  • Describes how to reduce the complexity of managing system interfaces and enable a scalable data architecture that can handle the dimensions of "Big Data"

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Foreword
  7. Acknowledgements
  8. Biography
  9. Introduction
    1. What this book is about and why it’s necessary
    2. What the reader will learn
    3. Who should read this book
    4. How this book is organized
    5. Part 1: Introduction to data integration
    6. Part 2: Batch data integration
    7. Part 3: Real-time data integration
    8. Part 4: Big data integration
  10. Part 1: Introduction to Data Integration
    1. Chapter 1. The Importance of Data Integration
      1. The natural complexity of data interfaces
      2. The rise of purchased vendor packages
      3. Key enablement of big data and virtualization
    2. Chapter 2. What Is Data Integration?
      1. Data in motion
      2. Integrating into a common format—transforming data
      3. Migrating data from one system to another
      4. Moving data around the organization
      5. Pulling information from unstructured data
      6. Moving process to data
    3. Chapter 3. Types and Complexity of Data Integration
      1. The differences and similarities in managing data in motion and persistent data
      2. Batch data integration
      3. Real-time data integration
      4. Big data integration
      5. Data virtualization
    4. Chapter 4. The Process of Data Integration Development
      1. The data integration development life cycle
      2. Inclusion of business knowledge and expertise
  11. Part 2: Batch Data Integration
    1. Chapter 5. Introduction to Batch Data Integration
      1. What is batch data integration?
      2. Batch data integration life cycle
    2. Chapter 6. Extract, Transform, and Load
      1. What is ETL?
      2. Profiling
      3. Extract
      4. Staging
      5. Access layers
      6. Transform
      7. Load
    3. Chapter 7. Data Warehousing
      1. What is data warehousing?
      2. Layers in an enterprise data warehouse architecture
      3. Types of data to load in a data warehouse
    4. Chapter 8. Data Conversion
      1. What is data conversion?
      2. Data conversion life cycle
      3. Data conversion analysis
      4. Best practice data loading
      5. Improving source data quality
      6. Mapping to target
      7. Configuration data
      8. Testing and dependencies
      9. Private data
      10. Proving
      11. Environments
    5. Chapter 9. Data Archiving
      1. What is data archiving?
      2. Selecting data to archive
      3. Can the archived data be retrieved?
      4. Conforming data structures in the archiving environment
      5. Flexible data structures
    6. Chapter 10. Batch Data Integration Architecture and Metadata
      1. What is batch data integration architecture?
      2. Profiling tool
      3. Modeling tool
      4. Metadata repository
      5. Data movement
      6. Transformation
      7. Scheduling
  12. Part 3: Real-Time Data Integration
    1. Chapter 11. Introduction to Real-Time Data Integration
      1. Why real-time data integration?
      2. Why two sets of technologies?
    2. Chapter 12. Data Integration Patterns
      1. Interaction patterns
      2. Loose coupling
      3. Hub and spoke
      4. Synchronous and asynchronous interaction
      5. Request and reply
      6. Publish and subscribe
      7. Two-phase commit
      8. Integrating interaction types
    3. Chapter 13. Core Real-Time Data Integration Technologies
      1. Confusing terminology
      2. Enterprise service bus (ESB)
      3. Service-oriented architecture (SOA)
      4. Extensible markup language (XML)
      5. Data replication and change data capture
      6. Enterprise application integration (EAI)
      7. Enterprise information integration (EII)
    4. Chapter 14. Data Integration Modeling
      1. Canonical modeling
      2. Message modeling
    5. Chapter 15. Master Data Management
      1. Introduction to master data management
      2. Reasons for a master data management solution
      3. Purchased packages and master data
      4. Reference data
      5. Masters and slaves
      6. External data
      7. Master data management functionality
      8. Types of master data management solutions—registry and data hub
    6. Chapter 16. Data Warehousing with Real-Time Updates
      1. Corporate information factory
      2. Operational data store
      3. Master data moving to the data warehouse
    7. Chapter 17. Real-Time Data Integration Architecture and Metadata
      1. What is real-time data integration metadata?
      2. Modeling
      3. Profiling
      4. Metadata repository
      5. Enterprise service bus—data transformation and orchestration
      6. Data movement and middleware
      7. External interaction
  13. Part 4: Big, Cloud, Virtual Data
    1. Chapter 18. Introduction to Big Data Integration
      1. Data integration and unstructured data
      2. Big data, cloud data, and data virtualization
    2. Chapter 19. Cloud Architecture and Data Integration
      1. Why is data integration important in the cloud?
      2. Public cloud
      3. Cloud security
      4. Cloud latency
      5. Cloud redundancy
    3. Chapter 20. Data Virtualization
      1. A technology whose time has come
      2. Business uses of data virtualization
      3. Data virtualization architecture
    4. Chapter 21. Big Data Integration
      1. What is big data?
      2. Big data dimension—volume
      3. Big data dimension—variety
      4. Big data dimension—velocity
      5. Traditional big data use cases
      6. More big data use cases
      7. Leveraging the power of big data—real-time decision support
      8. Big data architecture
    5. Chapter 22. Conclusion to Managing Data in Motion
      1. Data integration architecture
      2. Data integration engines
      3. Data integration hubs
      4. Metadata management
      5. The end
  14. References
  15. Index