Smarter Business: Dynamic Information with IBM InfoSphere Data Replication CDC

Book Description

To make better-informed business decisions, better serve clients, and increase operational efficiencies, you must be aware of changes to key data as they occur. In addition, you must enable the immediate delivery of this information to the people and processes that need to act upon it. This ability to sense and respond to data changes is fundamental to dynamic warehousing, master data management, and many other key initiatives. A major challenge in providing this type of environment is determining how to tie all the independent systems together and handle the immense data flow requirements. IBM® InfoSphere® Change Data Capture (InfoSphere CDC) can respond to that challenge by providing programming-free data integration and eliminating redundant data transfer, which minimizes the impact on production systems.

In this IBM Redbooks® publication, we show you examples of how InfoSphere CDC can be used to implement integrated systems, to keep those systems updated immediately as changes occur, and to use your existing infrastructure and scale up as your workload grows. InfoSphere CDC can also enhance your investment in other software, such as IBM DataStage® and IBM QualityStage®, IBM InfoSphere Warehouse, and IBM InfoSphere Master Data Management Server, enabling real-time and event-driven processes. Enable the integration of your critical data and make it immediately available as your business needs it.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. The team who wrote this book
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Introduction and overview
    1. 1.1 Optimized data integration
    2. 1.2 InfoSphere architecture
  5. Chapter 2. InfoSphere CDC: Empowering information management
    1. 2.1 The need for dynamic data
    2. 2.2 Data delivery methods
    3. 2.3 Providing dynamic data with InfoSphere CDC
      1. 2.3.1 InfoSphere CDC architectural overview
      2. 2.3.2 Reliability and integrity
  6. Chapter 3. Business use cases for InfoSphere CDC
    1. 3.1 InfoSphere CDC techniques for transporting changed data
      1. 3.1.1 Option 1: Database staging
      2. 3.1.2 Option 2: Message queue (MQ) based integration
      3. 3.1.3 Option 3: File-based integration
      4. 3.1.4 Option 4: InfoSphere DataStage Direct Connect
    2. 3.2 Data warehousing and business intelligence
      1. 3.2.1 Active data warehousing
      2. 3.2.2 Slowly changing dimensions
    3. 3.3 Consolidation
      1. 3.3.1 Consolidation: Sample implementation 1
      2. 3.3.2 Consolidation: Sample implementation 2
    4. 3.4 Distribution
      1. 3.4.1 Distribution: Sample implementation 1
      2. 3.4.2 Distribution: Sample implementation 2
    5. 3.5 Database migration
      1. 3.5.1 Database migration: Sample implementation
    6. 3.6 Application integration
      1. 3.6.1 Application integration: Sample implementation 1
      2. 3.6.2 Application integration: Sample implementation 2
      3. 3.6.3 Application integration: Sample implementation 3
    7. 3.7 Integration with master data management
      1. 3.7.1 Integration with master data management: Sample implementation
    8. 3.8 Integration with IBM Information Server
      1. 3.8.1 Integration with IBM Information Server: Sample implementation
    9. 3.9 Operational business intelligence
      1. 3.9.1 Operational business intelligence: Sample implementation
  7. Chapter 4. Solution topologies
    1. 4.1 Unidirectional replication
    2. 4.2 Cascading replication
    3. 4.3 Bidirectional replication
    4. 4.4 Consolidation replication
    5. 4.5 Data distribution replication
    6. 4.6 Hub-and-Spoke replication with propagation
    7. 4.7 Destination
      1. 4.7.1 JMS Message Queue
      2. 4.7.2 Flat files
      3. 4.7.3 DataStage
      4. 4.7.4 Web services
  8. Chapter 5. InfoSphere CDC features and functionality
    1. 5.1 Transformations
      1. 5.1.1 Column functions
      2. 5.1.2 Journal control fields
      3. 5.1.3 Joining
      4. 5.1.4 User exits for customizations
      5. 5.1.5 Considerations for using transformational functionality
    2. 5.2 Replication modes
      1. 5.2.1 Refresh
      2. 5.2.2 Continuous mirroring
      3. 5.2.3 Scheduled end (net change)
    3. 5.3 Filtering
      1. 5.3.1 Row level
      2. 5.3.2 Column level
    4. 5.4 Apply methods
      1. 5.4.1 Standard
      2. 5.4.2 LiveAudit
      3. 5.4.3 Adaptive Apply
      4. 5.4.4 Summarization
      5. 5.4.5 Row consolidation
      6. 5.4.6 Soft deletes
      7. 5.4.7 Custom apply methods (user exits)
      8. 5.4.8 Flat files
      9. 5.4.9 DataStage direct connect
      10. 5.4.10 JMS message queues
    5. 5.5 Conflict detection and resolution
  9. Chapter 6. Understanding the architecture
    1. 6.1 Component overview
      1. 6.1.1 InfoSphere CDC instances
      2. 6.1.2 Interoperability between the InfoSphere CDC components
    2. 6.2 Management Console fundamentals
      1. 6.2.1 Access Manager Interface
      2. 6.2.2 Configuration Interface
      3. 6.2.3 Monitoring Interface
      4. 6.2.4 InfoSphere CDC API
      5. 6.2.5 Access Server fundamentals
    3. 6.3 The InfoSphere CDC engine
      1. 6.3.1 Bookmarks
      2. 6.3.2 The InfoSphere CDC Linux, UNIX, and Windows engine
      3. 6.3.3 The InfoSphere CDC for System i engine
      4. 6.3.4 The InfoSphere CDC for z/OS engine
    4. 6.4 Communications between source and target
    5. 6.5 Summary
  10. Chapter 7. Environmental considerations
    1. 7.1 Globalization with InfoSphere CDC
      1. 7.1.1 Time zone considerations
      2. 7.1.2 Encoding conversions
    2. 7.2 Firewall configurations
      1. 7.2.1 How InfoSphere CDC uses TCP/IP
      2. 7.2.2 Firewalls
      3. 7.2.3 InfoSphere CDC in a firewalled network environment
      4. 7.2.4 Configuring source port restrictions
      5. 7.2.5 Troubleshooting CDC connection issues
    3. 7.3 Log retention
      1. 7.3.1 Log retention general guidelines
      2. 7.3.2 Log retention platform-specific guidelines
    4. 7.4 Remote processing capabilities
      1. 7.4.1 Remote source
      2. 7.4.2 Remote target
      3. 7.4.3 Remote source and target
      4. 7.4.4 Log shipping
    5. 7.5 Using InfoSphere CDC in resilient environments
      1. 7.5.1 InfoSphere CDC reachability: Virtual IP
      2. 7.5.2 InfoSphere CDC binary files and metadata for the Linux, UNIX, and Windows engine
      3. 7.5.3 InfoSphere CDC on a shared volume
      4. 7.5.4 InfoSphere CDC on separate nodes with a shared database
      5. 7.5.5 InfoSphere CDC on separate servers with separate databases
      6. 7.5.6 System i resilient environments
      7. 7.5.7 z/OS / Sysplex and InfoSphere CDC in resilient environments
    6. 7.6 Change management
      1. 7.6.1 Understanding InfoSphere CDC bookmarks
      2. 7.6.2 Change Management sample environment
      3. 7.6.3 DDL changes in a service window
      4. 7.6.4 DDL changes without a service window
  11. Chapter 8. Performance analysis and design considerations
    1. 8.1 High volume between two systems
      1. 8.1.1 Latency and throughput
      2. 8.1.2 InfoSphere CDC architecture
    2. 8.2 Identification of potential bottlenecks
    3. 8.3 Performance monitoring in InfoSphere CDC environments
      1. 8.3.1 Performance monitoring using the Management Console
      2. 8.3.2 System monitoring tools
    4. 8.4 Using workflow for performance issues
    5. 8.5 Installation considerations
      1. 8.5.1 Silent installations and instance creation
    6. 8.6 Design considerations
      1. 8.6.1 Using multiple parallel subscriptions
      2. 8.6.2 Using multiple InfoSphere CDC instances
      3. 8.6.3 Using an n-tiered architecture
      4. 8.6.4 Using cascading replication to spread the workload
      5. 8.6.5 Continuous scraping
  12. Chapter 9. Customization and automation
    1. 9.1 Options for managing InfoSphere CDC
    2. 9.2 Management Console GUI
    3. 9.3 Management Console commands
      1. 9.3.1 Common uses for the Management Console commands
      2. 9.3.2 Compiling Management Console command scripts
    4. 9.4 InfoSphere CDC engine commands (CLI)
      1. 9.4.1 Running commands for the Linux, UNIX, and Windows engine
      2. 9.4.2 Running CL commands for System i
      3. 9.4.3 Running console commands for IBM System z
      4. 9.4.4 Sample scripts
      5. 9.4.5 Checking an InfoSphere CDC engine and subscriptions activity
      6. 9.4.6 Removing obsolete database logs
    5. 9.5 InfoSphere CDC API
      1. 9.5.1 Development environment setup
      2. 9.5.2 Contents of the api.jar file
      3. 9.5.3 Connecting to and managing the Access Server
      4. 9.5.4 Connecting to the data stores
      5. 9.5.5 Configuring InfoSphere CDC replication
      6. 9.5.6 Creating a subscription
      7. 9.5.7 Procedure for mapping tables
      8. 9.5.8 Table mapping example
      9. 9.5.9 Procedure for removing mapped tables
      10. 9.5.10 Table mapping removal example
      11. 9.5.11 Row and column filtering
      12. 9.5.12 Derived columns
      13. 9.5.13 Encoding conversions (before and after Version 6.5)
      14. 9.5.14 Operations and user exits
      15. 9.5.15 Common procedures (updating table definitions)
      16. 9.5.16 Deploying subscription changes and considerations
      17. 9.5.17 Starting, stopping, and monitoring subscriptions
      18. 9.5.18 Monitoring latency
      19. 9.5.19 Monitoring event logs using the API
    6. 9.6 Monitoring and integration with external monitoring solutions
      1. 9.6.1 Components to monitor
      2. 9.6.2 InfoSphere CDC instance activity
      3. 9.6.3 Subscription activity
      4. 9.6.4 Events
      5. 9.6.5 Latency
    7. 9.7 User exits
      1. 9.7.1 Common uses for user exits
      2. 9.7.2 User exit programs
      3. 9.7.3 Derived expression user exits
      4. 9.7.4 Table and row-level user exits
      5. 9.7.5 Subscription-level (unit of work)
      6. 9.7.6 Java user exit for flat file custom formatter
      7. 9.7.7 Notifications
  13. Appendix A. Single scrape events and errors
    1. Single scrape error events
  14. Appendix B. Additional material
    1. Locating the web material
    2. Using the web material
  15. Glossary
  16. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  17. Back cover