Enabling Real-time Analytics on IBM z Systems Platform

Book Description

For online transaction processing (OLTP) workloads, the IBM® z Systems™ platform, with IBM DB2®, data sharing, Workload Manager (WLM), geoplex, and other high-end features, is the widely acknowledged leader. Most customers now integrate business analytics with OLTP by running, for example, scoring functions from a transactional context for real-time analytics, or by applying machine-learning algorithms to enterprise data that is kept on the mainframe. As a result, IBM is investing so that clients can keep the complete lifecycle for data analysis, modeling, and scoring under z Systems control in a cost-efficient way, while keeping the qualities of service in availability, security, and reliability that z Systems solutions offer. Because of the changed architecture and tighter integration, IBM has shown, in a customer proof-of-concept, that a particular client was able to achieve an orders-of-magnitude improvement in performance, allowing that client’s data scientist to investigate the data in a more interactive process.

Open technologies, such as Predictive Model Markup Language (PMML), can help customers update single components instead of being forced to replace everything at once. As a result, you can combine your preferred tool for model generation (such as SAS Enterprise Miner or IBM SPSS® Modeler) with a different technology for model scoring (such as Zementis, a company focused on PMML scoring). IBM SPSS Modeler is a leading data mining workbench that can apply various algorithms in data preparation, cleansing, statistics, visualization, machine learning, and predictive analytics. It has been under continued development for over 20 years and is integrated with z Systems. With IBM DB2 Analytics Accelerator 5.1 and SPSS Modeler 17.1, the complete predictive model creation, including data transformation, can be done within DB2 Analytics Accelerator. So, instead of moving the data to a distributed environment, algorithms can be pushed to the data, using the cost-efficient DB2 Accelerator for the required resource-intensive operations.

This IBM Redbooks® publication explains the overall z Systems architecture, how the components can be installed and customized, how the new IBM DB2 Analytics Accelerator loader can help with efficient loading of z Systems data and external data, how in-database transformation, in-database modeling, and in-transactional real-time scoring can be used, and what other related technologies are available.

This book is intended for technical specialists, architects, and data scientists who want to use the technology on the z Systems platform. Most of the technologies described in this book require IBM DB2 for z/OS®. For acceleration of the data investigation, data transformation, and data modeling process, DB2 Analytics Accelerator is required. Most value can be achieved if most of the data already resides on z Systems platforms, although adding external data (such as data from social sources) poses no problem at all.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. IBM Redbooks promotions
  4. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  5. Chapter 1. Executive overview
    1. 1.1 Introduction
    2. 1.2 Real-time analytics
      1. 1.2.1 Business advantages
      2. 1.2.2 IT advantages
    3. 1.3 In-database analytics
      1. 1.3.1 Accelerated in-database transformation
      2. 1.3.2 Accelerated in-database predictive modeling
    4. 1.4 Enabling applications with machine learning capability
    5. 1.5 Value propositions
    6. 1.6 Related products
    7. 1.7 Use cases
      1. 1.7.1 Countering payment fraud and financial crimes
      2. 1.7.2 Insurance claims in-process payment analytics
      3. 1.7.3 Predictive customer intelligence
  6. Chapter 2. Analytics implementation on z Systems platform
    1. 2.1 Adding analytics to a mainframe data sharing environment
      1. 2.1.1 SPSS Modeler
      2. 2.1.2 DB2 Analytics Accelerator wrapper stored procedure
      3. 2.1.3 Using accelerator-only tables
    2. 2.2 Installation and customization
      1. 2.2.1 DB2 and DB2 Analytics Accelerator setup and installation
      2. 2.2.2 Required DB2 privileges for SPSS users
      3. 2.2.3 SPSS Modeler client
      4. 2.2.4 SPSS Modeler server
      5. 2.2.5 Data sources in SPSS
      6. 2.2.6 AQT_ANALYTICS_DATABASE variable
      7. 2.2.7 User management in the SPSS Modeler server
      8. 2.2.8 Installing SPSS Modeler scoring adapter for DB2 z/OS
    3. 2.3 Real-time analytics lifecycle
      1. 2.3.1 Swim lane diagram of in-database analytics lifecycle
      2. 2.3.2 Interaction between a DB2 DBA and a data scientist
      3. 2.3.3 Key strengths of various components
  7. Chapter 3. Data integration using IBM DB2 Analytics Accelerator Loader for z/OS
    1. 3.1 Functional overview
      1. 3.1.1 Loader v2.1 enhancements
      2. 3.1.2 Loader methods to move data
      3. 3.1.3 Components and interfaces
    2. 3.2 Getting started
      1. 3.2.1 Installation
      2. 3.2.2 Customization
      3. 3.2.3 Workload Management (WLM) performance goals
      4. 3.2.4 IBM z Systems Integrated Information Processor (zIIP)
      5. 3.2.5 z Systems advantage
      6. 3.2.6 Parallelism
    3. 3.3 Scenarios
      1. 3.3.1 ACCEL_LOAD_TASKS
      2. 3.3.2 Sequential Input IDAA_ONLY and IDAA_DUAL
      3. 3.3.3 Load RESUME
      4. 3.3.4 IBM DB2 Analytics Accelerator Loader image copy input
      5. 3.3.5 VSAM
    4. 3.4 System Management Facility (SMF)
  8. Chapter 4. Data transformation
    1. 4.1 Introduction
      1. 4.1.1 Accelerator-only table (AOT)
      2. 4.1.2 Enabling in-database processing on SPSS Modeler client
    2. 4.2 SQL pushback in SPSS Modeler
      1. 4.2.1 How SQL generation works
      2. 4.2.2 Where improvements can occur with IDT using Accelerator
    3. 4.3 Nodes supporting SQL generation for DB2 Accelerator
      1. 4.3.1 Source palette tab
      2. 4.3.2 Record Ops palette tab
      3. 4.3.3 Field Ops palette tab
      4. 4.3.4 Graphs palette tab
      5. 4.3.5 Database Modeling (Nuggets) palette tab
      6. 4.3.6 Output palette tab
      7. 4.3.7 Export palette tab
    4. 4.4 In-database analytics Processing effort by components
      1. 4.4.1 SPSS Modeler client (running on Windows)
      2. 4.4.2 SPSS Modeler server
      3. 4.4.3 DB2 for z/OS with Analytics Accelerator
      4. 4.4.4 Netezza based DB2 Analytics Accelerator for z/OS
    5. 4.5 SPSS data transformation stream
      1. 4.5.1 SQL Preview option to verify in-database processing
      2. 4.5.2 Why are my nodes purple
      3. 4.5.3 Enable cache at node level
    6. 4.6 Stream messages
    7. 4.7 Data transformation using DataStage
  9. Chapter 5. Data modeling using SPSS and DB2 Analytics Accelerator
    1. 5.1 Introduction
    2. 5.2 SPSS data modeling stream design
      1. 5.2.1 SPSS data modeling stream
      2. 5.2.2 Example of how to optimize pruning a decision tree
    3. 5.3 Adding record IDs to input tables
    4. 5.4 Combining in-database transformation on Accelerator with traditional model creation
  10. Chapter 6. Model scoring
    1. 6.1 In-transactional scoring (single-record scoring)
    2. 6.2 In-database scoring and scoring adapter
      1. 6.2.1 Real-time scoring and near real-time scoring
      2. 6.2.2 Export PMML from SPSS model apply node
    3. 6.3 Batch scoring
      1. 6.3.1 Complete batch scoring example with accelerated in-database predictive modeling
      2. 6.3.2 Batch scoring using Scoring Adapter into accelerator-only tables (less efficient method)
      3. 6.3.3 Batch scoring using Scoring Adapter into z/OS DB2 base tables (more efficient method)
    4. 6.4 DMG and PMML
      1. 6.4.1 Zementis and in-application scoring
    5. 6.5 Scoring and smart decisions
      1. 6.5.1 Solution architecture
      2. 6.5.2 Operational Decision Manager overview
      3. 6.5.3 SPSS Collaboration and Deployment Services
      4. 6.5.4 SPSS Scoring Adaptor for DB2 z/OS
    6. 6.6 Examples of integration for scoring solutions using ODM on z/OS
      1. 6.6.1 Example: Calling the SPSS Scoring Adapter for IBM DB2 for z/OS
      2. 6.6.2 Example: Calling SPSS Collaboration and Deployment Services
  11. Chapter 7. Acceleration for descriptive analytics: IBM DB2 Analytics Accelerator value with IBM Cognos BI
    1. 7.1 Descriptive analytics
    2. 7.2 IBM Cognos Business Intelligence
    3. 7.3 IBM Cognos BI 10 and DB2 Analytics Accelerator and z13
  12. Chapter 8. Using R with the IBM DB2 Analytics Accelerator
    1. 8.1 Introduction to the R programming language
    2. 8.2 Prerequisites
    3. 8.3 Using RJDBC with the IBM DB2 Analytics Accelerator
      1. 8.3.1 RJDBC functions
    4. 8.4 Using RODBC with the IBM DB2 Analytics Accelerator
      1. 8.4.1 RODBC functions
    5. 8.5 Using accelerator-only tables (AOTs) with R
    6. 8.6 Using Netezza Analytics stored procedures
      1. 8.6.1 Basic concept
      2. 8.6.2 Helper stored procedures
      3. 8.6.3 Decision trees
      4. 8.6.4 Regression trees
      5. 8.6.5 K-means clustering
      6. 8.6.6 TwoStep clustering
      7. 8.6.7 Naive Bayes
    7. 8.7 Performance considerations
  13. Appendix A. ODBC connectivity between SPSS Modeler and DB2 for z/OS
    1. ODBC related terms and concepts
    2. ODBC Driver Manager
    3. ODBC gateway mode
    4. ODBC driver used by SPSS Modeler server
    5. Installation and configuration of IBM DB2 ODBC driver
    6. SPSS Modeler configuration for use of DB2 ODBC driver
    7. Installation and configuration of SPSS SDAP driver
  14. Appendix B. Job to install wrapper stored procedures
    1. JCL job sample
    2. IBM Netezza Analytics stored procedures
  15. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  16. Back cover