IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators

Book Description

This IBM® Redbooks® publication describes visual development, visualization, adapters, analytics, and accelerators for IBM InfoSphere® Streams (V3), a key component of the IBM Big Data platform. Streams was designed to analyze data in motion, and it can analyze very high volumes of data arriving at high velocity, using a wide variety of analytic functions and data types.

The Visual Development environment extends Streams Studio with drag-and-drop development, supports round-tripping with the existing text editors, and is ideal for rapid prototyping. Adapters facilitate getting data in and out of Streams, and V3 supports WebSphere MQ, the Apache Hadoop Distributed File System (HDFS), and IBM InfoSphere DataStage. Significant analytics include the native Streams Processing Language (SPL), SPSS Modeler analytics, Complex Event Processing, the TimeSeries Toolkit for machine learning and predictive analytics, the Geospatial Toolkit for location-based applications, and Annotation Query Language (AQL) for natural language processing applications. The sample programs in the Social Media Analysis and Telecommunications Event Data Analysis accelerators can be modified to build production-level applications.

Want to learn how to analyze high volumes of streaming data or implement systems requiring high performance across nodes in a cluster? Then this book is for you.

Please note that the additional material referenced in the text is not available from IBM.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Introduction
    1. 1.1 The challenge of getting started
      1. 1.1.1 Accelerators and toolkits
    2. 1.2 Using this book
  5. Chapter 2. Application programming using Streams Studio
    1. 2.1 SPL Graphical Editor
      1. 2.1.1 Adding operators to the graph
      2. 2.1.2 Connecting operators
      3. 2.1.3 Layout options and Outline view
      4. 2.1.4 Defining global types and schemas
      5. 2.1.5 Editing operator details
      6. 2.1.6 Generating SPL code
    2. 2.2 Application development use cases
      1. 2.2.1 Design: Sketching an application
      2. 2.2.2 Implementing an application
      3. 2.2.3 Composites: Reusable subgraphs
      4. 2.2.4 Making changes: Refactoring
      5. 2.2.5 Program understanding
  6. Chapter 3. Visualizing stream data
    1. 3.1 Stream data, views, and charts
    2. 3.2 How data visualization works
    3. 3.3 Use cases
      1. 3.3.1 Debugging using Streams Studio
      2. 3.3.2 Application monitoring using Streams Console
      3. 3.3.3 Hints and tips
  7. Chapter 4. Analytics entirely with SPL
    1. 4.1 Volume, variety, and velocity
    2. 4.2 Model view controller (MVC), and Streams
    3. 4.3 Flow rate sensor: Example application we create
      1. 4.3.1 Sample output from our flow rate sensor
      2. 4.3.2 Flow rate sensor: First ingest, heartbeat generator
      3. 4.3.3 Flow rate sensor: Actual ingest, parsing semi-structured data
      4. 4.3.4 Flow rate sensor: React phase (simulated)
      5. 4.3.5 Flow rate sensor: User defined functions, (read-only) maps
      6. 4.3.6 Flow rate sensor: Analyze, read-write maps, windows, other
    4. 4.4 Conclusion, how to proceed
  8. Chapter 5. Streams and DataStage integration
    1. 5.1 Introduction to Streams processes
    2. 5.2 Runtime architecture
    3. 5.3 Metadata integration
      1. 5.3.1 Integration Setup Overview
    4. 5.4 Sample application
      1. 5.4.1 Importing Streams certificate to DataStage
      2. 5.4.2 Streams-to-DataStage application
    5. 5.5 DataStage Job design practices
  9. Chapter 6. Streams integration with IBM BigInsights
    1. 6.1 Streams and big data challenges
      1. 6.1.1 Application scenarios
      2. 6.1.2 Large scale data ingest
      3. 6.1.3 Bootstrap and enrichment
      4. 6.1.4 Adaptive analytics model
      5. 6.1.5 Complex social and entity related analysis
      6. 6.1.6 Application development
      7. 6.1.7 Application interactions
      8. 6.1.8 Enabling components
    2. 6.2 BigInsights summary
  10. Chapter 7. Complex event processing
    1. 7.1 The role of the CEP Toolkit
      1. 7.1.1 Adding the CEP Toolkit to your build path
    2. 7.2 Stock price watch example
      1. 7.2.1 First iteration of stock price watch application
      2. 7.2.2 Second iteration of stock price watch application
      3. 7.2.3 Third iteration of stock price watch application
  11. Chapter 8. WebSphere MQ, XMSSource, XMSSink
    1. 8.1 WebSphere MQ Server, Message Service Client installation
    2. 8.2 Making the WebSphere MQ resident objects
    3. 8.3 Setting Streams environment variables, and adding toolkits
    4. 8.4 The two Streams applications
    5. 8.5 Using JMS adapters from the Messaging Toolkit
      1. 8.5.1 Example use cases for the JMS adapters
      2. 8.5.2 Installing and configuring of WebSphere MQ
      3. 8.5.3 Installing and configuring for Apache ActiveMQ
      4. 8.5.4 Compiling and running sample applications
      5. 8.5.5 Sample applications
      6. 8.5.6 Verifying the results
  12. Chapter 9. XML, XMLParse, XPath, and xquery
    1. 9.1 Scenario 1: Flat, single-tier XML, XMLParse
    2. 9.2 Scenario 2: Multitiered (list data)
      1. 9.2.1 Second iteration, XMLParse
      2. 9.2.2 Third iteration, XMLParse
      3. 9.2.3 Fourth (final) iteration, XMLParse
    3. 9.3 The spl-schema-from-xml utility, generating XMLParse code
    4. 9.4 The xquery() function, including filter
    5. 9.5 CDATA: Topic that is not covered in detail
  13. Chapter 10. Geospatial Toolkit
    1. 10.1 Concepts
      1. 10.1.1 Moving objects and location-based services
      2. 10.1.2 Geospatial concepts and operations
    2. 10.2 Toolkit organization
      1. 10.2.1 Namespaces
      2. 10.2.2 Types and enumerations
      3. 10.2.3 Constructor functions
      4. 10.2.4 Accessor functions
      5. 10.2.5 Spatial production functions
      6. 10.2.6 Spatial relationship functions
      7. 10.2.7 Validation of input arguments
      8. 10.2.8 Metric conversion functions
    3. 10.3 A location-based scenario: tracking vehicles
      1. 10.3.1 A vehicle simulator
      2. 10.3.2 Geofencing: detecting entry and exit
      3. 10.3.3 Predictive geospatial analytics
    4. 10.4 Conclusion
  14. Chapter 11. TimeSeries Toolkit
    1. 11.1 Basics of time series analysis
      1. 11.1.1 Time series patterns
      2. 11.1.2 Detecting patterns using the TimeSeries Toolkit
    2. 11.2 Time series representation and operators overview
      1. 11.2.1 Time series representation
      2. 11.2.2 Control signals
      3. 11.2.3 Types of operators and overview
    3. 11.3 Preprocessing operators
      1. 11.3.1 ReSample operator
      2. 11.3.2 TSWindowing operator
      3. 11.3.3 IncrementalInterpolate operator
    4. 11.4 Analysis operators
      1. 11.4.1 DSPFilter operator
      2. 11.4.2 Fast Fourier Transform
      3. 11.4.3 Discrete Wavelet Transform operator
      4. 11.4.4 Seasonal Trend Decomposition operator
      5. 11.4.5 CrossCorrelate operator
      6. 11.4.6 Normalize operator
      7. 11.4.7 FunctionEvaluator operator
      8. 11.4.8 Distribution operator
    5. 11.5 Modeling operators
      1. 11.5.1 HoltWinters operator
      2. 11.5.2 ARIMA operator
      3. 11.5.3 FMPFilter operator
      4. 11.5.4 Kalman operator
      5. 11.5.5 GAM operators
      6. 11.5.6 GMM operator
      7. 11.5.7 LPC operator
      8. 11.5.8 VAR operator
      9. 11.5.9 RLSFilter operator
    6. 11.6 Time series functions
      1. 11.6.1 The generator functions
      2. 11.6.2 The Crosscorrelate function
      3. 11.6.3 The convolve function
      4. 11.6.4 The rms function
  15. Chapter 12. Developing Java primitive operators
    1. 12.1 Operator lifecycle
    2. 12.2 Threading in a Java operator
    3. 12.3 Creating a simple operator
      1. 12.3.1 Operator code layout
      2. 12.3.2 Defining the operator model
      3. 12.3.3 Implementing the operator in Java
      4. 12.3.4 Compiling the operator code
      5. 12.3.5 Testing the operator code
      6. 12.3.6 Creating an SPL application
      7. 12.3.7 Adding custom metrics
      8. 12.3.8 Implementing a tuple consumer operator
    4. 12.4 Java development using Streams Studio
  16. Chapter 13. Text Analytics, AQL
    1. 13.1 Overview text analytics, by example
    2. 13.2 Installing and configuring text analytics tools
      1. 13.2.1 Installing text analytics tools from Streams Studio installation
      2. 13.2.2 Installing text analytics tools from a BigInsights install
      3. 13.2.3 Configuring your Streams project, add a BigInsights project
    3. 13.3 First AQL example, Apache HTTP log file
      1. 13.3.1 Improving the first example, Apache HTTP log file
    4. 13.4 Guided team-based AQL and using AQL tools
    5. 13.5 Regular expressions (regex)
      1. 13.5.1 AQL tools to aid in regex development and debug
      2. 13.5.2 Regex directly inside Streams; no AQL required
    6. 13.6 Additional AQL objects and techniques
      1. 13.6.1 The kitchen sink AQL script and Streams application
    7. 13.7 Topics not covered
  17. Chapter 14. IBM Accelerator for Telecommunications Event Data Analytics V1.2
    1. 14.1 Overview of TEDA
    2. 14.2 Installing TEDA
    3. 14.3 Understanding concepts and terms
      1. 14.3.1 Application components
      2. 14.3.2 Application infrastructure
      3. 14.3.3 Configuration
      4. 14.3.4 Fault tolerance
    4. 14.4 Customizing TEDA
      1. 14.4.1 Workflow overview
      2. 14.4.2 Preparation
      3. 14.4.3 Defining the sample use case
      4. 14.4.4 Exercise 1: Basic setup
      5. 14.4.5 Exercise 2: Writing to the database
      6. 14.4.6 Exercise 3: Adding an aggregation operator
    5. 14.5 Conclusion
  18. Chapter 15. SPSS Toolkit
    1. 15.1 An overview of InfoSphere Streams and SPSS
      1. 15.1.1 Integrating InfoSphere Streams and SPSS
      2. 15.1.2 Roles and terminology
      3. 15.1.3 Example development process
    2. 15.2 Coordinating Data Analyst and Streams developer efforts
    3. 15.3 Building the predictive models
    4. 15.4 Configuring the SPSSScoring operator
    5. 15.5 Summary
  19. Appendix A. Additional material
    1. Locating the web material
    2. Using the web material
  20. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  21. Back cover