Addressing Data Volume, Velocity, and Variety with IBM InfoSphere Streams V3.0

Book Description

There are multiple uses for big data in every industry, from analyzing larger volumes of data than was previously possible and driving more precise answers, to analyzing data at rest and data in motion to capture opportunities that would otherwise be lost. A big data platform enables your organization to tackle complex problems that could not be solved with traditional infrastructure.

As the amount of data available to enterprises and other organizations dramatically increases, more and more companies are looking to turn this data into actionable information and intelligence in real time. Meeting these requirements calls for applications that can analyze potentially enormous volumes and varieties of continuous data streams and provide decision makers with critical information almost instantaneously.

IBM® InfoSphere® Streams provides a development platform and runtime environment where you can develop applications that ingest, filter, analyze, and correlate potentially massive volumes of continuous data streams based on defined, proven, and analytical rules that alert you to take appropriate action, all within an appropriate time frame for your organization.

This IBM Redbooks® publication is written for decision makers, consultants, IT architects, and IT professionals who will be implementing a solution with IBM InfoSphere Streams.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. The team who wrote this book
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Think big with big data
    1. 1.1 Executive summary
    2. 1.2 What is big data?
    3. 1.3 IBM big data strategy
    4. 1.4 IBM big data platform
      1. 1.4.1 InfoSphere BigInsights for Hadoop-based analytics
      2. 1.4.2 InfoSphere Streams for low-latency analytics
      3. 1.4.3 InfoSphere Information Server for Data Integration
      4. 1.4.4 Netezza, InfoSphere Warehouse, and Smart Analytics System for deep analytics
    5. 1.5 Summary
  5. Chapter 2. Exploring IBM InfoSphere Streams
    1. 2.1 Stream computing
      1. 2.1.1 Business landscape
      2. 2.1.2 Information environment
      3. 2.1.3 The evolution of analytics
    2. 2.2 IBM InfoSphere Streams
      1. 2.2.1 Overview of Streams
      2. 2.2.2 Why use Streams
      3. 2.2.3 Examples of Streams implementations
  6. Chapter 3. InfoSphere Streams architecture
    1. 3.1 Streams concepts and terms
      1. 3.1.1 Working with continuous data flow
      2. 3.1.2 Component overview
    2. 3.2 InfoSphere Streams runtime system
      1. 3.2.1 SPL components
      2. 3.2.2 Runtime components
      3. 3.2.3 InfoSphere Streams runtime files
    3. 3.3 Performance requirements
      1. 3.3.1 InfoSphere Streams reference architecture
    4. 3.4 High availability
  7. Chapter 4. IBM InfoSphere Streams V3.0 new features
    1. 4.1 New configuration features
      1. 4.1.1 Enhanced first steps after installation
    2. 4.2 Development
    3. 4.3 Administration
      1. 4.3.1 Improved visual application monitoring
      2. 4.3.2 Streams data visualization
      3. 4.3.3 Streams console application launcher
    4. 4.4 Integration
      1. 4.4.1 DataStage integration
      2. 4.4.2 Netezza integration
      3. 4.4.3 Data Explorer integration
      4. 4.4.4 SPSS integration
      5. 4.4.5 XML integration
    5. 4.5 Analytics and accelerators toolkits
      1. 4.5.1 Geospatial toolkit
      2. 4.5.2 Time series toolkit
      3. 4.5.3 Complex Event Processing toolkit
      4. 4.5.4 Accelerators toolkit
  8. Chapter 5. InfoSphere Streams deployment
    1. 5.1 Architecture, instances, and topologies
      1. 5.1.1 Runtime architecture
      2. 5.1.2 Streams instances
      3. 5.1.3 Deployment topologies
    2. 5.2 Streams runtime deployment planning
      1. 5.2.1 Streams environment
      2. 5.2.2 Sizing the environment
      3. 5.2.3 Deployment and installation checklists
    3. 5.3 Streams instance creation and configuration
      1. 5.3.1 Streams shared instance configuration
      2. 5.3.2 Streams private developer instance configuration
    4. 5.4 Application deployment capabilities
      1. 5.4.1 Dynamic application composition
      2. 5.4.2 Operator host placement
      3. 5.4.3 Operator partitioning
      4. 5.4.4 Parallelizing operators
    5. 5.5 Failover, availability, and recovery
      1. 5.5.1 Restarting and relocating processing elements
      2. 5.5.2 Recovering application hosts
      3. 5.5.3 Recovering management hosts
  9. Chapter 6. Application development with Streams Studio
    1. 6.1 InfoSphere Streams Studio overview
    2. 6.2 Developing applications with Streams Studio
      1. 6.2.1 Adding toolkits to Streams Explorer
      2. 6.2.2 Using Streams Explorer
      3. 6.2.3 Creating a simple application with Streams Studio
      4. 6.2.4 Build and launch the application
      5. 6.2.5 Monitor your application
    3. 6.3 Streams Processing Language
      1. 6.3.1 Structure of an SPL program file
      2. 6.3.2 Streams data types
      3. 6.3.3 Stream schemas
      4. 6.3.4 Streams punctuation markers
      5. 6.3.5 Streams windows
    4. 6.4 InfoSphere Streams operators
      1. 6.4.1 Adapter operators
      2. 6.4.2 Relational operators
      3. 6.4.3 Utility operators
      4. 6.4.4 XML operators
      5. 6.4.5 Compact operators
    5. 6.5 InfoSphere Streams toolkits
      1. 6.5.1 Mining toolkit
      2. 6.5.2 Financial toolkit
      3. 6.5.3 Database toolkit
      4. 6.5.4 Internet toolkit
      5. 6.5.5 Geospatial toolkit
      6. 6.5.6 TimeSeries toolkit
      7. 6.5.7 Complex Event Processing toolkit
      8. 6.5.8 Accelerators toolkit
  10. Chapter 7. Streams integration considerations
    1. 7.1 Integrating with IBM InfoSphere BigInsights
      1. 7.1.1 Streams and BigInsights application scenario
      2. 7.1.2 Scalable data ingest
      3. 7.1.3 Bootstrap and enrichment
      4. 7.1.4 Adaptive analytics model
      5. 7.1.5 Streams and BigInsights application development
      6. 7.1.6 Application interactions
      7. 7.1.7 Enabling components
      8. 7.1.8 BigInsights summary
    2. 7.2 Integration with IBM SPSS
      1. 7.2.1 Value of integration
      2. 7.2.2 SPSS operators overview
      3. 7.2.3 SPSSScoring operator
      4. 7.2.4 SPSSPublish operator
      5. 7.2.5 SPSSRepository operator
      6. 7.2.6 Streams and SPSS examples
    3. 7.3 Integrating with databases
      1. 7.3.1 Concepts
      2. 7.3.2 Configuration files and connection documents
      3. 7.3.3 Database specifics
      4. 7.3.4 Examples
    4. 7.4 Streams and DataStage
      1. 7.4.1 Integration scenarios
      2. 7.4.2 Application considerations
      3. 7.4.3 Streams to DataStage metadata import
      4. 7.4.4 Connector stage
      5. 7.4.5 Sample DataStage job
      6. 7.4.6 Configuration steps in a Streams server environment
      7. 7.4.7 DataStage adapters
    5. 7.5 Integrating with XML data
      1. 7.5.1 XMLParse Operator
      2. 7.5.2 XMLParse example
    6. 7.6 Integration with IBM WebSphere MQ and WebSphere MQ Low Latency Messaging
      1. 7.6.1 Architecture
      2. 7.6.2 Use cases
      3. 7.6.3 Configuration setup for WebSphere MQ Low Latency Messaging in Streams
      4. 7.6.4 Connection specification for WebSphere MQ
      5. 7.6.5 WebSphere MQ operators
      6. 7.6.6 Software requirements
    7. 7.7 Integration with Data Explorer
      1. 7.7.1 Usage
  11. Chapter 8. IBM InfoSphere Streams administration
    1. 8.1 InfoSphere Streams Instance Management
      1. 8.1.1 Instances Manager
      2. 8.1.2 Creating and configuring an instance
    2. 8.2 InfoSphere Streams Console
      1. 8.2.1 Starting the InfoSphere Streams Console
    3. 8.3 Instance administration
      1. 8.3.1 Hosts
      2. 8.3.2 Permissions
      3. 8.3.3 Applications
      4. 8.3.4 Jobs
      5. 8.3.5 Processing Elements
      6. 8.3.6 Operators
      7. 8.3.7 Application streams
      8. 8.3.8 Views
      9. 8.3.9 Charts
      10. 8.3.10 Application Graph
    4. 8.4 Settings
      1. 8.4.1 General
      2. 8.4.2 Hosts Settings
      3. 8.4.3 Security
      4. 8.4.4 Host tags
      5. 8.4.5 Web Server
      6. 8.4.6 Logging and Tracing
    5. 8.5 Streams Recovery Database
  12. Appendix A. Installing InfoSphere Streams
    1. Hardware requirements for InfoSphere Streams
    2. Software requirements for InfoSphere Streams
    3. Required Red Hat Package Manager for InfoSphere Streams
    4. Installing InfoSphere Streams
    5. IBM InfoSphere Streams First Steps configuration
  13. Appendix B. IBM InfoSphere Streams security considerations
    1. Security-Enhanced Linux for Streams
    2. User authentication
    3. User authorization
    4. Audit log file for Streams
  14. Appendix C. Commodity purchasing application demonstration
    1. Application overview
    2. Application demonstration
  15. Glossary
  16. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  17. Back cover