IBM Watson Content Analytics: Discovering Actionable Insight from Your Content

Book Description

IBM® Watson™ Content Analytics (Content Analytics) Version 3.0, formerly known as IBM Content Analytics with Enterprise Search (ICAwES), helps you unlock the value of unstructured content to gain new, actionable business insight, and it provides enterprise search capability in the same product. Content Analytics comes with a set of tools and a robust user interface that help you identify new revenue opportunities, improve customer satisfaction, detect problems early, and improve products, services, and offerings.

To help you gain the most benefit from your unstructured content, this IBM Redbooks® publication provides in-depth information about the features and capabilities of Content Analytics, how content analytics works, and how to perform effective and efficient content analytics on your content to discover actionable business insights.

This book covers key concepts in content analytics, such as facets, frequency, deviation, correlation, trend, and sentiment analysis. It describes the content analytics miner and guides you through performing content analytics using views, dictionary lookup, and customization. The book also covers using IBM Content Analytics Studio for domain-specific content analytics, integrating with IBM Content Classification to get categories and new metadata, interfacing with IBM Cognos® Business Intelligence (BI) to add value to BI reporting and analysis, and customizing the content analytics miner with APIs. In addition, the book describes how to use the enterprise search capability to discover and retrieve documents using various query and visual navigation techniques, and how to customize crawling, parsing, indexing, and runtime search to improve search results.

This book is intended for decision makers, business users, and IT architects and specialists who want to understand and analyze their enterprise content to improve and enhance their business operations. It also serves as a technical how-to guide, used alongside the online IBM Knowledge Center, for configuring and performing content analytics and enterprise search with Content Analytics.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Summary of changes
    1. July 2014, Third Edition
  5. Chapter 1. Overview of IBM Watson Content Analytics
    1. 1.1 Business need and the content analytics and search solutions
      1. 1.1.1 Business need and problem statement
      2. 1.1.2 The content analytics solution
      3. 1.1.3 The analytics-driven search solution
    2. 1.2 History, changes, and what is new in version 3.0
      1. 1.2.1 Product history
      2. 1.2.2 Product changes
      3. 1.2.3 What is new in IBM Watson Content Analytics
    3. 1.3 Important concepts and terminology
      1. 1.3.1 Unstructured and structured content
      2. 1.3.2 Text analytics
      3. 1.3.3 Search, discovery, and data mining
      4. 1.3.4 Collections
      5. 1.3.5 Facets
      6. 1.3.6 Frequency
      7. 1.3.7 Correlation
      8. 1.3.8 Deviation
    4. 1.4 Content Analytics architecture
      1. 1.4.1 Main components
      2. 1.4.2 Data flow
      3. 1.4.3 Scalability
      4. 1.4.4 Security
  6. Chapter 2. Use case scenarios
    1. 2.1 Customer insights
      1. 2.1.1 Call center
      2. 2.1.2 Quality assurance
    2. 2.2 Law enforcement and public safety
    3. 2.3 Investigation management
      1. 2.3.1 Insurance fraud
    4. 2.4 Healthcare
    5. 2.5 Case management
    6. 2.6 Data warehouse
  7. Chapter 3. Designing content analytics solutions
    1. 3.1 Data considerations
      1. 3.1.1 Content analytics data model
      2. 3.1.2 Structured and unstructured sources
      3. 3.1.3 Multiple data sources
      4. 3.1.4 Date-sensitive data
      5. 3.1.5 Extracting information from textual data
      6. 3.1.6 The number of collections to use
    2. 3.2 Guide for building a content analytics collection
      1. 3.2.1 Building a content analytics collection
      2. 3.2.2 A walk through the building process
      3. 3.2.3 Planning for iteration
    3. 3.3 Programming interfaces
      1. 3.3.1 REST API
      2. 3.3.2 Search and Index API
      3. 3.3.3 Real time natural language processing API
  8. Chapter 4. Understanding content analysis
    1. 4.1 Basic concepts of content analytics
      1. 4.1.1 Manual versus automated analysis
      2. 4.1.2 Frequency versus deviation
      3. 4.1.3 Precision versus recall
    2. 4.2 Typical cycle of analysis with Content Analytics
      1. 4.2.1 Setting the objectives of the analysis
      2. 4.2.2 Gathering data
      3. 4.2.3 Analyzing data
      4. 4.2.4 Taking action based on the analysis
      5. 4.2.5 Validating the effect
    3. 4.3 Successful use cases
      1. 4.3.1 Voice of the customer
      2. 4.3.2 Analysis of other data
    4. 4.4 Summary
  9. Chapter 5. Content analytics miner: Basic features
    1. 5.1 Overview of the content analytics miner
      1. 5.1.1 Accessing the content analytics miner
      2. 5.1.2 Application window layout and functional overview
      3. 5.1.3 Selecting a collection for analysis
      4. 5.1.4 Changing the default behavior by using preferences
    2. 5.2 Search and discovery features
      1. 5.2.1 Limiting the scope of your analysis using facets
      2. 5.2.2 Limiting the scope of your analysis using search operators
      3. 5.2.3 Limiting the scope of your analysis using dates
      4. 5.2.4 Query syntax
      5. 5.2.5 Type ahead
      6. 5.2.6 Saved searches
      7. 5.2.7 Advanced search
    3. 5.3 Query Tree
      1. 5.3.1 Accessing the Query Tree
      2. 5.3.2 Understanding the Query Tree
      3. 5.3.3 Query Tree examples
      4. 5.3.4 Editing the Query Tree
    4. 5.4 Query builder
      1. 5.4.1 Accessing the Query Builder
      2. 5.4.2 Features of the Query Builder window
      3. 5.4.3 Using the Query Builder
      4. 5.4.4 Preferred practice for using the Query Builder and Query Tree
    5. 5.5 Rule-based categories with a query
      1. 5.5.1 Enabling the rule-based categories feature
      2. 5.5.2 Configuring rules for rule-based categories
      3. 5.5.3 Adding the current query as a category rule
    6. 5.6 Common view features
    7. 5.7 Document flagging
      1. 5.7.1 Configuring document flags
      2. 5.7.2 Setting document flags
      3. 5.7.3 Viewing the document values of a flag facet
  10. Chapter 6. Content analytics miner: Views
    1. 6.1 Views
    2. 6.2 Documents view
      1. 6.2.1 Understanding the Documents view
      2. 6.2.2 Viewing the document contents and facets
      3. 6.2.3 When to use the Documents view
    3. 6.3 Facets view
      1. 6.3.1 Understanding the Facets view
      2. 6.3.2 When to use the Facets view
    4. 6.4 Time Series view
      1. 6.4.1 Features in the Time Series view
      2. 6.4.2 Understanding the Time Series view
      3. 6.4.3 When to use the Time Series view
    5. 6.5 Trends view
      1. 6.5.1 Features in the Trends view
      2. 6.5.2 Sort criteria
      3. 6.5.3 Understanding the Trends view
      4. 6.5.4 When to use the Trends view
    6. 6.6 Deviations view
      1. 6.6.1 Features in the Deviations view
      2. 6.6.2 Understanding the Deviations view
      3. 6.6.3 When to use the Deviations view
    7. 6.7 Facet Pairs view
      1. 6.7.1 Table view
      2. 6.7.2 Grid view
      3. 6.7.3 Bird’s eye view
      4. 6.7.4 Understanding the Facet Pairs view with correlation values
      5. 6.7.5 When to use the Facet Pairs view
    8. 6.8 Connections view
      1. 6.8.1 Features in the Connections view
      2. 6.8.2 Understanding the Connections view
      3. 6.8.3 When to use the Connections view
    9. 6.9 Dashboard view
      1. 6.9.1 Configuring the Dashboard layout
      2. 6.9.2 Viewing the Dashboard
      3. 6.9.3 Working with the Dashboard
      4. 6.9.4 Saving Dashboard charts as images
    10. 6.10 Sentiment view
      1. 6.10.1 Document view with Sentiment Analysis enabled
      2. 6.10.2 Understanding the Sentiment view
  11. Chapter 7. Performing content analysis
    1. 7.1 Working with the sample collection
      1. 7.1.1 The sample data
      2. 7.1.2 Getting insights from the sample collection
      3. 7.1.3 Considerations about what you want to discover from the data
    2. 7.2 Content analysis scenarios for the sample collection
      1. 7.2.1 Scenario 1: Using a custom dictionary to discover package-related calls
      2. 7.2.2 Scenario 2: Using custom text analysis rules to discover trouble-related calls
      3. 7.2.3 Scenario 3: Discovering the cause of increasing calls
      4. 7.2.4 Conclusion
    3. 7.3 Overview of techniques to create facets for analysis
      1. 7.3.1 Named Entity Extraction component
      2. 7.3.2 Sentiment
      3. 7.3.3 Terms of interest
      4. 7.3.4 Custom dictionaries
      5. 7.3.5 Facet ranges
      6. 7.3.6 Field Filters
      7. 7.3.7 Rule-based categories
      8. 7.3.8 Syntax Pattern Rules
      9. 7.3.9 Document clustering and classification
    4. 7.4 Preferred practices
  12. Chapter 8. Performing content analysis with built-in annotators
    1. 8.1 Terms of interest
      1. 8.1.1 Basic algorithm for identifying terms of interest
      2. 8.1.2 Limitations in using automatic identification of terms of interest
      3. 8.1.3 Preferred use of terms of interest identified automatically
    2. 8.2 Configuring dictionary-driven analytics
      1. 8.2.1 Multiple viewpoints for analyzing the same data
      2. 8.2.2 Configuring the Dictionary Lookup annotator
      3. 8.2.3 When to use the Dictionary Lookup annotator
      4. 8.2.4 Configuring custom user dictionaries
      5. 8.2.5 Validation and maintenance
    3. 8.3 Configuring the Pattern Matcher annotator
      1. 8.3.1 When to use the Pattern Matcher annotator
      2. 8.3.2 Configuring custom text analysis rules
      3. 8.3.3 Designing the custom text analysis rules
      4. 8.3.4 Validation and maintenance
  13. Chapter 9. Content analysis with IBM Content Classification and document clustering
    1. 9.1 The Content Classification annotator
      1. 9.1.1 When to use the Content Classification annotator
      2. 9.1.2 The Content Classification technology
    2. 9.2 Fine-tuning your analysis with the Content Classification annotator
      1. 9.2.1 Building your collection
      2. 9.2.2 Refining the analysis
      3. 9.2.3 Using a conceptual search for advanced content discovery
    3. 9.3 Creating and deploying the Content Classification resource
      1. 9.3.1 Starting the Content Classification server
      2. 9.3.2 Creating and training the knowledge bases
      3. 9.3.3 Creating a decision plan
      4. 9.3.4 Deploying the knowledge base and decision plan
      5. 9.3.5 Configuring the Content Classification annotator
    4. 9.4 Validation and maintenance of the Content Classification annotator
      1. 9.4.1 Using the Content Classification sample programs
      2. 9.4.2 Content Classification annotator validation techniques
    5. 9.5 Preferred practices for Content Classification annotator usage
    6. 9.6 Document clustering
      1. 9.6.1 Setting up document cluster
      2. 9.6.2 Creating a cluster proposal
      3. 9.6.3 Refining the cluster results
      4. 9.6.4 Deploying clusters to a category
      5. 9.6.5 Working with the cluster results
      6. 9.6.6 Creating and deploying the clustering resource
      7. 9.6.7 Preferred practices
  14. Chapter 10. Importing CSV files, exporting data, and performing deep inspection
    1. 10.1 Importing CSV files
    2. 10.2 Overview of exporting documents and data
      1. 10.2.1 Crawled documents
      2. 10.2.2 Analyzed documents
      3. 10.2.3 Search result documents
      4. 10.2.4 Exported data manifest
    3. 10.3 Location and format of the exported data
      1. 10.3.1 Location of the exported data
      2. 10.3.2 Metadata format
      3. 10.3.3 Binary content format
      4. 10.3.4 Common Analysis Structure format
      5. 10.3.5 Extracted text format
    4. 10.4 Common configuration of the export feature
      1. 10.4.1 Document URI pattern
      2. 10.4.2 Exporting XML attributes and preserving file extensions
      3. 10.4.3 Adding exported documents to the index
      4. 10.4.4 Exporting information about deleted documents
      5. 10.4.5 Scheduling
    5. 10.5 Monitoring export requests
    6. 10.6 Enabling export and sample configurations
      1. 10.6.1 Exporting crawled documents to a file system for IBM Content Collector
      2. 10.6.2 Exporting analyzed documents to a relational database
      3. 10.6.3 Exporting search result documents to the file system for IBM Content Classification
      4. 10.6.4 Exporting search result documents to CSV files
    7. 10.7 Deep inspection
      1. 10.7.1 Location and format of the exported data
      2. 10.7.2 Common configuration
      3. 10.7.3 Enabling deep inspection
      4. 10.7.4 Generating deep inspection reports
      5. 10.7.5 Optional: Scheduling a deep inspection run
      6. 10.7.6 Monitoring the deep inspection requests
      7. 10.7.7 Validating the deep inspection reports generation
    8. 10.8 Creating and deploying a custom plug-in
  15. Chapter 11. Customizing content analytics with IBM Content Analytics Studio
    1. 11.1 ICA Studio overview
    2. 11.2 The building process of UIMA pipeline
    3. 11.3 Use case: Building a UIMA pipeline for analyzing customers complaints
      1. 11.3.1 Creating the ICA Studio project
      2. 11.3.2 Creating the UIMA pipeline
      3. 11.3.3 Configuring the basic UIMA pipeline
      4. 11.3.4 Testing the UIMA pipeline and reviewing output
      5. 11.3.5 Creating custom dictionaries for Lexical Analysis
      6. 11.3.6 Creating parsing rules
    4. 11.4 Exporting annotators
      1. 11.4.1 Creating the IBM Brands annotator
      2. 11.4.2 Exporting the annotator
    5. 11.5 Conclusion
  16. Chapter 12. Enterprise search
    1. 12.1 Overview of enterprise search capability in Content Analytics
      1. 12.1.1 Adding values with Content Analytics features
      2. 12.1.2 Enterprise search application user interface
      3. 12.1.3 Components supporting Content Analytics enterprise search capability
      4. 12.1.4 REST Search API
    2. 12.2 Use case overview
    3. 12.3 Customizing crawling, parsing, and indexing
      1. 12.3.1 Setting up multiple content sources for crawling
      2. 12.3.2 Mapping multiple content sources to the index
      3. 12.3.3 Adding additional facets and fields with a custom annotator
      4. 12.3.4 Adding a category tree
      5. 12.3.5 Altering field values to conform to uniform standard
    4. 12.4 Customizing runtime search
      1. 12.4.1 Tuning queries and results
      2. 12.4.2 Enhancing free text search to use field search
      3. 12.4.3 Expanding the search query
      4. 12.4.4 Grouping results
      5. 12.4.5 Ranking the search results
    5. 12.5 Search Customizer
      1. 12.5.1 Adding the Person facet
      2. 12.5.2 Adding the timeline
      3. 12.5.3 Adding the Category Tree
    6. 12.6 Performing search
      1. 12.6.1 Search strategies
      2. 12.6.2 Search example
    7. 12.7 Security
      1. 12.7.1 Authentication
      2. 12.7.2 Authorization (access control)
    8. 12.8 Summary
  17. Chapter 13. Adding value to Cognos Business Intelligence
    1. 13.1 Integration overview
    2. 13.2 Integration architecture
      1. 13.2.1 Data model integration
      2. 13.2.2 Cognos report generation
    3. 13.3 Initial setup
      1. 13.3.1 Verifying IBM Cognos BI
      2. 13.3.2 Creating a data source connection by using Cognos BI Administration
      3. 13.3.3 Configuring default application user roles
      4. 13.3.4 Configuring an export to a relational database using Content Analytics
      5. 13.3.5 Configuring the Cognos BI server for reporting by using Content Analytics
    4. 13.4 Generating Cognos BI reports
    5. 13.5 Creating custom Cognos BI reports
      1. 13.5.1 Exporting search results
      2. 13.5.2 Loading the exported data model into Cognos
  18. Chapter 14. Customizing and extending the content analytics miner
    1. 14.1 Reasons for custom development
    2. 14.2 Analytics Customizer
    3. 14.3 Creating the sample plug-in: Spatial Analysis
      1. 14.3.1 Preparation
      2. 14.3.2 Plug-in structure
      3. 14.3.3 Adding a map to the plug-in
      4. 14.3.4 Displaying documents on the map
      5. 14.3.5 The entire code for the Spatial Analysis plug-in so far
      6. 14.3.6 Adding selection mode
  19. Appendix A. Spatial Analysis plug-in code
    1. Spatial Analysis plug-in overview
    2. plugin.js
    3. plugin.html
    4. style.css
  20. Appendix B. Additional material
    1. Locating the Web material
    2. Using the Web material
  21. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  22. Back cover
  23. IBM System x Reference Architecture for Hadoop: IBM InfoSphere BigInsights Reference Architecture
    1. Introduction
    2. Business problem and business value
    3. Reference architecture use
    4. Requirements
    5. InfoSphere BigInsights predefined configuration
    6. InfoSphere BigInsights HBase predefined configuration
    7. Deployment considerations
    8. Customizing the predefined configurations
    9. Predefined configuration bill of materials
    10. References
    11. The team who wrote this paper
    12. Now you can become a published author, too!
    13. Stay connected to IBM Redbooks
  24. Notices
    1. Trademarks