Creating Value-Based Archiving Solutions with IBM Content Collector

Book Description

This IBM® Redbooks® publication describes how the IBM Content Collector family of products can help companies create value-based archiving solutions. IBM Content Collector provides enterprise-wide content archiving and retention management capabilities. It also gives IT administrators a high level of control over the archiving environment. From a common interface, organizations can implement policies that define what gets archived from which source system, decide how content gets archived based on the content or metadata of the information, and determine the retention and governance rules associated with that type of content. Content Collector enables IT staff to implement granular archiving policies to collect and archive specific pieces of information.

IBM Content Collector helps with the following tasks:

  • Eliminating point solutions and lowering costs with a unified collection, management, and governance approach that works effectively across a broad range of source systems and information types

  • Appraising, better understanding, culling, and properly selecting the information to archive

  • Retaining, holding, and disposing of archived content efficiently and defensibly

  • Eliminating the costs and risks inherent in over-retention

This book covers the basic concepts of the IBM Content Collector product family. It presents an overview of how the product family provides value-based archiving and a defensible disposal capability in archiving solutions. The book also explains and showcases how integrating IBM Content Classification and IBM Enterprise Records adds flexibility, power, and capabilities to archiving solutions. The book is intended for IT architects and solution designers who need to understand and use IBM Content Collector for archiving solution implementations. Use cases are included to provide specific, step-by-step details about implementing common solutions that fulfill some of the general business requirements.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. The team who wrote this book
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Value-based archiving and defensible disposal overview
    1. 1.1 Business problems
      1. 1.1.1 Introduction
      2. 1.1.2 The growth of archiving
    2. 1.2 IBM solutions
      1. 1.2.1 IBM Information Lifecycle Governance
      2. 1.2.2 Value-based archiving with IBM Content Collector
    3. 1.3 IBM Content Collector overview
      1. 1.3.1 Architectural overview
      2. 1.3.2 IBM Content Collector components
      3. 1.3.3 IBM Content Collector features
    4. 1.4 Typical archiving use cases and scenarios
    5. 1.5 Conclusion
  5. Chapter 2. Example use cases
    1. 2.1 Use case 1: Email archiving for compliance and storage management
    2. 2.2 Use case 2: Email archiving with content classification
    3. 2.3 Use case 3: Email archiving with records declaration
    4. 2.4 Use case 4: File system archiving with records declaration
    5. 2.5 Conclusion
  6. Chapter 3. Dimensions of content archiving themes
    1. 3.1 Dimensions of content archiving with IBM Content Collector
    2. 3.2 Storage management
      1. 3.2.1 Staging rollout for maximizing storage savings
      2. 3.2.2 Document stubbing
      3. 3.2.3 Storage management for file shares
      4. 3.2.4 Storage management for email
      5. 3.2.5 Storage management for Microsoft SharePoint
    3. 3.3 Compliance archiving
      1. 3.3.1 Compliance archiving for email
      2. 3.3.2 Compliance archiving for Microsoft SharePoint
      3. 3.3.3 Compliance archiving for IBM Connections
    4. 3.4 Business process management
      1. 3.4.1 Business process management for file shares
      2. 3.4.2 Business process management for email
    5. 3.5 Use case 1A and 1B: Email archiving for compliance and storage
    6. 3.6 Conclusion
  7. Chapter 4. Designing, adapting, and deploying task routes
    1. 4.1 Dynamically calculating document retention
      1. 4.1.1 Automatically calculating document retention
      2. 4.1.2 Manually setting document retention
    2. 4.2 Adjusting collectors
      1. 4.2.1 Configuring schedules
      2. 4.2.2 Collector and task route filtering
    3. 4.3 Optimizing task routes for maintainability
      1. 4.3.1 Impact of using multiple connections for Microsoft SharePoint
      2. 4.3.2 Microsoft SharePoint version series task route design
      3. 4.3.3 Creation of user-defined metadata
      4. 4.3.4 Impact of using multiple collection sources
      5. 4.3.5 Impact of using multiple collectors
      6. 4.3.6 Impact of using multiple task routes
      7. 4.3.7 Strategies for minimizing the number of task routes
    4. 4.4 Promoting task routes from development to production systems
      1. 4.4.1 Understanding task route dependencies
      2. 4.4.2 Checklist for task route migration
    5. 4.5 Using the Expression Editor
      1. 4.5.1 Avoiding the need for nested decision points
      2. 4.5.2 Using list lookups
    6. 4.6 Extending IBM Content Collector
      1. 4.6.1 Choosing the correct extension strategy for your scenario
      2. 4.6.2 Extending the source system or target system
      3. 4.6.3 Using the Script Connector
      4. 4.6.4 Using the IBM Content Collector Software Development Kit
    7. 4.7 Conclusion
  8. Chapter 5. Retention management
    1. 5.1 Retention management overview
    2. 5.2 Stubbing lifecycle
    3. 5.3 Expiration Manager
      1. 5.3.1 Looking up the expiration date of a document
      2. 5.3.2 Working with eDiscovery and records management solutions
      3. 5.3.3 Running multiple instances of Expiration Manager
      4. 5.3.4 Scheduling Expiration Manager execution
      5. 5.3.5 Optimizing Expiration Manager for performance
    4. 5.4 Expired stub management
      1. 5.4.1 Determine the ID of the repository
      2. 5.4.2 Email
      3. 5.4.3 Microsoft SharePoint
      4. 5.4.4 File system
    5. 5.5 Use case 1C: Lifecycle stubbing and retention management
      1. 5.5.1 Create the stubbing lifecycle task route
      2. 5.5.2 Enable the Expiration Manager
      3. 5.5.3 Create the audit task route (optional)
    6. 5.6 Conclusion
  9. Chapter 6. Document classification
    1. 6.1 The business value of using IBM Content Classification
    2. 6.2 IBM Content Classification overview
    3. 6.3 Basic content classification integration
      1. 6.3.1 Setting up Content Classification with Content Collector
      2. 6.3.2 Configuring task route for automated email archiving example
      3. 6.3.3 A BPM task route example
      4. 6.3.4 Working with Content Collector email client integration
    4. 6.4 Content Classification applied to other scenarios
      1. 6.4.1 Value-based archiving and defensible disposal
      2. 6.4.2 Using Content Classification for record declaration
      3. 6.4.3 Using Content Classification for eDiscovery
    5. 6.5 Using decision plan for value-based archiving and defensible disposal
      1. 6.5.1 Setting expiration date using Content Classification calculation
      2. 6.5.2 Decision plan used in the expiration calculation
      3. 6.5.3 Loading the decision plan for inspection
      4. 6.5.4 Reproducing disposal decisions made in the past
    6. 6.6 Generating new facets with Content Classification
      1. 6.6.1 Generating facets for multiple taxonomies
      2. 6.6.2 Generating facets using wordlists
      3. 6.6.3 Creating facets from regular expressions
      4. 6.6.4 Creating facets with Content Classification user hooks
      5. 6.6.5 Creating facets with the UIMA client hooks
    7. 6.7 Reviewing and auditing archived emails and documents
      1. 6.7.1 Reviewing results of Content Collector’s automatic classification
      2. 6.7.2 Manual audit and feedback
      3. 6.7.3 Deferred feedback
      4. 6.7.4 Pitfalls of feedback
      5. 6.7.5 Using representative datasets
      6. 6.7.6 Review and feedback through the Classification Center
      7. 6.7.7 Inspection and feedback through the Email Client integration
    8. 6.8 Use case 2: Email archiving with content classification
      1. 6.8.1 The decision plan
      2. 6.8.2 The task route
      3. 6.8.3 Upgrading to use case 3
    9. 6.9 Considerations and guidelines
      1. 6.9.1 Preferred practices
      2. 6.9.2 Limitation considerations
    10. 6.10 Conclusion
  10. Chapter 7. Records management integration
    1. 7.1 Options for classifying and declaring records
      1. 7.1.1 Simple retention management versus record declaration
      2. 7.1.2 Determining classification
      3. 7.1.3 Use cases and examples
    2. 7.2 Record declaration requirements
      1. 7.2.1 Prerequisites for record declaration
      2. 7.2.2 Content must be archived before declaration
      3. 7.2.3 Essential information for record declaration
    3. 7.3 Basic record declaration from a Content Collector task route
      1. 7.3.1 Enabling archived documents for declaration
      2. 7.3.2 Content Collector P8 Declare Record task overview
      3. 7.3.3 Configuring record classification - options
      4. 7.3.4 Configuring property mapping - options
      5. 7.3.5 Which options to use
      6. 7.3.6 Determining record classification
      7. 7.3.7 Task route templates for record declaration
    4. 7.4 Use case 3: Email archiving with records declaration
      1. 7.4.1 Deciding which content to declare
      2. 7.4.2 Archiving the content
      3. 7.4.3 Configuring record declaration
      4. 7.4.4 Results in Enterprise Records
      5. 7.4.5 Including logic to handle intermittent failure
    5. 7.5 Use case 4: File system archiving with records declaration
      1. 7.5.1 Scenario and overview
      2. 7.5.2 Deciding which content to declare
      3. 7.5.3 Separating documents for declaration
      4. 7.5.4 Creating the document in P8
      5. 7.5.5 Configuring record declaration
      6. 7.5.6 Post processing
      7. 7.5.7 Results in Enterprise Records
    6. 7.6 Considerations and guidelines
      1. 7.6.1 Preferred practices
      2. 7.6.2 Limitations
      3. 7.6.3 Declaring a version series
      4. 7.6.4 Using deduplication for files
      5. 7.6.5 Considerations for email
      6. 7.6.6 Considerations for large volumes
      7. 7.6.7 Declaring records after content has been archived
    7. 7.7 Conclusion
  11. Chapter 8. IBM Connections integration
    1. 8.1 Configuring IBM Connections for IBM Content Collector
      1. 8.1.1 Setting up user permissions
      2. 8.1.2 Scale out and backup considerations
    2. 8.2 Archiving a subset of content
      1. 8.2.1 Content filtering for IBM Connections
      2. 8.2.2 Archiving past content from a specific person
    3. 8.3 Configuring eDiscovery Manager for IBM Connections
      1. 8.3.1 Viewing IBM Connections documents in eDiscovery Manager
      2. 8.3.2 Extending searching capabilities
      3. 8.3.3 Considerations for eDiscovery Manager document export
      4. 8.3.4 Conclusion
  12. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  13. Back cover