Designing and Operating a Data Reservoir

Book description

Together, big data and analytics have tremendous potential to improve the way we use precious resources, to provide more personalized services, and to protect ourselves from unexpected and ill-intentioned activities. To fully use big data and analytics, an organization needs a system of insight. This is an ecosystem where individuals can locate and access data, and build visualizations and new analytical models that can be deployed into the IT systems to improve the operations of the organization. The data that is most valuable for analytics is also valuable in its own right and typically contains personal and private information about people who are key to the organization, such as customers, employees, and suppliers.

Although universal access to data is desirable, safeguards are necessary to protect people's privacy, prevent data leakage, and detect suspicious activity.

The data reservoir is a reference architecture that balances the desire for easy access to data with information governance and security. The data reservoir reference architecture describes the technical capabilities necessary for a system of insight, while being independent of specific technologies. Technology independence is important because most organizations already have investments in data platforms that they want to incorporate into their solution. In addition, technology is continually improving, and the choice of technology is often dictated by the volume, variety, and velocity of the data being managed.

A system of insight needs more than technology to succeed. The data reservoir reference architecture includes descriptions of governance and management processes and definitions to ensure that the human and business systems around the technology support a collaborative, self-service, and safe environment for data use.

The data reservoir reference architecture was first introduced in Governing and Managing Big Data for Analytics and Decision Makers, REDP-5120, which is available at:
http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html

This IBM® Redbooks publication, Designing and Operating a Data Reservoir, builds on that material to provide more detail on the capabilities and internal workings of a data reservoir.

Table of contents

  1. Front cover
  2. IBM Redbooks promotions
  3. Notices
    1. Trademarks
  4. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  5. Chapter 1. Introduction to big data and analytics
    1. 1.1 Data is key to success
    2. 1.2 About this publication
    3. 1.3 Case study: Eightbar Pharmaceuticals
      1. 1.3.1 Introducing Erin Overview
      2. 1.3.2 Perspectives from the business users at EbP
      3. 1.3.3 Signs of deep change
      4. 1.3.4 Governance and compliance perspectives
      5. 1.3.5 Positioning the data reservoir in the enterprise architecture
      6. 1.3.6 The data reservoir
      7. 1.3.7 Inside the data reservoir
      8. 1.3.8 Initial mapping of the data reservoir architecture
      9. 1.3.9 Additional use cases enabled by a data reservoir
      10. 1.3.10 Security for the data reservoir
      11. 1.3.11 What does IBM security technology do?
    4. 1.4 Summary and next steps
  6. Chapter 2. Defining the data reservoir ecosystem
    1. 2.1 How does the data reservoir support the business?
      1. 2.1.1 Extended data warehouse
      2. 2.1.2 Self-service information library
      3. 2.1.3 Shared analytics
      4. 2.1.4 Tailored consumption
      5. 2.1.5 Confident use
    2. 2.2 Process tools and lifecycles
      1. 2.2.1 The need for self-service
      2. 2.2.2 Facets of self-service
      3. 2.2.3 Enablers of self-service
      4. 2.2.4 Workflow for self-service
      5. 2.2.5 Catalog management for self-service
    3. 2.3 Defining the information governance program
      1. 2.3.1 Core elements of the governance program
      2. 2.3.2 Information governance principles
      3. 2.3.3 Classification schemes
      4. 2.3.4 Governance Rules
      5. 2.3.5 Business terminology glossary
      6. 2.3.6 EbP starts its governance program
      7. 2.3.7 Automating Curation Tasks
      8. 2.3.8 Policies for administering the reservoir
    4. 2.4 Creating a culture that gets value from a data reservoir
      1. 2.4.1 Reservoir as a vital daily tool
      2. 2.4.2 Reassuring information suppliers
    5. 2.5 Setting limits on the use of information
      1. 2.5.1 Controlling information access
      2. 2.5.2 Auditing and fraud prevention
      3. 2.5.3 Ethical use
      4. 2.5.4 Crossing national and jurisdictional boundaries
    6. 2.6 Conclusions
  7. Chapter 3. Logical Architecture
    1. 3.1 The data reservoir from outside
      1. 3.1.1 Other data reservoirs
      2. 3.1.2 Information sources
      3. 3.1.3 Analytics Tools
      4. 3.1.4 Information curator
      5. 3.1.5 Governance, risk, and compliance team
      6. 3.1.6 Line-of-business applications
      7. 3.1.7 Data reservoir operations
    2. 3.2 Overview of the data reservoir details
    3. 3.3 Data reservoir repositories
      1. 3.3.1 Historical data
      2. 3.3.2 Harvested data
      3. 3.3.3 Deposited data
      4. 3.3.4 Shared operational data
      5. 3.3.5 Descriptive data
    4. 3.4 Information integration and governance
      1. 3.4.1 Enterprise IT interaction
      2. 3.4.2 Raw data interaction
      3. 3.4.3 Catalog interfaces
      4. 3.4.4 View-based interaction
    5. 3.5 Component interactions
      1. 3.5.1 Feeding data into the reservoir
      2. 3.5.2 Publishing feeds from the reservoir
      3. 3.5.3 Information integration and governance
    6. 3.6 Summary
  8. Chapter 4. Developing information supply chains for the data reservoir
    1. 4.1 The information supply chain pattern
    2. 4.2 Standard information supply chains in the data reservoir
      1. 4.2.1 Information supply chains for data from enterprise IT systems
      2. 4.2.2 Information supply chain for descriptive data
      3. 4.2.3 Information supply chain for auditing the data reservoir
      4. 4.2.4 Information supply chain for deposited data
    3. 4.3 Implementing information supply chains in the data reservoir
      1. 4.3.1 Erin's perspective
      2. 4.3.2 Deciding on the subject areas that the data reservoir needs to support
      3. 4.3.3 Information sources: The beginning of the information supply chain
      4. 4.3.4 Position of data repositories in the information supply chain
      5. 4.3.5 Information supply chain triggers
      6. 4.3.6 Creating data refineries
      7. 4.3.7 Information virtualization
      8. 4.3.8 Service interfaces
      9. 4.3.9 Using information zones to identify where to store data in the data reservoir repositories
    4. 4.4 Summary
  9. Chapter 5. Operating the data reservoir
    1. 5.1 Reservoir operations
    2. 5.2 Operational components
    3. 5.3 Operational workflow for the reservoir
      1. 5.3.1 Share
      2. 5.3.2 Govern
      3. 5.3.3 Use
    4. 5.4 Workflow roles
      1. 5.4.1 Workflow author
      2. 5.4.2 Workflow initiator
      3. 5.4.3 Workflow executor
      4. 5.4.4 Workflow owner
    5. 5.5 Workflow lifecycle
    6. 5.6 Types of workflow
      1. 5.6.1 Data quality management
      2. 5.6.2 Data curation
      3. 5.6.3 Data protection
      4. 5.6.4 Lifecycle management
      5. 5.6.5 Data movement and orchestration
    7. 5.7 Self service through workflow
      1. 5.7.1 The evolution of the data steward
    8. 5.8 Information governance policies
    9. 5.9 Governance rules
    10. 5.10 Monitoring and reporting
      1. 5.10.1 Policy monitoring
      2. 5.10.2 Workflow monitoring
      3. 5.10.3 People monitoring
      4. 5.10.4 Reporting
      5. 5.10.5 Audit
      6. 5.10.6 Iterative improvement
    11. 5.11 Collaboration
      1. 5.11.1 Instant collaboration
      2. 5.11.2 Expertise location
      3. 5.11.3 Notifications
      4. 5.11.4 Gamification in curation
    12. 5.12 Business user interfaces including mobile access
    13. 5.13 Reporting dashboards
      1. 5.13.1 Catalog interface
      2. 5.13.2 Mobile access
      3. 5.13.3 Summary
  10. Chapter 6. Roadmaps for the data reservoir
    1. 6.1 Establishing the data reservoir foundation
      1. 6.1.1 Deploy the integration and governance fabric
      2. 6.1.2 Setting up the governance program
      3. 6.1.3 Adding a data repository
      4. 6.1.4 Adding an information source
      5. 6.1.5 Provisioning data from an information source
      6. 6.1.6 Enabling an information view
    2. 6.2 Data warehouse augmentation use case
      1. 6.2.1 Adding the data reservoir around the data warehouse
      2. 6.2.2 Working with new data
      3. 6.2.3 Enabling business access to new insight
    3. 6.3 Operational data for systems of engagement use case
      1. 6.3.1 Adding the data reservoir around the shared operational data
      2. 6.3.2 Adding the object cache
    4. 6.4 360 degree view of customer use case
      1. 6.4.1 Adding new data reservoir repositories
      2. 6.4.2 Adding new data from additional information sources
    5. 6.5 Self-service data use case
      1. 6.5.1 Self-managed data
      2. 6.5.2 Adding enterprise data to the data reservoir
      3. 6.5.3 Giving access to business users
    6. 6.6 Data distribution use case
    7. 6.7 Summary
  11. Chapter 7. Technology Choices
    1. 7.1 Technology for the data repositories
    2. 7.2 Technology for the integration and governance fabric
    3. 7.3 Technology for the raw data interaction
    4. 7.4 Technology for the catalog
    5. 7.5 Technology for the view-based interaction subsystem
    6. 7.6 Technology for the continuous analytics subsystem
    7. 7.7 Summary
  12. Chapter 8. Conclusions and summary
    1. 8.1 Summary of the data reservoir reference architecture
    2. 8.2 Further reading
  13. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  14. Back cover

Product information

  • Title: Designing and Operating a Data Reservoir
  • Author(s): Mandy Chessell, Nigel L Jones, Jay Limburn, David Radley, Kevin Shank
  • Release date: May 2015
  • Publisher(s): IBM Redbooks
  • ISBN: 9780837440668