Metadata Management with IBM InfoSphere Information Server

Book Description

What do you know about your data? And how do you know what you know about your data?

Information governance initiatives address corporate concerns about the quality and reliability of information used in planning and decision-making processes. Metadata management refers to the tools, processes, and environment that enable organizations to reliably and easily share, locate, and retrieve information from the systems that hold it.

Enterprise-wide information integration projects consolidate data from these systems into one location to generate the required reports and analysis.
During this type of implementation, metadata management must be applied at each step to ensure that the final reports and analysis draw on the right data sources, are complete, and are of high quality.

This IBM® Redbooks® publication introduces the information governance initiative and highlights the immediate need for metadata management. It explains how IBM InfoSphere™ Information Server provides a single unified platform and a collection of product modules and components so that organizations can understand, cleanse, transform, and deliver trustworthy and context-rich information. It describes a typical implementation process and explains how InfoSphere Information Server provides the functions that are required to implement such a solution and, more importantly, to achieve metadata management.

This book gives business leaders and IT architects an overview of metadata management in the information integration solution space. It also provides key technical details that IT professionals can use in solution planning, design, and implementation.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. The team who wrote this book
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Part 1 Overview and concepts
  5. Chapter 1. Information governance and metadata management
    1. 1.1 Information governance
    2. 1.2 Defining metadata
    3. 1.3 Types of metadata
      1. 1.3.1 Business metadata
      2. 1.3.2 Technical metadata
      3. 1.3.3 Operational metadata
    4. 1.4 Why metadata is important
      1. 1.4.1 Risk avoidance
      2. 1.4.2 Regulatory compliance
      3. 1.4.3 IT productivity
    5. 1.5 Requirements for managing metadata
      1. 1.5.1 The information governance organization
      2. 1.5.2 Information governance operational teams
      3. 1.5.3 Standards, policies, and procedures
    6. 1.6 Business scenarios for metadata management
      1. 1.6.1 Metadata for compliance
      2. 1.6.2 Metadata for risk management
    7. 1.7 Where to start
    8. 1.8 Conclusion
  6. Chapter 2. Solution planning and metadata management
    1. 2.1 Getting started with solution planning
      1. 2.1.1 Information integration solution
      2. 2.1.2 Information integration project
    2. 2.2 Stakeholders
    3. 2.3 Integrated solution data flow
    4. 2.4 Typical implementation process flow
      1. 2.4.1 Defining business requirements
      2. 2.4.2 Building business-centric vocabulary
      3. 2.4.3 Developing data model
      4. 2.4.4 Documenting source data
      5. 2.4.5 Assessing and monitoring data quality
      6. 2.4.6 Building up the metadata repository
      7. 2.4.7 Transforming data
      8. 2.4.8 Developing BI solutions
      9. 2.4.9 Generating enterprise reports and lineage
    5. 2.5 Conclusion
  7. Chapter 3. IBM InfoSphere Information Server approach
    1. 3.1 Overview of InfoSphere Information Server
    2. 3.2 Platform infrastructure
      1. 3.2.1 Services tier
      2. 3.2.2 Engine tier
      3. 3.2.3 Repository tier
    3. 3.3 Product modules and components
      1. 3.3.1 InfoSphere Blueprint Director
      2. 3.3.2 InfoSphere DataStage and InfoSphere QualityStage
      3. 3.3.3 InfoSphere Information Analyzer
      4. 3.3.4 InfoSphere Discovery
      5. 3.3.5 InfoSphere FastTrack
      6. 3.3.6 InfoSphere Business Glossary
      7. 3.3.7 InfoSphere Metadata Workbench
      8. 3.3.8 InfoSphere Information Server Manager, ISTools and InfoSphere Metadata Asset Manager
      9. 3.3.9 InfoSphere Data Architect
      10. 3.3.10 Cognos Business Intelligence software
    4. 3.4 Solution development: Mapping product modules and components to solution processes
      1. 3.4.1 Defining the business requirements
      2. 3.4.2 Building business-centric vocabulary
      3. 3.4.3 Developing data model
      4. 3.4.4 Documenting source data
      5. 3.4.5 Assessing and monitoring data quality
      6. 3.4.6 Building up the metadata repository
      7. 3.4.7 Transforming data
      8. 3.4.8 Developing BI solutions
      9. 3.4.9 Generating enterprise reports and lineage
    5. 3.5 Deployment architecture and topologies
      1. 3.5.1 Overview of the topologies
      2. 3.5.2 Unified and shared metadata
      3. 3.5.3 Metadata portability
      4. 3.5.4 Alternative deployment
    6. 3.6 Conclusion
  8. Part 2 Implementation
  9. Chapter 4. Use-case scenario
    1. 4.1 Scenario background
    2. 4.2 Current BI and data warehouse solution for Bank A
    3. 4.3 Project goals for the new solution
    4. 4.4 Using IBM InfoSphere Information Server for the new solution
      1. 4.4.1 Changes required
    5. 4.5 Additional challenges
      1. 4.5.1 The integration challenge and the governance problem
      2. 4.5.2 Additional business requirements
    6. 4.6 A customized plan
      1. 4.6.1 BI process development
      2. 4.6.2 Data Quality monitoring and subscription
      3. 4.6.3 Data lineage and reporting requirements and capabilities
    7. 4.7 Conclusion
  10. Chapter 5. Implementation planning
    1. 5.1 Introduction to InfoSphere Blueprint Director
    2. 5.2 InfoSphere Blueprint Director user interface basics
      1. 5.2.1 User interface
      2. 5.2.2 Palette
    3. 5.3 Creating a blueprint by using a template
      1. 5.3.1 Customizing the template
      2. 5.3.2 Working with metadata repository
      3. 5.3.3 Working with a business glossary
    4. 5.4 Working with milestones
    5. 5.5 Using methodology
    6. 5.6 Conclusion
  11. Chapter 6. Building a business-centric vocabulary
    1. 6.1 Introduction to InfoSphere Business Glossary
    2. 6.2 Business glossary and information governance
    3. 6.3 Creating the business glossary content
      1. 6.3.1 Taxonomy
      2. 6.3.2 The taxonomy development process
      3. 6.3.3 Controlled vocabulary
      4. 6.3.4 Term specification process and guidelines
      5. 6.3.5 Using external glossary sources
      6. 6.3.6 The vocabulary authoring process
    4. 6.4 Deploying a business glossary
      1. 6.4.1 InfoSphere Business Glossary environment
    5. 6.5 Managing the term authoring process with a workflow
      1. 6.5.1 Loading and populating the glossary
      2. 6.5.2 Creating and editing a term
      3. 6.5.3 Adding term relations and assigning assets
      4. 6.5.4 Reference by category
      5. 6.5.5 Custom attributes
      6. 6.5.6 Labels
      7. 6.5.7 Stewardship
      8. 6.5.8 URL links
      9. 6.5.9 Import glossary
    6. 6.6 Searching and exploring with InfoSphere Business Glossary
    7. 6.7 Multiple ways of accessing InfoSphere Business Glossary
      1. 6.7.1 InfoSphere Business Glossary Anywhere
      2. 6.7.2 REST API
      3. 6.7.3 Eclipse plug-in
    8. 6.8 Conclusion
  12. Chapter 7. Source documentation
    1. 7.1 Process overview
    2. 7.2 Introduction to InfoSphere Metadata Asset Manager
    3. 7.3 Application systems
      1. 7.3.1 Extended data source types
      2. 7.3.2 Format
      3. 7.3.3 Loading the application system
      4. 7.3.4 Results
    4. 7.4 Sequential files
      1. 7.4.1 Loading a data file
      2. 7.4.2 Results
    5. 7.5 Staging database
      1. 7.5.1 Loading the staging database
      2. 7.5.2 Results
    6. 7.6 Data extraction
      1. 7.6.1 Input file format
      2. 7.6.2 Documenting the data extraction
      3. 7.6.3 Results
    7. 7.7 Conclusion
  13. Chapter 8. Data relationship discovery
    1. 8.1 Introduction to InfoSphere Discovery
      1. 8.1.1 Planning equals saving
      2. 8.1.2 A step-by-step discovery guide
    2. 8.2 Creating a project
      1. 8.2.1 Pointing to the data requiring analysis
      2. 8.2.2 Importing the source data
      3. 8.2.3 Importing the target data
    3. 8.3 Performing column analysis
      1. 8.3.1 Monitoring tasks with the activity viewer
      2. 8.3.2 Reviewing the column analysis results
      3. 8.3.3 Metadata and statistical results
      4. 8.3.4 Value, pattern, and length frequencies
    4. 8.4 Identifying and classifying sensitive data
      1. 8.4.1 Column classification view
      2. 8.4.2 Displaying hits for classification columns
      3. 8.4.3 Column classification algorithms
    5. 8.5 Assigning InfoSphere Business Glossary terms to physical assets
      1. 8.5.1 Importing, mapping, and exporting term assignments
      2. 8.5.2 Mapping business glossary terms to physical assets
    6. 8.6 Reverse engineering a data model
      1. 8.6.1 Primary-foreign key candidates
      2. 8.6.2 Discovering primary-foreign key candidates
      3. 8.6.3 Displaying the results
      4. 8.6.4 Data objects
      5. 8.6.5 Performing transformation discovery
    7. 8.7 Performing value overlap analysis
      1. 8.7.1 Running overlap analysis
      2. 8.7.2 Column Summary
      3. 8.7.3 Viewing value overlap details
    8. 8.8 Discovering transformation logic
      1. 8.8.1 Performing a transformation discovery
      2. 8.8.2 Reviewing maps
      3. 8.8.3 Exporting transformation results to InfoSphere FastTrack
    9. 8.9 Conclusion
  14. Chapter 9. Data quality assessment and monitoring
    1. 9.1 Introduction to IBM InfoSphere Information Analyzer
      1. 9.1.1 InfoSphere Information Analyzer and information governance
      2. 9.1.2 InfoSphere Information Analyzer and InfoSphere Information Server
      3. 9.1.3 Metadata data repository
    2. 9.2 InfoSphere Information Analyzer data rules
      1. 9.2.1 Roles in data rules and data quality management
      2. 9.2.2 Properties of the InfoSphere Information Analyzer data rules
      3. 9.2.3 Data rules management
      4. 9.2.4 Rule definition guidelines for data quality
    3. 9.3 Creating a rule
      1. 9.3.1 Creating a rule definition
      2. 9.3.2 Testing a rule
      3. 9.3.3 Generating data rules
    4. 9.4 Data rule examples
      1. 9.4.1 Checking for duplicates
      2. 9.4.2 Generating a data rule
      3. 9.4.3 Use case: Creating a data rule to monitor high value customers
      4. 9.4.4 Creating rules to monitor gold customers
    5. 9.5 Data rules and performance consideration
      1. 9.5.1 Types of data rules
      2. 9.5.2 Using join tables in data quality rules
      3. 9.5.3 Cartesian products and how to avoid them
      4. 9.5.4 Applying filtering in data quality rules
      5. 9.5.5 Filtering versus sampling
      6. 9.5.6 Virtual tables versus database views
      7. 9.5.7 Global variables
    6. 9.6 Rule sets
    7. 9.7 Metrics
    8. 9.8 Monitoring data quality
    9. 9.9 Using HTTP/CLI API
    10. 9.10 Managing rules
    11. 9.11 Deploying rules, rule sets, and metrics
    12. 9.12 Rule stage for InfoSphere DataStage
    13. 9.13 Conclusion
  15. Chapter 10. Building up the metadata repository
    1. 10.1 Introduction to InfoSphere Metadata Workbench
    2. 10.2 Data storage systems
    3. 10.3 Data models
      1. 10.3.1 Loading the data models
      2. 10.3.2 Results
    4. 10.4 Business intelligence reports
      1. 10.4.1 Loading BI reports
      2. 10.4.2 Results
    5. 10.5 Information asset enrichment
      1. 10.5.1 Business glossary terms
      2. 10.5.2 Business glossary labels
      3. 10.5.3 Data stewardship
      4. 10.5.4 Asset descriptor and alias
    6. 10.6 Conclusion
  16. Chapter 11. Data transformation
    1. 11.1 Introduction to InfoSphere FastTrack
      1. 11.1.1 Functionality and user interface
      2. 11.1.2 Administration
    2. 11.2 Basic mapping
    3. 11.3 Advanced mapping
    4. 11.4 Mapping lifecycle management (job generation)
    5. 11.5 Metadata sharing (extension mappings)
    6. 11.6 InfoSphere DataStage job design
      1. 11.6.1 Job design details
    7. 11.7 Shared metadata
      1. 11.7.1 Shared metadata that must be created
    8. 11.8 Operational metadata
      1. 11.8.1 Creating operational metadata
    9. 11.9 Conclusion
  17. Chapter 12. Enterprise reports and lineage generation
    1. 12.1 Lineage administration
      1. 12.1.1 Business lineage
      2. 12.1.2 Data lineage
      3. 12.1.3 Impact analysis
    2. 12.2 Support for InfoSphere DataStage and InfoSphere QualityStage jobs
      1. 12.2.1 Design lineage
      2. 12.2.2 Operational lineage
    3. 12.3 Support for external processes
    4. 12.4 Support for InfoSphere FastTrack mapping specifications
    5. 12.5 Configuring business lineage
    6. 12.6 Search and display
      1. 12.6.1 Information catalog
      2. 12.6.2 Find and search
    7. 12.7 Querying and reporting
      1. 12.7.1 Reports
      2. 12.7.2 Querying
    8. 12.8 Conclusion
  18. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  19. Back cover