Bioinformatics

Book Description

Life science data integration and interoperability is one of the most challenging problems facing bioinformatics today. Investigators in the life sciences must interpret many types of information from a variety of sources: lab instruments, public databases, gene expression profiles, raw sequence traces, single nucleotide polymorphisms, chemical screening data, proteomic data, putative metabolic pathway models, and many others. Unfortunately, scientists currently cannot easily identify and access this information, because the underlying data sources use widely differing semantics, interfaces, and data formats.

Bioinformatics: Managing Scientific Data tackles this challenge head-on by discussing the current approaches and the variety of systems available to help bioinformaticians with this increasingly complex issue. The heart of the book is the work of eight bioinformatics teams, each describing its own approach to data integration and interoperability. Each system receives its own chapter, in which the lead contributors offer valuable insight into the specific problems the system addresses, why its particular architecture was chosen, and the system's strengths and weaknesses. In closing, the editors provide criteria for evaluating these systems that bioinformatics professionals will find useful.

* Provides a clear overview of the state of the art in data integration and interoperability in genomics, highlighting a variety of systems and giving insight into the strengths and weaknesses of their different approaches.
* Discusses shared vocabulary, design issues, the complexity of use cases, and the difficulties of transferring existing data management approaches to bioinformatics systems, helping to bridge the gap between computer scientists and life scientists.
* Written by the primary contributors of eight reputable bioinformatics systems in academia and industry, including BioKris, TAMBIS, K2, GeneExpress, P/FDM, MBM, SDSC, SRS, and DiscoveryLink.

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. The Morgan Kaufmann Series in Multimedia Information and Systems
  5. Copyright
  6. Contributors
  7. About the Authors
  8. Preface
  9. Chapter 1: Introduction
    1. 1.1 OVERVIEW
    2. 1.2 PROBLEM AND SCOPE
    3. 1.3 BIOLOGICAL DATA INTEGRATION
    4. 1.4 DEVELOPING A BIOLOGICAL DATA INTEGRATION SYSTEM
  10. Chapter 2: Challenges Faced in the Integration of Biological Information
    1. 2.1 THE LIFE SCIENCE DISCOVERY PROCESS
    2. 2.2 AN INFORMATION INTEGRATION ENVIRONMENT FOR LIFE SCIENCE DISCOVERY
    3. 2.3 THE NATURE OF BIOLOGICAL DATA
    4. 2.4 DATA SOURCES IN LIFE SCIENCE
    5. 2.5 CHALLENGES IN INFORMATION INTEGRATION
    6. CONCLUSION
  11. Chapter 3: A Practitioner’s Guide to Data Management and Data Integration in Bioinformatics
    1. 3.1 INTRODUCTION
    2. 3.2 DATA MANAGEMENT IN BIOINFORMATICS
    3. 3.3 DIMENSIONS DESCRIBING THE SPACE OF INTEGRATION SOLUTIONS
    4. 3.4 USE CASES OF INTEGRATION SOLUTIONS
    5. 3.5 STRENGTHS AND WEAKNESSES OF THE VARIOUS APPROACHES TO INTEGRATION
    6. 3.6 TOUGH PROBLEMS IN BIOINFORMATICS INTEGRATION
    7. 3.7 SUMMARY
    8. ACKNOWLEDGMENTS
  12. Chapter 4: Issues to Address While Designing a Biological Information System
    1. 4.1 LEGACY
    2. 4.2 A DOMAIN IN CONSTANT EVOLUTION
    3. 4.3 BIOLOGICAL QUERIES
    4. 4.4 QUERY PROCESSING
    5. 4.5 VISUALIZATION
    6. 4.6 CONCLUSION
    7. ACKNOWLEDGMENTS
  13. Chapter 5: SRS: An Integration Platform for Databanks and Analysis Tools in Bioinformatics
    1. 5.1 INTEGRATING FLAT FILE DATABANKS
    2. 5.2 INTEGRATION OF XML DATABASES
    3. 5.3 INTEGRATING RELATIONAL DATABASES
    4. 5.4 THE SRS QUERY LANGUAGE
    5. 5.5 LINKING DATABANKS
    6. 5.6 THE OBJECT LOADER
    7. 5.7 SCIENTIFIC ANALYSIS TOOLS
    8. 5.8 INTERFACES TO SRS
    9. 5.9 AUTOMATED SERVER MAINTENANCE WITH SRS PRISMA
    10. 5.10 CONCLUSION
  14. Chapter 6: The Kleisli Query System as a Backbone for Bioinformatics Data Integration and Analysis
    1. 6.1 MOTIVATING EXAMPLE
    2. 6.2 APPROACH
    3. 6.3 DATA MODEL AND REPRESENTATION
    4. 6.4 QUERY CAPABILITY
    5. 6.5 WAREHOUSING CAPABILITY
    6. 6.6 DATA SOURCES
    7. 6.7 OPTIMIZATIONS
    8. 6.8 USER INTERFACES
    9. 6.9 OTHER DATA INTEGRATION TECHNOLOGIES
    10. 6.10 CONCLUSIONS
  15. Chapter 7: Complex Query Formulation Over Diverse Information Sources in TAMBIS
    1. 7.1 THE ONTOLOGY
    2. 7.2 THE USER INTERFACE
    3. 7.3 THE QUERY PROCESSOR
    4. 7.4 RELATED WORK
    5. 7.5 CURRENT AND FUTURE DEVELOPMENTS IN TAMBIS
    6. ACKNOWLEDGMENTS
  16. Chapter 8: The Information Integration System K2
    1. 8.1 APPROACH
    2. 8.2 DATA MODEL AND LANGUAGES
    3. 8.3 AN EXAMPLE
    4. 8.4 INTERNAL LANGUAGE
    5. 8.5 DATA SOURCES
    6. 8.6 QUERY OPTIMIZATION
    7. 8.7 USER INTERFACES
    8. 8.8 SCALABILITY
    9. 8.9 IMPACT
    10. 8.10 SUMMARY
    11. ACKNOWLEDGMENTS
  17. Chapter 9: P/FDM Mediator for a Bioinformatics Database Federation
    1. 9.1 APPROACH
    2. 9.2 ANALYSIS
    3. 9.3 CONCLUSIONS
    4. ACKNOWLEDGMENT
  18. Chapter 10: Integration Challenges in Gene Expression Data Management
    1. 10.1 GENE EXPRESSION DATA MANAGEMENT: BACKGROUND
    2. 10.2 THE GENEEXPRESS SYSTEM
    3. 10.3 MANAGING GENE EXPRESSION DATA: INTEGRATION CHALLENGES
    4. 10.4 INTEGRATING THIRD-PARTY GENE EXPRESSION DATA IN GENEEXPRESS
    5. 10.5 SUMMARY
    6. ACKNOWLEDGMENTS
    7. TRADEMARKS
  19. Chapter 11: DiscoveryLink
    1. 11.1 APPROACH
    2. 11.2 QUERY PROCESSING OVERVIEW
    3. 11.3 EASE OF USE, SCALABILITY, AND PERFORMANCE
    4. 11.4 CONCLUSIONS
  20. Chapter 12: A Model-Based Mediator System for Scientific Data Management
    1. 12.1 BACKGROUND
    2. 12.2 SCIENTIFIC DATA INTEGRATION ACROSS MULTIPLE WORLDS: EXAMPLES AND CHALLENGES FROM THE NEUROSCIENCES
    3. 12.3 MODEL-BASED MEDIATION
    4. 12.4 KNOWLEDGE REPRESENTATION FOR MODEL-BASED MEDIATION
    5. 12.5 MODEL-BASED MEDIATOR SYSTEM AND TOOLS
    6. 12.6 RELATED WORK AND CONCLUSION
    7. ACKNOWLEDGMENTS
  21. Chapter 13: Compared Evaluation of Scientific Data Management Systems
    1. 13.1 PERFORMANCE MODEL
    2. 13.2 EVALUATION CRITERIA
    3. 13.3 TRADEOFFS
    4. 13.4 SUMMARY
  22. Concluding Remarks
  23. Appendix: Biological Resources
  24. Glossary
  25. System Information
  26. Index