You are previewing Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications.
O'Reilly logo
Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications

Book Description

This book represents the most comprehensive and up-to-date collection of information on the topic of computational molecular biology. Bringing the most recent research into the forefront of discussion, Algorithms in Computational Molecular Biology studies the most important and useful algorithms currently being used in the field, and provides related problems. It also succeeds where other titles have failed, in offering a wide range of information from the introductory fundamentals right up to the latest, most advanced levels of study.

Table of Contents

  1. Cover
  2. Half Title page
  3. Title page
  4. Copyright page
  5. Dedication
  6. Preface
  7. Contributors
  8. Series page
  9. Part I: Strings Processing and Application to Biological Sequences
    1. Chapter 1: String Data Structures for Computational Molecular Biology
      1. 1.1 Introduction
      2. 1.2 Main String Indexing Data Structures
      3. 1.3 Index Structures for Weighted Strings
      4. 1.4 Index Structures for Indeterminate Strings
      5. 1.5 String Data Structures in Memory Hierarchies
      6. 1.6 Conclusions
      7. References
    2. Chapter 2: Efficient Restricted-Case Algorithms for Problems in Computational Biology
      1. 2.1 The Need for Special Cases
      2. 2.2 Assessing Efficient Solvability Options for General Problems and Special Cases
      3. 2.3 String and Sequence Problems
      4. 2.4 Shortest Common Superstring
      5. 2.5 Longest Common Subsequence
      6. 2.6 Common Approximate Substring
      7. 2.7 Conclusion
      8. References
    3. Chapter 3: Finite Automata in Pattern Matching
      1. 3.1 Introduction
      2. 3.2 Direct Use of DFA in Stringology
      3. 3.3 NFA Simulation
      4. 3.4 Finite Automaton as Model of Computation
      5. 3.5 Finite Automata Composition
      6. 3.6 Summary
      7. References
    4. Chapter 4: New Developments in Processing of Degenerate Sequences
      1. 4.1 Introduction
      2. 4.2 Background
      3. 4.3 Basic Definitions
      4. 4.4 Repetitive Structures in Degenerate Strings
      5. 4.5 Conservative String Covering in Degenerate Strings
      6. 4.6 Conclusion
      7. References
    5. Chapter 5: Exact Search Algorithms for Biological Sequences
      1. 5.1 Introduction
      2. 5.2 Single Pattern Matching Algorithms
      3. 5.3 Algorithms for Multiple Patterns
      4. 5.4 Application of Exact Set Pattern Matching for Read Mapping and Comparison with Mapping Tools
      5. 5.5 Conclusions
      6. References
    6. Chapter 6: Algorithmic Aspects of Arc-Annotated Sequences
      1. 6.1 Introduction
      2. 6.2 Preliminaries
      3. 6.3 Longest Arc-Preserving Common Subsequence
      4. 6.4 Arc-Preserving Subsequence
      5. 6.5 Maximum Arc-Preserving Common Subsequence
      6. 6.6 Edit Distance
      7. References
    7. Chapter 7: Algorithmic Issues in DNA Barcoding Problems
      1. 7.1 Introduction
      2. 7.2 Test Set Problems: A General Framework for Several Barcoding Problems
      3. 7.3 A Synopsis of Biological Applications of Barcoding
      4. 7.4 Survey of Algorithmic Techniques on Barcoding
      5. 7.5 Information Content Approach
      6. 7.6 Set-Covering Approach
      7. 7.7 Experimental Results and Software Availability
      8. 7.8 Concluding Remarks
      9. Acknowledgments
      10. References
    8. Chapter 8: Recent Advances in Weighted DNA Sequences
      1. 8.1 Introduction
      2. 8.2 Preliminaries
      3. 8.3 Indexing
      4. 8.4 Pattern Matching
      5. 8.5 Approximate Pattern Matching
      6. 8.6 Repetitions, Covers, and Tandem Repeats
      7. 8.7 Motif Discovery
      8. 8.8 Conclusions
      9. References
    9. Chapter 9: DNA Computing for Subgraph Isomorphism Problem and Related Problems
      1. 9.1 Introduction
      2. 9.2 Definitions of Subgraph Isomorphism Problem and Related Problems
      3. 9.3 DNA Computing Models
      4. 9.4 The Sticker-Based Solution Space
      5. 9.5 Algorithms for Solving Problems
      6. 9.6 Experimental Data
      7. 9.7 Conclusion
      8. References
  10. Part II: Analysis of Biological Sequences
    1. Chapter 10: Graphs in Bioinformatics
      1. 10.1 Graph Theory—Origin
      2. 10.2 Graphs and the Biological World
      3. 10.3 Conclusion
      4. References
    2. Chapter 11: A Flexible Data Store for Managing Bioinformatics Data
      1. 11.1 Introduction
      2. 11.2 Data Model and System Overview
      3. 11.3 Replication and Load Balancing
      4. 11.4 Evaluation
      5. 11.5 Related Work
      6. 11.6 Summary
      7. References
    3. Chapter 12: Algorithms for the Alignment of Biological Sequences
      1. 12.1 Introduction
      2. 12.2 Alignment Algorithms
      3. 12.3 Score Functions
      4. 12.4 Benchmarks
      5. 12.5 Conclusion
      6. Acknowledgments
      7. References
    4. Chapter 13: Algorithms for Local Structural Alignment and Structural Motif Identification
      1. 13.1 Introduction
      2. 13.2 Problem Definition of Local Structural Alignment
      3. 13.3 Variable-Length Alignment Fragment Pair (VLAFP) Algorithm
      4. 13.4 Structural Alignment Based on Center of Gravity: SACG
      5. 13.5 Searching Structural Motifs
      6. 13.6 Using SACG Algorithm for Classification of New Protein Structures
      7. 13.7 Experimental Results
      8. 13.8 Accuracy Results
      9. 13.9 Conclusion
      10. Acknowledgments
      11. References
    5. Chapter 14: Evolution of the Clustal Family of Multiple Sequence Alignment Programs
      1. 14.1 Introduction
      2. 14.2 Clustal-ClustalV
      3. 14.3 ClustalW
      4. 14.4 ClustalX
      5. 14.5 ClustalW and ClustalX 2.0
      6. 14.6 DbClustal
      7. 14.7 Perspectives
      8. References
    6. Chapter 15: Filters and Seeds Approaches for Fast Homology Searches in Large Datasets
      1. 15.1 Introduction
      2. 15.2 Methods Framework
      3. 15.3 Lossless Filters
      4. 15.4 Lossy Seed-Based Filters
      5. 15.5 Conclusion
      6. 15.6 Acknowledgments
      7. References
    7. Chapter 16: Novel Combinatorial and Information-Theoretic Alignment-Free Distances Biological Data Mining
      1. 16.1 Introduction
      2. 16.2 Information-Theoretic Alignment-Free Methods
      3. 16.3 Combinatorial Alignment-Free Methods
      4. 16.4 Alignment-Free Compositional Methods
      5. 16.5 Alignment-Free Exact Word Matches Methods
      6. 16.6 Domains of Biological Application
      7. 16.7 Datasets and Software for Experimental Algorithmics
      8. 16.8 Conclusions
      9. References
    8. Chapter 17: In Silico Methods for the Analysis of Metabolites and Drug Molecules
      1. 17.1 Introduction
      2. 17.2 Molecular Descriptors
      3. 17.3 Databases
      4. 17.4 Methods and Data Analysis Algorithms
      5. 17.5 Conclusions
      6. Acknowledgments
      7. References
  11. Part III: Motif Finding and Structure Prediction
    1. Chapter 18: Motif Finding Algorithms in Biological Sequences
      1. 18.1 Introduction
      2. 18.2 Preliminaries
      3. 18.3 The Planted (l, d)-Motif Problem
      4. 18.4 The Extended (l, d)-Motif Problem
      5. 18.5 The Edited Motif Problem
      6. 18.6 The Simple Motif Problem
      7. 18.7 Conclusion
      8. References
    2. Chapter 19: Computational Characterization of Regulatory Regions
      1. 19.1 The Genome Regulatory Landscape
      2. 19.2 Qualitative Models of Regulatory Signals
      3. 19.3 Quantitative Models of Regulatory Signals
      4. 19.4 Detection of Dependencies in Sequences
      5. 19.5 Repositories of Regulatory Information
      6. 19.6 Using Predictive Models to Annotate Sequences
      7. 19.7 Comparative Genomics Characterization
      8. 19.8 Sequence Comparisons
      9. 19.9 Combining Motifs and Alignments
      10. 19.10 Experimental Validation
      11. 19.11 Summary
      12. References
    3. Chapter 20: Algorithmic Issues in the Analysis of Chip-SEQ Data
      1. 20.1 Introduction
      2. 20.2 Mapping Sequences on the Genome
      3. 20.3 Identifying Significantly Enriched Regions
      4. 20.4 Deriving Actual Transcription Factor Binding Sites
      5. 20.5 Conclusions
      6. References
    4. Chapter 21: Approaches and Methods for Operon Prediction Based on Machine Learning Techniques
      1. 21.1 Introduction
      2. 21.2 Datasets, Features, and Preprocesses for Operon Prediction
      3. 21.3 Machine Learning Prediction Methods for Operon Prediction
      4. 21.4 Conclusions
      5. 21.5 Acknowledgments
      6. References
    5. Chapter 22: Protein Function Prediction with Data-Mining Techniques
      1. 22.1 Introduction
      2. 22.2 Protein Annotation Based on Sequence
      3. 22.3 Protein Annotation Based on Protein Structure
      4. 22.4 Protein Function Prediction Based on Gene Expression Data
      5. 22.5 Protein Function Prediction Based on Protein Interactome Map
      6. 22.6 Protein Function Prediction Based on Data Integration
      7. 22.7 Conclusions and Perspectives
      8. References
    6. Chapter 23: Protein Domain Boundary Prediction
      1. 23.1 Introduction
      2. 23.2 Profiling Technique
      3. 23.3 Results
      4. 23.4 Discussion
      5. 23.5 Conclusions
      6. References
    7. Chapter 24: An Introduction to RNA Structure and Pseudoknot Prediction
      1. 24.1 Introduction
      2. 24.2 RNA Secondary Structure Prediction
      3. 24.3 RNA Pseudoknots
      4. 24.4 Conclusions
      5. References
  12. Part IV: Phylogeny Reconstruction
    1. Chapter 25: Phylogenetic Search Algorithms for Maximum Likelihood
      1. 25.1 Introduction
      2. 25.2 Computing the Likelihood
      3. 25.3 Accelerating the PLF by Algorithmic Means
      4. 25.4 Alignment Shapes
      5. 25.5 General Search Heuristics
      6. 25.6 Computing the Robinson Foulds Distance
      7. 25.7 Convergence Criteria
      8. 25.8 Future Directions
      9. References
    2. Chapter 26: Heuristic Methods for Phylogenetic Reconstruction with Maximum Parsimony
      1. 26.1 Introduction
      2. 26.2 Definitions and Formal Background
      3. 26.3 Methods
      4. 26.4 Conclusion
      5. References
    3. Chapter 27: Maximum Entropy Method for Composition Vector Method
      1. 27.1 Introduction
      2. 27.2 Models and Entropy Optimization
      3. 27.3 Application and Dicussion
      4. 27.4 Concluding Remarks
      5. References
  13. Part V: Microarray Data Analysis
    1. Chapter 28: Microarray Gene Expression Data Analysis
      1. 28.1 Introduction
      2. 28.2 DNA Microarray Technology and Experiment
      3. 28.3 Image Analysis and Expression Data Extraction
      4. 28.4 Data Processing
      5. 28.5 Missing Value Imputation
      6. 28.6 Temporal Gene Expression Profile Analysis
      7. 28.7 Cyclic Gene Expression Profiles Detection
      8. 28.8 Summary
      9. Acknowledgments
      10. References
    2. Chapter 29: Biclustering of Microarray Data
      1. 29.1 Introduction
      2. 29.2 Types of Biclusters
      3. 29.3 Groups of Biclusters
      4. 29.4 Evaluation Functions
      5. 29.5 Systematic and Stochastic Biclustering Algorithms
      6. 29.6 Biological Validation
      7. 29.7 Conclusion
      8. References
    3. Chapter 30: Computational Models for Condition-Specific Gene and Pathway Inference
      1. 30.1 Introduction
      2. 30.2 Condition-Specific Pathway Identification
      3. 30.3 Disease Gene Prioritization and Genetic Pathway Detection
      4. 30.4 Module Networks
      5. 30.5 Summary
      6. Acknowledgements
      7. References
    4. Chapter 31: Heterogeneity of Differential Expression in Cancer Studies: Algorithms and Methods
      1. 31.1 Introduction
      2. 31.2 Notations
      3. 31.3 Differential Mean of Expression
      4. 31.4 Differential Variability of Expression
      5. 31.5 Differential Expression in Compendium of Tumors
      6. 31.6 Differential Expression by Chromosomal Aberrations: The Local Properties
      7. 31.7 Differential Expression in Gene Interactome
      8. 31.8 Differential Coexpression: Global Multidimensional Interactome
      9. Acknowledgments
      10. References
  14. Part VI: Analysis of Genomes
    1. Chapter 32: Comparative Genomics: Algorithms and Applications
      1. 32.1 Introduction
      2. 32.2 Notations
      3. 32.3 Ortholog Assignment
      4. 32.4 Gene Cluster and Synteny Detection
      5. 32.5 Conclusions
      6. References
    2. Chapter 33: Advances in Genome Rearrangement Algorithms
      1. 33.1 Introduction
      2. 33.2 Preliminaries
      3. 33.3 Sorting by Reversals
      4. 33.4 Sorting by Transpositions
      5. 33.5 Other Operations
      6. 33.6 Sorting by More Than One Operation
      7. 33.7 Future Research Directions
      8. 33.8 Notes on Software
      9. References
    3. Chapter 34: Computing Genomic Distances : An Algorithmic Viewpoint
      1. 34.1 Introduction
      2. 34.2 Interval-Based Criteria
      3. 34.3 Character-Based Criteria
      4. 34.4 Conclusion
      5. References
    4. Chapter 35: Wavelet Algorithms for DNA Analysis
      1. 35.1 Introduction
      2. 35.2 DNA Representation
      3. 35.3 Statistical Correlations in DNA
      4. 35.4 Wavelet Analysis
      5. 35.5 Haar Wavelet Coefficients and Statistical Parameters
      6. 35.6 Algorithm of the Short Haar Discrete Wavelet Transform
      7. 35.7 Clusters of Wavelet Coefficients
      8. 35.8 Conclusion
      9. References
    5. Chapter 36: Haplotype Inference Models and Algorithms
      1. 36.1 Introduction
      2. 36.2 Problem Statement and Notations
      3. 36.3 Combinatorial Methods
      4. 36.4 Statistical Methods
      5. 36.5 Pedigree Methods
      6. 36.6 Evaluation
      7. 36.7 Discussion
      8. References
  15. Part VII: Analysis of Biological Networks
    1. Chapter 37: Untangling Biological Networks Using Bioinformatics
      1. 37.1 Introduction
      2. 37.2 Types of Biological Networks
      3. 37.3 Network Dynamic, Evolution and Disease
      4. 37.4 Future Challenges and Scope
      5. Acknowledgments
      6. References
    2. Chapter 38: Probabilistic Approaches for Investigating Biological Networks
      1. 38.1 Probabilistic Models for Biological Networks
      2. 38.2 Interpretation and Quantitative Analysis of Probabilistic Models
      3. 38.3 Conclusion
      4. Acknowledgments
      5. References
    3. Chapter 39: Modeling and Analysis of Biological Networks with Model Checking
      1. 39.1 Introduction
      2. 39.2 Preliminaries
      3. 39.3 Analyzing Genetic Networks with Model Checking
      4. 39.4 Probabilistic Model Checking for Biological Systems
      5. References
      6. Appendix
    4. Chapter 40: Reverse Engineering of Molecular Networks from a Common Combinatorial Approach
      1. 40.1 Introduction
      2. 40.2 Reverse-Engineering of Biological Networks
      3. 40.3 Classical Combinatorial Algorithms: A Case Study
      4. 40.4 Concluding Remarks
      5. Acknowledgments
      6. References
    5. Chapter 41: Unsupervised Learning for Gene Regulation Network Inference from Expression Data: A Review
      1. 41.1 Introduction
      2. 41.2 Gene Networks: Definition and Properties
      3. 41.3 Gene Expression: Data and Analysis
      4. 41.4 Network Inference as an Unsupervised Learning Problem
      5. 41.5 Correlation-Based Methods
      6. 41.6 Probabilistic Graphical Models
      7. 41.7 Constraint-Based Data Mining
      8. 41.8 Validation
      9. 41.9 Conclusion and Perspectives
      10. References
    6. Chapter 42: Approaches to Construction and Analysis of Microrna-Mediated Networks
      1. 42.1 Introduction
      2. 42.2 Fundamental Component Interaction Research: Predicting Mirna Genes, Regulators, and Targets
      3. 42.3 Identifying Mirna-Mediated Networks
      4. 42.4 Global and Local Architecture Analysis in Mirna-Containing Networks
      5. 42.5 Conclusion
      6. References
  16. Index