Book description

Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta Stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes. BLAST (Basic Local Alignment Search Tool) is a sophisticated software package for rapid searching of nucleotide and protein databases. It is one of the most important software packages used in sequence analysis and bioinformatics. Most users of BLAST, however, seldom move beyond the program's default parameters and never take advantage of its full power.

BLAST is the only book completely devoted to this popular suite of tools. It offers biologists, computational biology students, and bioinformatics professionals a clear understanding of BLAST as well as the science it supports. This book shows you how to move beyond the default parameters, how to get specific answers using BLAST, and how to interpret your results.

The book also contains tutorial and reference sections covering NCBI-BLAST and WU-BLAST, background material to help you understand the statistics behind BLAST, Perl scripts to help you prepare your data and analyze your results, and a wealth of tips and tricks for configuring BLAST to meet your own research needs. Some of the topics covered include:

  • BLAST basics and the NCBI web interface

  • How to select appropriate search parameters

  • BLAST programs: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, PHI-BLAST, and PSI-BLAST

  • Detailed BLAST references, including NCBI-BLAST and WU-BLAST

  • Understanding biological sequences

  • Sequence similarity, homology, scoring matrices, scores, and evolution

  • Sequence alignment

  • Calculating BLAST statistics

  • Industrial-strength BLAST, including developing applications with Perl and BLAST

BLAST is the only comprehensive reference with detailed, accurate information on optimizing BLAST searches for high-throughput sequence analysis. This is a book that any biologist should own.

Table of Contents

  1. Table of Contents
  2. Foreword
  3. Preface
    1. Audience for This Book
    2. Structure of This Book
    3. A Little Math, a Little Perl
    4. Conventions Used in This Book
    5. URLs Referenced in This Book
    6. Comments and Questions
    7. Acknowledgments
      1. Ian
      2. Mark
      3. Joey
  4. Part I
    1. Hello BLAST
      1. What Is BLAST?
      2. Using NCBI-BLAST
        1. Choosing the BLAST Program
        2. Entering the Query Sequence
        3. Choosing the Database to Search
        4. Choosing the Parameters of the Search
        5. Choosing the Format
        6. Submitting the Search
        7. Viewing the Results
      3. Alternate Output Formats
      4. Alternate Alignment Views
      5. The Next Step
      6. Further Reading
  5. Part II
    1. Biological Sequences
      1. The Central Dogma of Molecular Biology
        1. DNA
        2. RNA
        3. Protein
        4. The Genetic Code
      2. Evolution
        1. Mutation
        2. Natural Selection
        3. Genetic Drift
        4. The Neutral Theory of Evolution
        5. Molecular Clocks
        6. Homology, Phylogeny, and Trees
        7. The Tree of Life
      3. Genomes and Genes
        1. Prokaryotic Genes
        2. Eukaryotic Genes
        3. Transcripts
        4. Repeats
        5. Pseudogenes
      4. Biological Sequences and Similarity
      5. Further Reading
    2. Sequence Alignment
      1. Global Alignment: Needleman-Wunsch
        1. Initialization
        2. Fill
        3. Trace-Back
      2. Local Alignment: Smith-Waterman
      3. Dynamic Programming
      4. Algorithmic Complexity
      5. Global Versus Local
      6. Variations
        1. Gap Modifications
        2. Reduced Memory
        3. Aligning Transcripts to Genomic Sequence
      7. Final Thoughts
      8. Further Reading
    3. Sequence Similarity
      1. Introduction to Information Theory
      2. Amino Acid Similarity
      3. Scoring Matrices
        1. PAM and BLOSUM Matrices
      4. Target Frequencies, lambda, and H
        1. Lambda
        2. Relative Entropy
        3. Match-Mismatch Scoring
      5. Sequence Similarity
      6. Karlin-Altschul Statistics
        1. Gapped Alignments
        2. Length Correction
      7. Sum Statistics and Sum Scores
        1. Converting a Sum Score to a Sum Probability
        2. Probability Versus Expectation
      8. Further Reading
  6. Part III
    1. BLAST
      1. The Five BLAST Programs
      2. The BLAST Algorithm
        1. Seeding
          1. Implementation details
        2. Extension
          1. Implementation details
        3. Evaluation
          1. Implementation details
      3. Further Reading
    2. Anatomy of a BLAST Report
      1. Basic Structure
      2. Alignments
        1. BLASTP
        2. BLASTN
        3. BLASTX
        4. TBLASTN
        5. TBLASTX
        6. Alignment Groups
    3. A BLAST Statistics Tutorial
      1. Basic BLAST Statistics
        1. Actual Versus Effective Lengths
        2. The Raw Score and Bit Score
        3. The Expect of an HSP
        4. The WU-BLAST P-Value
        5. Sum Statistics
        6. An Expect(n) Means That Sum Statistics Were Applied
        7. Sum Statistics Are Pair-Wise in Their Focus
        8. The Sum Score
        9. Effective Length of a BLASTX Query
        10. Calculating a Sum Score
        11. Calculating the Pair-Wise Sum P-Value
        12. Correcting for Multiple Tests
        13. Correcting for Database Size
        14. Frame- and Size-Corrected Expects
      2. Using Statistics to Understand BLAST Results
      3. Where Did My Oligo Go?
        1. Karlin-Altschul Statistics as a Tool for Further Investigation
        2. What It All Means