This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 2: Biological Sequences
genetic element or selfish DNA (a phrase coined by Francis Crick). These entities are
a bit like the fleas and ticks of the genome: they copy and spread themselves within
and between genomes and are generally believed to do little for the host genome.
Selfish DNAs are usually further classified into three subcategories: transposons,
retroviruses, and retrotransposons. If you see these names in a BLAST report, you may
need to use a repeat filter.
Pseudogenes
One of the most confounding problems in similarity searches is the presence of
pseudogenes. As the name suggests, pseudogenes are “fake genes”; that is, they look
like they could encode a protein, but they aren’t functional. Pseudogenes come from
a variety of sources. A mutation that introduces a stop codon into a gene creates a
pseudogene, but more commonly, pseudogenes are created from some kind of duplication
event. Sometimes, through various mechanisms, regions of a chromosome
may become duplicated. The extra copies of genes are generally free of selective pressures
and may become pseudogenes as they accumulate mutations. Duplication may
also result from repetitive elements that include neighboring DNA as they copy
themselves into new locations. In eukaryotes, a very common form of pseudogene is
the retro-pseudogene, in which the mRNA from a gene is reverse-transcribed into
DNA and inserted back into the genome. Because retro-pseudogenes come from
mRNA, they contain the hallmarks of transcripts, notably an absence of introns and
the presence of a poly-A tail. They are therefore easy to detect if you know what to
look for. Most retro-pseudogenes come from highly transcribed genes such as the
protein components of the ribosome.
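
The two signatures described above, a stop codon introduced by mutation and a poly-A tail left over from an mRNA, can be illustrated with a short Python sketch. This is only a toy illustration, not a pseudogene finder: the function names are hypothetical, the input is assumed to be a DNA string already in the correct reading frame, and the 10-base poly-A threshold is an arbitrary assumption.

```python
# Toy checks for two pseudogene hallmarks. All names and thresholds here
# are illustrative assumptions, not part of BLAST or any library.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon,
    or None if the reading frame contains no stop codon."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if a stop codon appears before the last complete codon."""
    idx = first_inframe_stop(cds)
    if idx is None:
        return False
    last_codon = len(cds) // 3 - 1
    return idx < last_codon


def trailing_poly_a(seq):
    """Count the uninterrupted run of A's at the 3' end of the sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def looks_like_poly_a(seq, min_len=10):
    """Crude test for a poly-A tail; min_len is an assumed cutoff."""
    return trailing_poly_a(seq) >= min_len


intact = "ATGGCTGAATAA"    # ATG GCT GAA TAA: stop only at the end
mutated = "ATGGCTTAATAA"   # GAA -> TAA: a point mutation creates an early stop
print(has_premature_stop(intact), has_premature_stop(mutated))
print(looks_like_poly_a("GATTACG" + "A" * 12))
```

Real retro-pseudogenes accumulate mutations after insertion, so their poly-A tails are often degraded and an exact-match test like this one would miss many of them.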
Biological Sequences and Similarity
The beginning of this chapter asked why biological sequences are similar to one
another. Let’s answer that question now. You’ve seen that biological sequences like
proteins may have important functions necessary for the survival of an organism.
You’ve also seen that DNA sequences can mutate randomly, and this may change
how a sequence functions. Over time, both functional constraints and random processes
shape the course of sequence evolution. The degree to which a sequence follows
a functional or random path depends on natural selection and neutral
evolution. So sequences are similar to one another because they
start out similar to one another and follow different paths. When you read a BLAST
report, you will find that your knowledge of molecular and evolutionary biology will
help you interpret the similarities and differences you see.
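
The idea that sequences start out similar and then follow different paths can be sketched as a small simulation. The Python below is a deliberately crude model under stated assumptions: every unconstrained site mutates independently at the same rate, and functional constraint is represented by a hypothetical set of positions that are never allowed to change. The function names `mutate` and `percent_identity` are illustrative only.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng, constrained=frozenset()):
    """Return a copy of seq in which each unconstrained site is replaced
    by a different base with probability `rate`. Sites whose index is in
    `constrained` never change (a crude stand-in for selection)."""
    out = []
    for i, base in enumerate(seq):
        if i not in constrained and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 100.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


rng = random.Random(0)               # fixed seed so runs are repeatable
ancestor = "ATGGCTGAAACCGGTTTAGC" * 5
functional_sites = frozenset(range(0, 30))  # assumed constrained region

# Two lineages descend from the same ancestor and diverge independently.
lineage_a, lineage_b = ancestor, ancestor
for generation in range(50):
    lineage_a = mutate(lineage_a, 0.01, rng, functional_sites)
    lineage_b = mutate(lineage_b, 0.01, rng, functional_sites)

print(percent_identity(lineage_a, lineage_b))
print(percent_identity(lineage_a[:30], lineage_b[:30]))  # constrained region
```

In this model the constrained region stays identical in both lineages while the rest of the sequence drifts apart, which is the pattern of conserved and variable regions you often see in an alignment.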
