20 Species Identification

As we saw in earlier chapters, most amino acids are encoded by more than one codon. This chapter tests the hypothesis that different species can be identified by their codon distributions. By comparing the codon frequencies of genes from different species, we can build systems that identify the species a gene came from.
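A minimal sketch of the idea, assuming in-frame DNA sequences and using Euclidean distance as the comparison measure (an illustrative choice, not necessarily the method developed later in this chapter). It turns a sequence into a 64-element codon-frequency vector, then assigns it to the closest reference profile. The helpers `codon_frequencies` and `identify` and the two toy profiles are hypothetical and stand in for real species data.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so frequency vectors line up index by index.
CODONS = [''.join(p) for p in product('ACGT', repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(seq):
    """Return a 64-element codon-frequency vector for an in-frame DNA sequence."""
    seq = seq.upper()
    # Read non-overlapping triplets, dropping any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Triplets with ambiguous bases (e.g. N) are excluded from the total.
    total = sum(n for codon, n in counts.items() if codon in CODON_SET)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def identify(seq, profiles):
    """Return the profile name whose vector is closest (Euclidean) to seq's codon frequencies."""
    freqs = codon_frequencies(seq)
    return min(profiles, key=lambda name: math.dist(freqs, profiles[name]))


# Toy reference profiles (hypothetical, not real species data).
profiles = {
    'species_A': codon_frequencies('ATGGCTGCTGCTTAA'),
    'species_B': codon_frequencies('ATGGCGGCGGCGTAA'),
}
print(identify('ATGGCTGCTTAA', profiles))  # species_A
```

The query uses GCT for alanine, as the species_A profile does, so it lands closer to species_A even though both profiles encode the same amino acids. `math.dist` requires Python 3.8 or later.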

20.1 Data Collection

In a study by Kanaya et al. (2001), more than 59,000 genes, each at least 100 codons long, were collected from 29 different bacterial species. The study showed that each species has a characteristic codon-frequency signature and that these patterns can be detected. This chapter uses just five of these species as an instructional example. The selected bacteria are shown in Table 20-1.
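One way such a per-species signature could be assembled is sketched below, assuming each species is given as a plain list of in-frame gene sequences. The sketch applies the 100-codon minimum described above and pools codon counts over the remaining genes into a single frequency vector. The function `species_profile` and the sample genes are hypothetical illustrations, not the study's actual pipeline.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so profiles from different species are comparable.
CODONS = [''.join(p) for p in product('ACGT', repeat=3)]
CODON_SET = set(CODONS)
MIN_CODONS = 100  # minimum gene length, in codons, as described for the study's data


def species_profile(genes, min_codons=MIN_CODONS):
    """Pool codon counts over genes with at least min_codons codons (hypothetical helper)."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        n_codons = len(seq) // 3
        if n_codons < min_codons:
            continue  # skip genes that are too short
        for i in range(0, n_codons * 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_SET:  # ignore triplets with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Hypothetical genes: the first has 101 codons, the second only 3 and is skipped.
genes = ['ATG' + 'GCT' * 99 + 'TAA', 'ATGGCGTAA']
profile = species_profile(genes)
print(round(profile[CODONS.index('GCT')], 4))  # 0.9802
```

Pooling raw counts weights longer genes more heavily; averaging per-gene frequency vectors instead would give every gene equal weight.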

Each of the ...
