
Python for Bioinformatics by Kinser


6 Parsing DNA Data Files

Several institutes are collecting large databases of DNA information. In the United States, a large repository is Genbank, which is sponsored by the National Institutes of Health (http://www.ncbi.nlm.nih.gov/Genbank/index.html). This chapter develops programs that can read files stored in three of the most popular formats: FASTA, Genbank, and ASN.1.

6.1 FASTA Files

The FASTA format is extremely simple, but it carries very little information aside from the sequence itself. A typical FASTA file is shown in Figure 6-1.
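Because the format is so simple, a reader needs only a few lines of Python. The sketch below is not the book's own code. It assumes that every record begins with a header line starting with '>', and that the sequence may be wrapped over several following lines. The function name `read_fasta` is hypothetical. It takes the file contents as a string and returns a list of (header, sequence) pairs.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs.

    Sketch only: assumes each record starts with a '>' header line and that
    the sequence may be wrapped over several following lines.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there is one;
            # sequence lines seen before the first header are discarded.
            if header is not None:
                records.append((header, ''.join(chunks)))
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)  # collect one wrapped piece of the sequence
    if header is not None:
        records.append((header, ''.join(chunks)))  # close the last record
    return records


# Usage with a made-up two-record example (not real Genbank data)
sample = """>seq1 example record
ACGTACGT
ACGT
>seq2 another record
TTGGCCAA
"""
for header, seq in read_fasta(sample):
    print(header, len(seq))
```

To read a file from disk, pass its contents, for example `with open('example.fasta') as f: records = read_fasta(f.read())`, where the filename is only a placeholder. Joining the wrapped lines into one string makes the sequence easy to slice or count afterwards.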

The first line is a short header whose content may vary. In this case, it gives the accession number, the species name, and the chromosome number. ...
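Because the header layout varies between sources, one cautious approach is to split off only the first whitespace-delimited token as an identifier and keep the rest as free-text description. This is a common convention, not a rule of the format, and the helper `split_header` below is hypothetical. In the usage line, the header text is invented for illustration.

```python
def split_header(header):
    """Split a FASTA header into (identifier, description).

    Assumption: the identifier is the first whitespace-delimited token.
    Header layouts differ between sources, so this is only a convention.
    """
    header = header.lstrip('>')   # tolerate a header passed with its '>'
    parts = header.split(None, 1)  # split at the first run of whitespace
    if not parts:
        return '', ''
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ''
    return identifier, description


# Usage with an invented header (not a real accession)
print(split_header('>ABC123 example species, chromosome 1'))
```

Anything more specific, such as extracting the species or chromosome number, depends on how a particular source writes its headers.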
