Beginning Perl for Bioinformatics by James Tisdall

Parsing Annotations

Now that you've successfully extracted the sequence, let's look at parsing the annotations of a GenBank file.

Looking at a GenBank record, it's interesting to think about how to extract the useful information. The FEATURES table is certainly a key part of the story. It has considerable structure: what should be preserved, and what can be discarded? Sometimes you just want to know whether a word such as "endonuclease" appears anywhere in the record. For that, a subroutine that searches the annotation for an arbitrary regular expression is all you need. When more detailed surgery is required, Perl has the tools to carry it out successfully.
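For the simple case, a regular-expression search over the annotation might look like the following. This is a minimal sketch, not code from the book: it assumes the annotation is everything before the ORIGIN line, and the subroutine names get_annotation and annotation_matches, as well as the sample record fragment, are hypothetical and made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: does a regular expression match anywhere in a GenBank annotation?
# get_annotation, annotation_matches, and the sample record are hypothetical.
use strict;
use warnings;

# Return the annotation part of a GenBank record: everything before ORIGIN.
# (Assumption: ORIGIN starts a line and marks the start of the sequence.)
sub get_annotation {
    my ($record) = @_;
    my ($annotation) = split /^ORIGIN/m, $record, 2;
    return $annotation;
}

# Return 1 if $regex matches somewhere in the annotation, 0 otherwise.
# The sequence lines after ORIGIN are deliberately not searched.
sub annotation_matches {
    my ($record, $regex) = @_;
    return get_annotation($record) =~ $regex ? 1 : 0;
}

# Hypothetical record fragment, made up for illustration only.
my $record = <<'END';
LOCUS       DEMO0001     12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     CDS             1..12
                     /product="restriction endonuclease"
ORIGIN
        1 atgcatgcat gc
//
END

print annotation_matches($record, qr/endonuclease/i) ? "found\n" : "not found\n";
```

Passing a precompiled qr// pattern keeps the subroutine general: the caller decides what to look for, and the subroutine only decides where to look.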

Using Arrays

Example 10-3 parses a few pieces of information from the annotations in a GenBank file. It works on the file data stored as an array, one line of the file per array element.

Example 10-3. Parsing GenBank annotations using arrays

#!/usr/bin/perl
# Parsing GenBank annotations using arrays

use strict;
use warnings;
use BeginPerlBioinfo;     # see Chapter 6 about this module

# Declare and initialize variables
my @genbank = ( );
my $locus = '';
my $accession = '';
my $organism = '';

# Get GenBank file data
@genbank = get_file_data('record.gb');

# Let's start with something simple. Let's get some of the identifying
# information, let's say the locus and accession number (here the same
# thing) and the definition and the organism.

for my $line (@genbank) {
    if($line =~ /^LOCUS/) {
        $line =~ s/^LOCUS\s*//;
        $locus = $line;
...
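Because the listing above is cut off, here is a separate, self-contained sketch of the same line-by-line idea; it is not the book's continuation of Example 10-3. It reads a made-up record into an array of lines through an in-memory filehandle, standing in for the book's get_file_data, and it uses capture groups instead of s/// so the array elements are not modified in place. The subroutine name parse_identifiers and the sample record are hypothetical, and capturing only the first word after LOCUS (the locus name) is a choice of this sketch; the book's code keeps the rest of the line.

```perl
#!/usr/bin/perl
# Sketch: extract identifying fields from GenBank lines held in an array.
# parse_identifiers and the sample record are hypothetical.
use strict;
use warnings;

# Hypothetical record, made up for illustration only.
my $text = <<'END';
LOCUS       DEMO0001     12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record.
ACCESSION   DEMO0001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 atgcatgcat gc
//
END

# Read the record into an array of lines (stand-in for get_file_data).
open my $fh, '<', \$text or die "Cannot open in-memory record: $!";
my @genbank = <$fh>;
close $fh;

# Scan annotation lines and capture the fields of interest.
sub parse_identifiers {
    my @lines = @_;
    my %id = (locus => '', accession => '', organism => '');
    for my $line (@lines) {
        last if $line =~ /^ORIGIN/;    # annotation ends where the sequence starts
        if ($line =~ /^LOCUS\s+(\S+)/) {
            $id{locus} = $1;
        } elsif ($line =~ /^ACCESSION\s+(\S+)/) {
            $id{accession} = $1;
        } elsif ($line =~ /^\s+ORGANISM\s+(.+?)\s*$/) {
            # ORGANISM is indented under SOURCE, hence the leading \s+
            $id{organism} = $1;
        }
    }
    return %id;
}

my %id = parse_identifiers(@genbank);
print "$_: $id{$_}\n" for qw(locus accession organism);
```

Returning a hash rather than setting separate scalars keeps all the extracted fields together, which makes it easy to add DEFINITION or other fields later with one more elsif branch.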