In previous chapters you saw how to examine the lines of a file using Perl's array operations. Usually, you do this by saving the data in an array with each line of the file appearing as an element of the array.
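As a reminder of that idiom, here is a minimal sketch that reads a file into an array, one line per element. The subroutine name `read_lines` is hypothetical and is not part of the book's BeginPerlBioinfo module. It relies only on core Perl: reading a filehandle in list context returns every remaining line, with each line's newline still attached.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from BeginPerlBioinfo): return all lines of a file
# as a list, one line per element, newlines kept.
sub read_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    my @lines = <$fh>;    # list context: each line becomes one element
    close($fh);
    return @lines;
}
```

You would call it as `my @lines = read_lines('record.gb');` and then loop over `@lines`.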
Let's look at two methods to extract the annotation and the DNA from a GenBank file. In the first method, you'll slurp the file into an array and look through the lines, as in previous programs. In the second, you'll put the whole GenBank record into a scalar variable and use regular expressions to parse the information. Is one approach better than the other? Not necessarily: it depends on the data. There are advantages and disadvantages to each, but both get the job done.
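To make the second method concrete, the sketch below holds one whole record in a scalar and uses a single regular expression to split it at the `ORIGIN` line and the `//` end-of-record line. The name `split_record` is hypothetical, and the sketch assumes a well-formed record that contains both of those lines. The `/s` modifier lets `.` match newlines, and `/m` lets `^` match at the start of every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: split one GenBank record (held in a scalar) into
# annotation and DNA. Assumes an ORIGIN line and a "//" terminator line.
sub split_record {
    my ($record) = @_;
    # Annotation: everything before ORIGIN. DNA: everything after the ORIGIN
    # line and before the "//" line.
    my ($annotation, $dna) = $record =~ /^(.*?)^ORIGIN[^\n]*\n(.*?)^\/\/\n/sm
        or return;                 # empty list if the record is malformed
    $dna =~ s/[\s0-9]//g;          # drop whitespace and position numbers
    return ($annotation, $dna);
}
```

Compared with walking an array line by line, this version needs no state flag, but it reads the whole record into memory at once.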
I've put five GenBank records in a file called library.gb. As before, you can download the file from this book's web site. You'll use library.gb, and the file record.gb, which contains just one GenBank record, in the next few examples.
Example 10-1 shows the first method, which operates on an array containing the lines of the GenBank record. The main program is followed by a subroutine that does the real work.
Example 10-1. Extract annotation and sequence from GenBank file
#!/usr/bin/perl
# Extract annotation and sequence from GenBank file

use strict;
use warnings;
use BeginPerlBioinfo;     # see Chapter 6 about this module

# declare and initialize variables
my @annotation = ( );
my $sequence = '';
my $filename = 'record.gb';

parse1(\@annotation, \$sequence, $filename);
...
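The subroutine that does the work can follow a simple plan: collect annotation lines until the `ORIGIN` line, collect sequence lines until the `//` end-of-record line, then strip the spaces and position numbers from the DNA. The sketch below shows one way to do that. The name `parse_lines` is hypothetical, and unlike the call to `parse1` above it takes a reference to an array of lines rather than a filename, so it does not depend on any file-reading helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the array-based method.
# $lines:      reference to an array of record lines
# $annotation: reference to an array that receives annotation lines
# $dna:        reference to a scalar that receives the sequence
sub parse_lines {
    my ($lines, $annotation, $dna) = @_;
    my $in_sequence = 0;                  # true once ORIGIN has been seen
    foreach my $line (@$lines) {
        if ($line =~ /^\/\/\s*$/) {       # "//" ends the record
            last;
        } elsif ($in_sequence) {          # inside the sequence section
            $$dna .= $line;
        } elsif ($line =~ /^ORIGIN/) {    # sequence starts on the next line
            $in_sequence = 1;
        } else {                          # still in the annotation
            push @$annotation, $line;
        }
    }
    $$dna =~ s/[\s0-9]//g;                # remove whitespace and position numbers
    return;
}
```

Passing references lets the subroutine fill both the annotation array and the sequence scalar in a single call, the same calling style that `parse1(\@annotation, \$sequence, $filename)` uses.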