O'Reilly logo

Beginning Perl for Bioinformatics by James Tisdall

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Operating on Strings

It's not necessary to explode a string into an array in order to look at each character. In fact, sometimes you'd want to avoid that. A large string takes up a large amount of memory in your computer. So does a large array. When you explode a string into an array, the original string is still there, and you also have to make a copy of each character for the elements of the new array you're creating. If you have a large string, that already uses a good portion of available memory, creating an additional array can cause you to run out of memory. When you run out of memory, your computer performs poorly; it can slow to a crawl, crash, or freeze ("hang"). These haven't been worrisome considerations up to now, but if you use large data sets (such as the human genome), you have to take these things into account.

So let's say you'd like to avoid making a copy of the DNA sequence data into another variable. Is there a way to just look at the string $DNA and count the bases from it directly? Yes. Here's some pseudocode, followed by a Perl program:

read in the DNA from a file join the lines of the file into a single string of $DNA # initialize the counts count_of_A = 0 count_of_C = 0 count_of_G = 0 count_of_T = 0 for each base at each position in $DNA if base is A count_of_A = count_of_A + 1 if base is C count_of_C = count_of_C + 1 if base is G count_of_G = count_of_G + 1 if base is T count_of_T = count_of_T + 1 done print count_of_A, count_of_C, count_of_G, count_of_T ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required