One of the most common things we do in bioinformatics is to look for motifs, short segments of DNA or protein that are of particular interest. They may be regulatory elements of DNA or short stretches of protein that are known to be conserved across many species. (The PROSITE web site at http://www.expasy.ch/prosite/ has extensive information about protein motifs.)
The motifs you look for in biological sequences are usually not one specific sequence. They may have several variants: for example, positions in which it doesn't matter which base or residue is present. They may vary in length as well. They can often be represented as regular expressions, which you'll see more of in the discussion following Example 5-3, in Chapter 9, and elsewhere in the book.
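To make the idea of variant positions and variable lengths concrete, here is a minimal sketch of how such motifs might be written as Perl regular expressions. The protein fragment and both patterns are made up for illustration, and find_motif is a hypothetical helper name, not something defined elsewhere in the book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up protein fragment, used only for illustration.
my $protein = 'MKCGGIDEMVVMGGVSGQSTVSGE';

# Return a list of [start, text] pairs, one for each non-overlapping
# match of $motif (a regular expression) in $sequence.
# Start positions are 1-based, as is usual for sequence coordinates.
sub find_motif {
    my ($sequence, $motif) = @_;
    my @hits;
    while ($sequence =~ /($motif)/g) {
        my $start = pos($sequence) - length($1) + 1;
        push @hits, [$start, $1];
    }
    return @hits;
}

# 'GG[IV]'   : a position where either I or V is acceptable.
# 'S.{1,3}V' : S and V separated by one to three residues of any kind,
#              so the motif can vary in length.
for my $motif ('GG[IV]', 'S.{1,3}V') {
    for my $hit (find_motif($protein, $motif)) {
        printf "%-10s matched %s at position %d\n",
            $motif, $hit->[1], $hit->[0];
    }
}
```

With the made-up sequence above, the first pattern matches GGI at position 4 and GGV at position 13, and the second matches STV at position 19. Note that this sketch reports non-overlapping matches only; overlapping occurrences would need a different approach.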
Perl has a handy set of features for finding things in strings. This, as much as anything, has made it a popular language for bioinformatics. Example 5-3 introduces this string-searching capability; it does something genuinely useful, and similar programs are used all the time in biology research. It does the following:
Reads in protein sequence data from a file
Puts all the sequence data into one string for easy searching
Looks for motifs the user types in at the keyboard
Example 5-3. Searching for motifs
#!/usr/bin/perl -w
# Searching for motifs

# Ask the user for the filename of the file containing
# the protein sequence data, and collect it from the keyboard
print "Please type the filename of the protein sequence ...
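Because the listing above is cut short, the three steps listed before Example 5-3 can be sketched as follows. This is not the book's program but a minimal illustration under some assumptions: the sub names read_protein and search_motifs are hypothetical, the filename is taken from the command line rather than typed at a prompt, and a blank line is assumed to end the search loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Steps 1 and 2: read every line of the file and join the lines into
# one string. Whitespace (including newlines) is removed so that a
# motif spanning a line break can still be found.
sub read_protein {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lines = <$fh>;
    close $fh;
    my $protein = join('', @lines);
    $protein =~ s/\s//g;
    return $protein;
}

# Step 3: read motifs, one per line, from the input handle $in and
# report whether each occurs in $protein. A blank line ends the loop.
# Returns the list of motifs that were found.
sub search_motifs {
    my ($protein, $in) = @_;
    my @found;
    while (my $motif = <$in>) {
        chomp $motif;
        last if $motif =~ /^\s*$/;

        # The motif is used as a regular expression, so an invalid
        # pattern is reported and skipped instead of ending the program.
        my $regex = eval { qr/$motif/ };
        unless (defined $regex) {
            print "\"$motif\" is not a valid pattern.\n";
            next;
        }

        if ($protein =~ $regex) {
            print "Found motif: $motif\n";
            push @found, $motif;
        } else {
            print "Motif not found: $motif\n";
        }
    }
    return @found;
}

# Run interactively only when a filename is given on the command line.
if (@ARGV) {
    my $protein = read_protein($ARGV[0]);
    print "Enter a motif to search for (blank line to quit):\n";
    search_motifs($protein, \*STDIN);
}
```

Passing the input handle as an argument keeps the search loop independent of the keyboard, so the same sub can read motifs from STDIN, from a file, or from an in-memory string.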