Let's write a small program that stores some
DNA in a variable and prints it to the screen. The DNA is written in the
usual fashion, as a string made of the letters A, C, G, and T, and we'll call the
variable $DNA. In other words,
$DNA is the name of the DNA sequence data
used in the program. Note that in Perl, a variable is really the name for some data
you wish to use. The name gives you full access to the data. Example 4-1 shows the entire
program.
Example 4-1. Putting DNA into the computer
#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out

# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen
print $DNA;

# Finally, we'll specifically tell the program to exit.
exit;
Using what you've already learned about text editors and running Perl programs in Chapter 2, enter the code (or copy it from the book's web site) and save it to a file. Remember to save the program as ASCII or text-only format, or Perl may have trouble reading the resulting file.
The second step is to run the program. The details of how to run a program depend on the type of computer you have (see Chapter 2). Let's say the program is on your computer in a file called example4-1. As you recall from Chapter 2, if you are running this program on Unix or Linux, you type the following in a shell window:
perl example4-1
If you've successfully run the program, you'll see the output printed on your computer screen.
Example 4-1 illustrates many of the ideas all our Perl programs will rely on. One of these ideas is control flow, or the order in which the statements in the program are executed by the computer.
Every program starts at the first line and executes the statements one after the other until it reaches the end, unless it is explicitly told to do otherwise. Example 4-1 simply proceeds from top to bottom, with no detours.
In later chapters, you'll learn how programs can control the flow of execution.
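To see top-to-bottom control flow in action, here is a minimal sketch written in the style of Example 4-1. The short DNA strings are made up for illustration. The point is only that each statement runs after the one above it, so the second print sees the value stored by the second assignment.

```perl
#!/usr/bin/perl -w
# Statements run in the order they are written, top to bottom.

# First, store a value in $DNA
$DNA = 'ACGT';

# Second, print it: at this point $DNA holds 'ACGT'
print $DNA;
print "\n";     # "\n" just moves the output to a new line

# Third, store a different value in the same variable
$DNA = 'TTTT';

# Fourth, print again: this statement runs after the reassignment,
# so it prints the new value
print $DNA;
print "\n";
```

If you swapped the third and fourth steps, both print statements would show the first value, because the order of the statements is the order of execution.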
Now let's take a look at the parts of Example 4-1. You'll notice lots of blank lines. They're there to make the program easy for a human to read. Next, notice the comments that begin with the # sign. Remember from Chapter 3 that when Perl runs, it throws these away along with the blank lines. In fact, to Perl, the following is exactly the same program as Example 4-1:
#!/usr/bin/perl -w
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print $DNA;
exit;
In Example 4-1, I've made liberal use of comments. Comments at the beginning of code can make it clear what the program is for and who wrote it, and can present other information that is helpful when someone needs to understand the code. Comments also explain what each section of the code is for and sometimes explain how the code achieves its goals.
It's tempting to belabor the point about the importance of comments. Suffice it to say that in most university-level, computer-science class assignments, the program without comments typically gets a low or failing grade; also, the programmer on the job who doesn't comment code is liable to have a short and unsuccessful career.
Because it starts with a # sign, the first line of the program looks like a comment, but it doesn't seem like a very informative comment:
#!/usr/bin/perl -w
This special line, called command interpretation, tells a computer
running Unix or Linux that this is a Perl program. It may look slightly
different on different computers. On some machines, it's also unnecessary
because the computer recognizes Perl from other information. A Windows machine
is usually configured to assume that any program ending in .pl is a Perl program. In Unix or Linux, a
Windows command window, or a Mac OS X shell, you can type
perl my_program, and your Perl program
my_program won't need the special line. However,
it's commonly used, so we'll put it at the start of all our programs.
Notice that the first line of code uses the flag
-w. The "w" stands for warnings, and it causes Perl to print messages when it detects a likely error.
Very often the message suggests the line number where Perl thinks the error
began. Sometimes the line number is wrong, but the error is usually on or just
before the line the message suggests. Later in the book, you'll also see the
statement use warnings; as an alternative to -w.
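As an illustration of what warnings can catch, the following sketch turns them on and then misspells a variable name. The name $DAN is a deliberate typo invented for this example, and the DNA string is shortened. The exact wording of the messages can vary between Perl versions, so none is quoted here.

```perl
#!/usr/bin/perl
use warnings;   # turns on warnings, much like the -w flag

# Store some DNA and print it
$DNA = 'ACGGGAGG';
print $DNA;
print "\n";     # "\n" moves the output to a new line

# Oops: the variable name is misspelled on the next line.
# $DAN was never given a value, so nothing is printed for it.
# With warnings on, Perl complains about it and reports the line
# number, which points you toward the typo.
print $DAN;
print "\n";
```

Without warnings, the program would run silently and print an empty line, leaving you to discover the misspelling on your own.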
The next line of Example 4-1 stores the DNA in a variable:
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
This is a very common, very important thing to do in a computer language, so let's take a leisurely look at it. You'll see some basic features about Perl and about programming languages in general, so this is a good place to stop skimming and actually read.
To be more accurate, this line of code is an assignment
statement. Its purpose in this program is to store some DNA in a
variable called $DNA. There are several
fundamental things happening here, as you will see in the next sections.
If you replace the two lines
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print $DNA;
with
$A_poem_by_Seamus_Heaney = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print $A_poem_by_Seamus_Heaney;
the program behaves in exactly the same way, printing out the DNA to the computer screen. The point is that the names of variables in a computer program are your choice. (Within certain restrictions: in Perl, a variable name must be composed from upper- or lowercase letters, digits, and the underscore _ character. Also the first character must not be a digit.)
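The naming rules can be tried out directly. In this sketch, the names $dna_2 and $_fragment and their short values are invented for illustration; the first two variables are the ones discussed above.

```perl
#!/usr/bin/perl -w
# Two variables with different names but identical contents.
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$A_poem_by_Seamus_Heaney = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Both statements print the same DNA; Perl attaches no meaning to the name.
print $DNA;
print "\n";
print $A_poem_by_Seamus_Heaney;
print "\n";

# Other legal names: letters, digits, and underscores,
# as long as the first character is not a digit.
$dna_2 = 'ACGT';
$_fragment = 'GGCC';
print $dna_2;
print "\n";
print $_fragment;
print "\n";

# A name such as 2dna, which starts with a digit, is not allowed.
```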
This is another important point along the same lines as the remarks I've
already made about using blank lines and comments to make your code more
easily readable by humans. The computer attaches no meaning to the use of
the variable name
$DNA instead of
$A_poem_by_Seamus_Heaney, but whoever
reads the program certainly will. One name makes perfect sense, clearly
indicates what the variable is for in the program, and eases the chore of
understanding the program. The other name makes it unclear what the program
is doing or what the variable is for. Using well-chosen variable names is
part of what's called
self-documenting code. You'll still need comments, but perhaps
not as many, if you pick your variable names well.
You've noticed that the variable name
$DNA starts with a dollar sign. In Perl this kind of variable
is called a scalar
variable, which is a variable that holds a single item of data.
Scalar variables are used for such data as strings or various kinds of
numbers (e.g., a string of DNA like the one in Example 4-1, or
numbers such as 25, 6.234, 3.5E10, -0.8373). A scalar variable holds just
one item of data at a time.
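Here is a short sketch of scalar variables holding each kind of data mentioned above. All the variable names are hypothetical, chosen only to describe what each one holds.

```perl
#!/usr/bin/perl -w
# Each scalar variable holds a single item of data.

$sequence = 'ACGT';     # a string
$count    = 25;         # a whole number
$ratio    = 6.234;      # a decimal number
$large    = 3.5E10;     # scientific notation: 3.5 times 10 to the 10th power
$negative = -0.8373;    # a negative number

print $sequence;  print "\n";
print $count;     print "\n";
print $ratio;     print "\n";
print $large;     print "\n";
print $negative;  print "\n";

# A scalar holds one item at a time:
# storing a new value replaces the old one.
$item = 'ACGT';
$item = 25;
print $item;      print "\n";
```

After the last assignment, the string that $item held first is gone, and only the number remains.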
In Example 4-1, the scalar
$DNA is holding
some DNA, represented in the usual way by the letters A, C, G,
and T. As stated earlier, in computer science a sequence of letters is
called a string. In Perl you designate a string by putting it in quotes. You
can use single quotes, as in Example
4-1, or double quotes. (You'll learn the difference later.) The
DNA is thus represented by:
'ACGGGAGGACGGGAAAATTACTACGGCATTAGC'
In Perl, to set a variable to a certain value, you use the
= sign. The = sign is called
the assignment operator. In Example 4-1, the value
'ACGGGAGGACGGGAAAATTACTACGGCATTAGC'
is assigned to the variable $DNA. After
the assignment, you can use the name of the variable to get the value, as in
the print statement in Example 4-1.
The order of the parts is important in an assignment statement. The value assigned to something appears on the right of the assignment operator. The variable that is assigned a value is always to the left of the assignment operator. In programming manuals, you sometimes come across the terms lvalue and rvalue to refer to the left and right sides of the assignment operator.
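The left and right roles can be seen in a small sketch. The variable $copy and the replacement string are assumptions made for this illustration and do not appear in Example 4-1.

```perl
#!/usr/bin/perl -w
# The variable goes on the left of =, the value on the right.
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# The right side can itself be a variable:
# its value is copied into the variable on the left.
$copy = $DNA;

# Storing something new in $DNA afterwards does not change $copy.
$DNA = 'TTTT';

print $DNA;
print "\n";
print $copy;
print "\n";

# Reversing the order does not work. If the next line were uncommented,
# Perl would refuse to run the program, because a quoted string is a
# value, not a variable, and cannot be assigned to.
# 'ACGT' = $DNA;
```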
This use of the
= sign has a long history in
programming languages. However, it can be a source of confusion: for
instance, in most mathematics, using =
means that the two things on either side of the sign are equal. So it's
important to note that in Perl, the =
sign doesn't mean equality. It assigns a value to a variable. (Later, we'll
see how to represent equality.)
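One statement makes the difference plain. Read as mathematics, the second assignment below could never be true; read as Perl, it is an ordinary instruction. The variable $count is hypothetical, used only for this sketch.

```perl
#!/usr/bin/perl -w
# Start with a value
$count = 5;

# As an equation, "count = count + 1" has no solution.
# As an assignment, it means: work out the value on the right
# (5 + 1), then store the result in the variable on the left.
$count = $count + 1;

print $count;
print "\n";
```

The program prints 6, because the right side is evaluated first using the old value of $count.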
So, to summarize what we've learned so far about this statement:
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
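It is an assignment statement that stores a string of DNA in the scalar variable $DNA. The statement that follows it in Example 4-1,
print $DNA;
prints ACGGGAGGACGGGAAAATTACTACGGCATTAGC out to the computer screen.
Notice that the print statement handles a scalar variable by printing its value, in this case the string stored in $DNA.
You'll see more about printing later.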
Finally, the statement
exit;
tells the computer to exit the program. Perl doesn't require an
exit statement at the end of a program; once you get to the
end, the program exits automatically. But it doesn't hurt to put one in, and
it clearly indicates the program is over. You'll see other programs that
exit if something goes wrong before the program normally finishes, so the
exit statement is definitely