Chapter 4. Values and Expressions

Data is semi-animate…sort of like programmers.

—Arthur Norman

Constructing and using values ought to be trivial. After all, there are very few components of a Perl program simpler than a character string or a number or a + operator.

Unfortunately, the syntax of Perl's literal values is so rich that there are plenty of ways to mess them up. Variables can interpolate unexpectedly, or fail to interpolate at all. Character escape codes and literal numbers can mysteriously appear in the wrong base. Delimiters can be just about anything you like.

And Perl's operators are even worse. Several of them are polymorphic: silently changing their behaviour depending on the type of argument they're applied to. Others are monomorphic: silently changing their arguments to fit their behaviour. Others are just plain inefficient in some usages.

This chapter suggests some appropriate coding habits that can help you avoid the pitfalls associated with creating values and manipulating them in expressions.

String Delimiters

Use interpolating string delimiters only for strings that actually interpolate.

Unexpectedly interpolating a variable in a character string is a common source of errors in Perl programs. So is unexpected non-interpolation. Fortunately, Perl provides two distinct types of strings that make it easy to specify exactly what you want.

If you're creating a literal character string and you definitely intend to interpolate one or more variables into it, use a double-quoted string:

my $spam_name = "$title $first_name $surname";
my $pay_rate  = "$minimal for maximal work";

If you're creating a literal character string and not intending to interpolate any variables into it, use a single-quoted string:

my $spam_name = 'Dr Lawrence Mwalle';
my $pay_rate  = '$minimal for maximal work';

If your uninterpolated string includes a literal single quote, use the q{…} form instead:

my $spam_name = q{Dr Lawrence ('Larry') Mwalle};
my $pay_rate  = q{'$minimal' for maximal work};

Don't use backslashes as quote delimiters; they only make it harder to distinguish the content from the container:

my $spam_name = 'Dr Lawrence (\'Larry\') Mwalle';
my $pay_rate  = '\'$minimal\' for maximal work';

If your uninterpolated string includes both a literal single quote and an unbalanced brace, use square brackets as delimiters instead:

my $spam_name = q[Dr Lawrence }Larry{ Mwalle];
my $pay_rate  = q['$minimal' for warrior's work {{:-)];

Reserving interpolating quoters for strings that actually do interpolate something[18] can help you avoid unintentional interpolations, because the presence of a $ or @ in a single-quoted string then becomes a sign that something might be amiss. Likewise, once you become used to seeing double quotes only on interpolated strings, the absence of any variable in a double-quoted string becomes a warning sign. So these rules also help highlight missing intentional interpolations.
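As a rough illustration of how an unintended interpolation can bite, the following sketch compiles a double-quoted string containing an @ under use strict and captures the resulting error. The email address and variable names are invented for the example, and the string eval is used only so that the compile-time failure can be inspected:

```perl
use strict;
use warnings;

# Hypothetical example: an email address placed in interpolating quotes.
# The snippet is compiled via string eval so the error can be captured.
my $compiled_ok = eval q{
    use strict;
    my $address = "larry@example.com";   # Perl tries to interpolate @example
    1;
};
my $error = $compiled_ok ? 'none' : $@;

# The same text in non-interpolating quotes is taken literally.
my $address = 'larry@example.com';

print "Double-quoted version failed with: $error";
print "Single-quoted version holds: $address\n";
```

Under the guideline, the @ inside double quotes would already have looked suspicious, because nothing in that string was meant to interpolate.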

The four distinct rules are fine for isolated literals, but when you're creating a set of related string values, mixing and matching the rules can severely reduce the readability of your code:

my $title         = 'Perl Best Practices';
my $publisher     = q{O'Reilly};
my $end_of_block  = '}';
my $closing_delim = q['}];
my $citation      = "$title ($publisher)";

For sequences of "parallel" strings, choose the most general delimiters required and use them consistently throughout the set:

my $title         =  q[Perl Best Practices];
my $publisher     =  q[O'Reilly];
my $end_of_block  =  q[}];
my $closing_delim =  q['}];
my $citation      = qq[$title ($publisher)];

Note that there's a two-column gap between the assignment operator and each q[…] character string. This aligns the string delimiters with those of the lone qq[…] string, which helps its keyword stand out and draws attention to its different semantics.

Empty Strings

Don't use "" or '' for an empty string.

An important exception to the preceding rules is the empty string. You can't use "", as an empty string doesn't interpolate anything. It doesn't contain a literal quote or brace either, so the previous rules call for it to be written like so:

$error_msg = '';

But that's not a good choice. In many display fonts, it's far too easy to mistake '' (single-quote, single-quote) for " (a lone double-quote), which means that you need to apply the second rule for non-interpolated strings, and write each empty string like so, preferably with a comment highlighting it:

$error_msg = q{};   # Empty string

Also see the "Constants" guideline later in this chapter.

Single-Character Strings

Don't write one-character strings in visually ambiguous ways.

Character strings that consist of a single character can present a variety of problems, all of which make code harder to maintain.

A single space in quotes is easily confused with an empty string:

$separator = ' ';

Like an empty string, it should be specified more verbosely:

$separator = q{ };   # Single space

Literal tabs are even worse (and not just in single-character strings):

$separator  = ' ';         # Empty string, single space, or single tab???
$column_gap = '         '; # Spaces? Tabs? Some combination thereof?

Always use the interpolated \t form instead:

$separator  = "\t";
$column_gap = "\t\t\t";

Literal single-quote and double-quote characters shouldn't be specified in quotation marks either, for obvious aesthetic reasons: '"', "\"", '\'', "'". Use q{"} and q{'} instead.

You should also avoid using quotation marks when specifying a single comma character. The most common use of a comma string is as the first argument to a join:

my $printable_list = '(' . join(',', @list) . ')';

The ',', sequence is unnecessarily hard to decipher, especially when:

my $printable_list = '(' . join(q{,}, @list) . ')';

is just as easy to write, and stands out more clearly as being a literal. See the "Constants" guideline later in this chapter for an even cleaner solution.

Escaped Characters

Use named character escapes instead of numeric escapes.

Some ASCII characters that might appear in a string—such as DEL or ACK or CAN—don't have a "native" Perl representation. When one or more of those characters is required, the standard solution is to use a numeric escape: a backslash followed by the character's ASCII value inside double-quotes. For example, using octal escapes:

$escape_seq = "\127\006\030Z";       # DEL-ACK-CAN-Z

or hexadecimal escapes:

$escape_seq = "\x7F\x06\x22Z";       # DEL-ACK-CAN-Z

But not everyone who subsequently reads your code will be familiar with the ASCII values for these characters, which means they will have to rely on the associated comments. That's a real shame too, because both of the previous examples are wrong! The correct sequence was:

$escape_seq = "\177\006\030Z";       # Octal DEL-ACK-CAN-Z

or:

$escape_seq = "\x7F\x06\x18Z";       # Hexadecimal DEL-ACK-CAN-Z

Errors like that are particularly hard to track down. Even if you do know the ASCII table by heart, it's still easy to mistakenly type "\127" for DEL because the ASCII code for DEL is 127. At least, in base 10 it is. Unfortunately, backslashed escapes in strings are specified in base 8. And once your brain has accepted the 127-is-DEL relationship, it becomes exceptionally hard to see the mistake. After all, it looks right.
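The base mismatch is easy to confirm with ord. This is only a small sketch (the variable names are illustrative), but it shows which character each escape really produces:

```perl
use strict;
use warnings;

# Octal escape "\127" is decimal 87 (the letter W), not DEL.
my $mistyped_del = "\127";

# Octal 177 and hexadecimal 7F are both decimal 127 (DEL).
my $octal_del = "\177";
my $hex_del   = "\x7F";

printf "\\127 -> %d\n", ord $mistyped_del;
printf "\\177 -> %d\n", ord $octal_del;
printf "\\x7F -> %d\n", ord $hex_del;
```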

That's why it's better to use named escapes for those characters that have no explicit Perl representation. Named escapes are available in Perl 5.6 and later, and are enabled via the use charnames pragma. Once they're operational, instead of using a numeric escape you can put the name of the required character inside a \N{…} sequence within any double-quoted string. For example:

use charnames qw( :full );

$escape_seq = "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";

Note that there's no need for a comment here; when you use the actual names of the characters within the string, the escapes become self-documenting.
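If you want reassurance that the named form really is the same string, a quick comparison along these lines should do. It is a sketch that assumes a Perl recent enough to support \N{…} escapes for these control characters:

```perl
use strict;
use warnings;
use charnames qw( :full );

my $named   = "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";
my $numeric = "\x7F\x06\x18Z";

# Both strings hold the same four characters.
print $named eq $numeric ? "identical\n" : "different\n";
```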

Constants

Use named constants, but don't use constant.

Raw numbers that suddenly appear in the middle of a program are often mysterious, frequently confusing, and always a potential source of errors. Certain types of unprintable character strings—for example, initialization strings for modems—are similarly awkward.

A line like this:

print $count * 42;

is unsatisfactory, as the reader may have no idea from the context why the variable is being multiplied by that particular number. Is it 42: the number of dots on a pair of dice? Or 42: the decimal ASCII value of asterisk? Or 42: the number of chromosomes in common wheat? Or 42: the angular spread of a rainbow? Or 42: the number of lines per page in the Gutenberg Bible? Or 42: the number of gallons per barrel of oil?

Replace these kinds of raw literals with a read-only lexical variable whose name explains the meaning of the number:

use Readonly;
Readonly my $MOLYBDENUM_ATOMIC_NUMBER => 42;

# and later...

print $count * $MOLYBDENUM_ATOMIC_NUMBER;

The Readonly CPAN module exports a single subroutine (Readonly( )) that expects two arguments: a scalar, array, or hash variable, and a value. The value is assigned to the variable, and then the variable's "read-only" flag is set, to prevent any further assignments. Note the use of all-uppercase in the variable name (in accordance with the guideline in Chapter 3) and the use of the fat comma (because the constant name and its value form a natural pair—see "Fat Commas" later in this chapter).

If you accidentally try to assign a new value to a constant:

$MOLYBDENUM_ATOMIC_NUMBER = $CARBON_ATOMIC_NUMBER * $NITROGEN_ATOMIC_NUMBER;

the interpreter immediately throws an exception:

Modification of a read-only value attempted at nuclear_lab.pl line 13

Even when the constant is instantly recognizable, and highly unlikely ever to change, it's still better to give it a name. Naming the constant improves the level of abstraction, and therefore the readability, of the resulting code:

use Readonly;
Readonly my $PI => 3.1415926;

# and later...

$area = $PI * $radius**2;

The same approach is also particularly helpful when dealing with empty strings:

use Readonly;
Readonly my $EMPTY_STR => q{};

# and later...

my $error_msg = $EMPTY_STR;

This named constant is far less likely to be overlooked or misinterpreted than a raw '' might be. It's also less mystifying to inexperienced Perl programmers than a q{}. Likewise, the other visually ambiguous literals can be made much clearer with:

Readonly my $SPACE        => q{ };
Readonly my $SINGLE_QUOTE => q{'};
Readonly my $DOUBLE_QUOTE => q{"};
Readonly my $COMMA        => q{,};

The obvious question at this point is: why use Readonly instead of use constant? After all, the constant pragma comes standard with Perl, and the constants it creates don't have those annoying sigils.

Well, it turns out those annoying sigils are actually highly useful, because they allow Readonly-generated constants to be interpolated into other strings. For example:

use Readonly;
Readonly my $DEAR      => 'Greetings to you,';
Readonly my $SINCERELY => 'May Heaven guard you from all misfortune,';

$msg = <<"END_MSG";
$DEAR $target_name

$scam_pitch

$SINCERELY

$fake_name
END_MSG

Bareword constants can't be interpolated, so you have to write:

use constant {
    DEAR      => 'Greetings to you,',
    SINCERELY => 'May Heaven guard you from all misfortune,',
};

# and later...

$msg = DEAR . $target_name
       . "$scam_pitch\n\n"
       . SINCERELY
       . "\n\n$fake_name";

which is both harder to read and easier to get wrong (for example, there's a space missing between the DEAR and the $target_name).

The sigils also ensure that constants behave as expected in autostringifying contexts:

use Readonly;
Readonly my $LINES_PER_PAGE => 42;       # Gutenberg-compatible

# and later...

$margin{$LINES_PER_PAGE}                 # sets $margin{'42'}
    = $MAX_LINES - $LINES_PER_PAGE;

In contrast, constants created by use constant are treated as barewords anywhere a string is expected:

use constant (
    LINES_PER_PAGE => 42
);

# and later...

$margin{LINES_PER_PAGE}               # sets $margin{'LINES_PER_PAGE'}
    = MAX_LINES - LINES_PER_PAGE;
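The bareword behaviour can be checked with nothing more than the core constant pragma. In this sketch the hash and its values are invented for the demonstration; calling the constant explicitly as LINES_PER_PAGE() is one common way to force evaluation:

```perl
use strict;
use warnings;
use constant LINES_PER_PAGE => 42;

my %margin;
$margin{LINES_PER_PAGE}   = 'stored under the bareword';   # key 'LINES_PER_PAGE'
$margin{LINES_PER_PAGE()} = 'stored under the value';      # key '42'

my @keys = sort keys %margin;
print "Keys: @keys\n";

# Constants are also invisible to string interpolation.
my $label = "LINES_PER_PAGE lines";
print "$label\n";
```

Two different hash entries are created, and the interpolated string contains the constant's name rather than its value.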

But perhaps most importantly, use Readonly allows you to create lexically scoped constants at runtime:

EVENT:
while (1) {
    use Readonly;
    Readonly my $EVENT => get_next_event( );

    last EVENT if not defined $EVENT;

    if ($VERBOSE) {
        print $EVENT->desc( ), "\n";
    }

    # process event here...
}

whereas use constant creates package-scoped constant subroutines at compile time:

EVENT:
while (1) {
    use constant EVENT => get_next_event( );

    last EVENT if not defined EVENT;

    if (VERBOSE) {
        print EVENT->desc( ), "\n";
    }

    # process event here...
}

That difference is critical here, because the use constant version will call get_next_event( ) only once—at compile time. If no event is available at that time, the subroutine will presumably return undef, and the loop will terminate before completing even a single iteration. The behaviour will be even worse if an event is available at compile time, in which case that event will be bound forever to the EVENT constant, and the loop will never terminate. The Readonly version doesn't suffer the same problem, because it executes at runtime, reinitializing the $EVENT constant each time the loop iterates.
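The compile-time binding can be observed without any real event source. The sketch below uses a hypothetical stand-in for get_next_event( ) that simply counts its calls, and a plain lexical in place of a Readonly variable so that it runs with core Perl only:

```perl
use strict;
use warnings;

my $calls;    # deliberately uninitialized, so the compile-time call is still counted
sub get_next_event { return ++$calls }    # hypothetical stand-in event source

my @constant_events;
for my $pass (1 .. 3) {
    use constant EVENT => get_next_event();    # evaluated once, while compiling
    push @constant_events, EVENT;
}

my @lexical_events;
for my $pass (1 .. 3) {
    my $event = get_next_event();              # evaluated on every iteration
    push @lexical_events, $event;
}

print "constant: @constant_events\n";
print "lexical:  @lexical_events\n";
```

The constant holds the same value on every pass, whereas the lexical sees a fresh value each time around the loop.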

Note that to get the full benefits of Readonly, you need to be using Perl 5.8 and have installed the associated Readonly::XS module, which requires precompilation. Be sure to read the module's documentation for a careful description of the pros and cons of using Readonly under earlier versions of Perl or without the precompiled helper module.

If you decide not to use the Readonly module in production code (for performance or political reasons), then using constant is still better than using literal values.

Leading Zeros

Don't pad decimal numbers with leading zeros.

Several of the guidelines in this book recommend laying out data in table format, and aligning that data vertically. For example:

use Readonly;

Readonly my %ATOMIC_NUMBER => (
    NITROGEN   =>    7,
    NIOBIUM    =>   41,
    NEODYMIUM  =>   60,
    NOBELIUM   =>  102,
);

But sometimes the desire to make columns line up cleanly can be counterproductive. For example, you might be tempted to pad the atomic numbers with zeros to make them uniform:

use Readonly;

Readonly my %ATOMIC_NUMBER => (
    NITROGEN   =>  007,
    NIOBIUM    =>  041,
    NEODYMIUM  =>  060,
    NOBELIUM   =>  102,
);

Unfortunately, that also makes them wrong. Even though leading zeros aren't significant in mathematics, they are significant in Perl. Any integer that begins with a zero is interpreted as an octal number, not a decimal. So the example zero-padded version is actually equivalent to:

use Readonly;

Readonly my %ATOMIC_NUMBER => (
    NITROGEN   =>   7,
    NIOBIUM    =>  33,
    NEODYMIUM  =>  48,
    NOBELIUM   => 102,
);

To avoid this covert transmutation of the numbers, never start a literal integer with zero. Even if you do intend to specify octal numbers, don't use a leading zero, as that may still mislead inattentive future readers of your code.
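The following sketch (with illustrative variable names) shows the transmutation for a single value. It also shows that the octal rule applies only to literals in source code: a zero-padded string is still converted as a decimal number.

```perl
use strict;
use warnings;

my $padded_literal = 041;          # octal literal: 4*8 + 1
my $plain_literal  = 41;
my $padded_string  = '041' + 0;    # strings are converted as decimal

print "literal 041  -> $padded_literal\n";
print "literal 41   -> $plain_literal\n";
print "string '041' -> $padded_string\n";
```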

If you need to specify octal values, use the built-in oct function, like so:

use Readonly;

Readonly my %PERMISSIONS_FOR => (
    USER_ONLY     => oct(600),
    NORMAL_ACCESS => oct(644),
    ALL_ACCESS    => oct(666),
);
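To see what oct actually returns, a sketch like this one can help. The variable names are illustrative; sprintf with %o is used to convert back for display:

```perl
use strict;
use warnings;

my $mode       = oct(644);               # 6*64 + 4*8 + 4
my $round_trip = sprintf '%o', $mode;    # back to octal digits for display
my $from_hex   = oct('0x1F');            # oct() also understands a 0x prefix

print "oct(644)    -> $mode\n";
print "as octal    -> $round_trip\n";
print "oct('0x1F') -> $from_hex\n";
```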

Long Numbers

Use underscores to improve the readability of long numbers.

Large numbers can be difficult to sanity check:

$US_GDP              = 10990000000000;
$US_govt_revenue     =  1782000000000;
$US_govt_expenditure =  2156000000000;

Those figures are supposed to be in the trillions, but it's very hard to tell if they have the right number of zeros. So Perl provides a convenient mechanism for making large numbers easier to read: you can use underscores to "separate your thousands":

# In the US they use thousands, millions, billions, trillions, etc...
$US_GDP              = 10_990_000_000_000;
$US_govt_revenue     =  1_782_000_000_000;
$US_govt_expenditure =  2_156_000_000_000;

Prior to Perl 5.8, these separators could only be placed in front of every third digit of an integer (i.e., to separate the thousands, millions, billions, etc.). From 5.8 onwards, underscores can be placed between any two digits. For example:

# In India they use lakhs, crores, arabs, kharabs, etc...
$India_GDP              = 30_33_00_00_00_000;
$India_govt_revenue     =    86_69_00_00_000;
$India_govt_expenditure =  1_14_60_00_00_000;

Separators can also now be used in floating-point numbers and non-decimals, to make them easier to comprehend as well:

use bignum;
$PI = 3.141592_653589_793238_462643_383279_502884_197169_399375;
$subnet_mask = 0xFF_FF_FF_80;
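One caveat is worth a quick sketch: the underscores are recognized only in literals within the source code, not in strings converted to numbers at runtime. The input value below is hypothetical; text of that kind needs its separators removed before it is used numerically.

```perl
use strict;
use warnings;

my $plain     = 10990000000000;
my $separated = 10_990_000_000_000;
my $mask      = 0xFF_FF_FF_80;

# Hypothetical textual input: strip the separators before numeric use.
my $input = '1_782_000_000_000';
(my $cleaned = $input) =~ tr/_//d;
my $revenue = $cleaned + 0;

print $plain == $separated ? "same value\n" : "different values\n";
print "mask:    $mask\n";
print "revenue: $revenue\n";
```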

Multiline Strings

Lay out multiline strings over multiple lines.

If a string has embedded newline characters, but the entire string won't fit on a single source line, then break the string after each newline and concatenate the pieces:

$usage = "Usage: $0 <file> [-full]\n"
         . "(Use -full option for full dump)\n"
         ;

In other words, the internal appearance of the string should mirror its external (printed) appearance as closely as possible.

Don't, however, be tempted to make the newline implicit, by wrapping a single string across multiple lines, like so:

$usage = "Usage: $0 <file> [-full]
(Use -full option for full dump)
";

Even though actual line breaks inside such a string do become newline characters within the string, the readability of such code suffers severely. It's harder to verify the line structure of the resulting string, because the first line is indented whilst the remaining lines have to be fully left-justified. That justification can also compromise your code's indentation structure.

Here Documents

Use a heredoc when a multiline string exceeds two lines.

The "break-after-newlines-and-concatenate" approach is fine for a small number of lines, but it starts to become inefficient—and ugly—for larger chunks of text.

For multiline strings that exceed two lines, use a heredoc:

$usage = <<"END_USAGE";
Usage: $0 <file> [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
END_USAGE

instead of:

$usage = "Usage: $0 <file> [-full] [-o] [-beans]\n"
         . "Options:\n"
         . "    -full  : produce a full dump\n"
         . "    -o     : dump in octal\n"
         . "    -beans : source is Java\n"
         ;

Heredoc Indentation

Use a "theredoc" when a heredoc would compromise your indentation.

Of course, even if your lines are all simple strings, the problem with using a heredoc in the middle of code is that its contents must be left-justified, regardless of the indentation level of the code it's in:

if ($usage_error) {
    warn <<'END_USAGE';
Usage: qdump <file> [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
END_USAGE
}

A better practice is to factor out any such heredoc into a predefined constant or a subroutine (a "theredoc"):

use Readonly;
Readonly my $USAGE => <<'END_USAGE';
Usage: qdump <file> [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
END_USAGE

# and later...

if ($usage_error) {
    warn $USAGE;
}

If the heredoc needs to interpolate variables whose values are not known at compile time, use a subroutine instead, and parameterize the variables:

sub build_usage {
    my ($prog_name, $filename) = @_;

    return <<"END_USAGE";
Usage: $prog_name $filename [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
END_USAGE
}

# and later...

if ($usage_error) {
    warn build_usage($PROGRAM_NAME, $requested_file);
}

The heredoc does compromise the indentation of the subroutine, but that's now a small and isolated section of the code, so it doesn't significantly impair the overall readability of your program.

Heredoc Terminators

Make every heredoc terminator a single uppercase identifier with a standard prefix.

You can use just about anything you like as a heredoc terminator. For example:

print <<'end list';          # Prints 3 lines then [DONE]
get name
set size
put next
end list

print "[DONE]\n";

or:

print <<'';                  # Prints 4 lines (up to the empty line) then [DONE]
get name
set size
put next
end list

print "[DONE]\n";

or even:

print <<'print "[DONE]\n";'; # Prints 5 lines but no [DONE]!
get name
set size
put next
end list

print "[DONE]\n";

Please don't. Heredocs are tough enough to understand as it is. Using bizarre terminators only makes them more difficult. It's a far better practice to stick with terminators that are capitalized (so they stand out better in mixed-case code) and free of whitespace (so only a single visual token has to be recognized).

For example, compared to the previous examples, it's much easier to tell what the contents of the following heredoc are:

print <<'END_LIST';
get name
set size
put next
END_LIST

But even with a single identifier as terminator, both the contents and the termination marker of a heredoc still have to be left-justified. So it can still be difficult to detect the end of a heredoc. By naming every heredoc marker with a standard, easily recognized prefix, you can make them much easier to pick out.

'END_…' is the recommended choice for this prefix. That is, instead of:

Readonly my $USAGE => <<"USAGE";
Usage: $0 <file> [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
USAGE

delimit your heredocs like so:

Readonly my $USAGE => <<"END_USAGE";
Usage: $0 <file> [-full] [-o] [-beans]
Options:
    -full  : produce a full dump
    -o     : dump in octal
    -beans : source is Java
END_USAGE

It helps to think of the << heredoc introducer as being pronounced "Everything up to…", so that the previous code reads as: the read-only $USAGE variable is initialized with everything up to END_USAGE.

Heredoc Quoters

When introducing a heredoc, quote the terminator.

Notice that all the heredoc examples in the previous guidelines used either single or double quotes after the <<. Single-quoting the marker forces the heredoc to not interpolate variables. That is, it acts just like a single-quoted string:

Readonly my $GRIPE => <<'END_GRIPE';
$minimal for maximal work
END_GRIPE

print $GRIPE;    # Prints: $minimal for maximal work

Double-quoting the marker ensures that the heredoc string is interpolated, just like a double-quoted string:

Readonly my $GRIPE => <<"END_GRIPE";
$minimal for maximal work
END_GRIPE

print $GRIPE;    # Prints: 4.99 an hour for maximal work

Most people aren't sure what the default interpolation behaviour is if you don't use any quotes on the marker:

Readonly my $GRIPE => <<END_GRIPE;
$minimal for maximal work
END_GRIPE

print $GRIPE;    # ???

Do you know? Are you sure? And even if you are sure you know, are you sure that your colleagues all know?

And that's the whole point. Heredocs aren't used as frequently as other types of strings, so their default interpolation behaviour isn't as familiar to most Perl programmers. Adding the explicit quotes around the heredoc marker takes almost no extra effort, but it relieves every reader of the considerable extra effort of having to remember the default behaviour[19]. Or, more commonly, of having to look up the default behaviour every time.
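For anyone who would rather test than look it up, a small experiment along the following lines settles the question. It is only a sketch, with plain lexicals standing in for the Readonly variables; the unquoted form appears here solely to be compared against the two quoted ones:

```perl
use strict;
use warnings;

my $minimal = '4.99 an hour';

my $unquoted = <<END_GRIPE;
$minimal for maximal work
END_GRIPE

my $double_quoted = <<"END_GRIPE";
$minimal for maximal work
END_GRIPE

my $single_quoted = <<'END_GRIPE';
$minimal for maximal work
END_GRIPE

print 'unquoted matches double-quoted: ', ($unquoted eq $double_quoted ? 'yes' : 'no'), "\n";
print 'unquoted matches single-quoted: ', ($unquoted eq $single_quoted ? 'yes' : 'no'), "\n";
```

That the experiment is needed at all is a fair argument for always writing the quotes.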

It's always best practice to say precisely what you mean, and to record as much of your intention as possible in the actual source code—even if saying what you mean makes the code a little more verbose.

Barewords

Don't use barewords.

In Perl, any identifier that the compiler doesn't recognize as a subroutine (or as a package name or filehandle or label or built-in function) is treated as an unquoted character string. For example:

$greeting = Hello . World;
print $greeting, "\n";                # Prints: HelloWorld

Barewords are fraught with peril. They're inherently ambiguous, because their meaning can be changed by the introduction or removal of seemingly unrelated code. In the previous example, a Hello( ) subroutine might somehow come to be defined before the assignment, perhaps when a new version of some module started to export that subroutine by default. If that were to happen, the former Hello bareword would silently become a zero-argument Hello( ) subroutine call.

Even without such pre-emptive predeclarations, barewords are unreliable. If someone refactored the previous example into a single print statement:

print Hello, World, "\n";

then you'd suddenly get a compile-time error:

No comma allowed after filehandle at demo.pl line 1

That's because Perl always treats the first bareword after a print as a named filehandle[20], rather than as a bareword string value to be printed.

Barewords can also crop up accidentally, like this:

my @sqrt = map {sqrt $_} 0..100;
for my $N (2,3,5,8,13,21,34,55) {
    print $sqrt[N], "\n";
}

And your brain will "helpfully" gloss over the critical difference between $sqrt[$N] and $sqrt[N]. The latter is really $sqrt['N'], which in turn becomes $sqrt[0] in the numeric context of the array index; unless, of course, there's a sub N( ) already defined, in which case anything might happen.

All in all, barewords are far too error-prone to be relied upon. So don't use them at all. The easiest way to accomplish that is to put a use strict qw( subs ), or just a use strict, at the start of any source file (see Chapter 18). The strict pragma will then detect any barewords at compile time:

use strict 'subs';

my @sqrt = map {sqrt $_} 0..100;
for my $N (2,3,5,8,13,21,34,55) {
    print $sqrt[N], "\n";
}

and throw an exception:

Bareword "N" not allowed while "strict subs" in use at sqrts.pl line 5
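Because the check happens at compile time, one way to observe it from within a running program is to compile the suspect code with a string eval and inspect the error. The snippet below is a cut-down, hypothetical variant of the example above, wrapped that way purely for demonstration:

```perl
use strict;
use warnings;

# Compile the suspect snippet at runtime so the compile-time error can be inspected.
my $snippet = <<'END_SNIPPET';
use strict 'subs';
my @sqrt = map {sqrt $_} 0..100;
my $N    = 4;
my $root = $sqrt[N];
1;
END_SNIPPET

my $compiled_ok = eval $snippet;
my $error       = $compiled_ok ? q{} : $@;

print "Compilation failed: $error" if !$compiled_ok;
```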

Fat Commas

Reserve => for pairs.

Whenever you are creating a list of key/value or name/value pairs, use the "fat comma" (=>) to connect the keys to their corresponding values. For example, use it when constructing a hash:

%default_service_record  = (
    name   => '<unknown>',
    rank   => 'Recruit',
    serial => undef,
    unit   => ['Training platoon'],
    duty   => ['Basic training'],
);

or when passing named arguments to a subroutine (see Chapter 9):

$text = format_text(src=>$raw_text,  margins=>[1,62], justify=>'left');

or when creating a constant:

Readonly my $ESCAPE_SEQ => "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";

The fat comma visually reinforces the connection between the name and the following value. It also removes the need to quote the key string, as long as you use only valid Perl identifiers[21] as keys. Compare the readability of the previous examples with the following comma-only versions:

%default_service_record  = (
    'name',   '<unknown>',
    'rank',   'Recruit',
    'serial', undef,
    'unit',   ['Training platoon'],
    'duty',   ['Basic training'],
);

$text = format_text('src', $raw_text, 'margins', [1,62], 'justify', 'left');

Readonly my $ESCAPE_SEQ, "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";

An alternative criterion that is sometimes used when considering a => is whether you can pronounce the symbol as some kind of process verb, such as "becomes" or "produces" or "implies" or "goes into" or "is sent to". For example:

# The substring of $name becomes whatever's in $new_name
substr $name, $from, $len => $new_name;

# Send this signal to this process
send_signal($signal => $process);

# Open a handle to a particular file
open my $binary => '<:raw', $filename
    or croak "Can't open '$filename': $OS_ERROR";

Underlying this approach is the idea of using the prominence of the fat comma to mark the boundary of two distinct subsets within an argument list. At the same time, the arrow-like appearance of the operator is supposed to convey a sense of moving or changing or mapping values. The problem here is that it's far too easy to misinterpret the direction and destination of the "movement" being represented. For example:

# The substring of $name GOES INTO $new_name (No it doesn't!)
substr $name, $from, $len => $new_name;

# Open a handle GOING OUT TO a particular file (No it won't!)
open my $binary => $filename;

Moreover, the original pronunciation-based criterion for using a fat comma can easily be forgotten. Thereafter, the => is likely to be used indiscriminately, often counter-intuitively, and occasionally as a kind of "wish-fulfillment operator":

# This may or may not send the signal to the process
# (depending on the order in which send_msg( ) expects its arguments)
send_msg($signal => $process);

# This doesn't find the index of the target in the text (it's vice versa)
$found_at = index $target => $text;

# An excellent money-making plan ... for the casino
push @casino_money => @my_wallet;

Considering the potential for confusion, it's better to reserve the fat comma exclusively for hash entries, named arguments, and other name/value pairs.
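It may help to remember that, apart from quoting a bareword on its left, the fat comma is just a comma. The sketch below uses made-up data to show that equivalence, and that an arrow between the arguments of index does nothing to change which string is searched:

```perl
use strict;
use warnings;

# => quotes the identifier on its left, but otherwise builds the same list.
my @fat   = ( name => '<unknown>', rank => 'Recruit' );
my @plain = ( 'name', '<unknown>', 'rank', 'Recruit' );
print "@fat" eq "@plain" ? "same list\n" : "different lists\n";

# index() searches its FIRST argument for its SECOND, whatever the arrow suggests.
my ($target, $text) = ('needle', 'haystack with a needle');
my $misleading = index $target => $text;    # looks for $text inside $target
my $intended   = index $text, $target;      # looks for $target inside $text

print "misleading: $misleading\n";
print "intended:   $intended\n";
```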

Thin Commas

Don't use commas to sequence statements.

Perl programmers from a C/C++ background are used to writing C-style for loops in Perl:

# Binary chop search...
SEARCH:
for ($min=0,$max=$#samples, $found_target=0; $min<=$max; ) {
    $pos = int(($max+$min)/2);
    my $test_val = $samples[$pos];

    if ($target == $test_val) {
        $found_target = 1;
        last SEARCH;
    }
    elsif ($target < $test_val) {
        $max = $pos-1;
    }
    else {
        $min = $pos+1;
    }
}

Each comma within the for initialization acts as a kind of "junior semicolon", separating substatements within the first compartment of the for.

After seeing commas used that way, people sometimes think that it's also possible to use "junior semicolons" within a list:

print 'Sir ',
      (check_name($name), $name),
      ', KBE';

The intent seems to be to check the person's name just before it's printed, with check_name( ) throwing an exception if the name is wrong (see Chapter 13). The underlying assumption is that using a comma would mean that only the final value in the parentheses was passed on to print.

Unfortunately, that's not what happens. The comma actually has two distinct roles in Perl. In a scalar context, it is (as those former C programmers expect) a sequencing operator: "do this, then do that". But in a list context, such as the argument list of a print, the comma is a list separator, not technically an operator at all.

The subexpression (check_name($name), $name) is merely a sublist. And a list context automatically flattens any sublists into the main list. That means that the previous example is the same as:

print 'Sir ',
      check_name($name),
      $name,
      ', KBE';

which will probably not produce the desired effect:

Sir 1Tim Berners-Lee, KBE
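The two roles of the comma can be compared side by side. In this sketch, check_name( ) is a hypothetical validator that simply returns true; the same parenthesized pair is evaluated once in list context and once in scalar context:

```perl
use strict;
use warnings;

sub check_name { return 1 }    # hypothetical validator that returns true on success

my $name = 'Tim Berners-Lee';

# List context: the parenthesized sublist is flattened, so all four values survive.
my @printed = ( 'Sir ', (check_name($name), $name), ', KBE' );

# Scalar context: the comma is the sequencing operator, yielding only the last value.
my $last = ( check_name($name), $name );

print scalar(@printed), ' items: ', join(q{}, @printed), "\n";
print "scalar result: $last\n";
```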

The best way to avoid such problems is to adopt a style that limits commas to a single role: that of separating the items of lists. Then there can be no confusion between scalar comma operators and list comma separators.

If two or more statements need to be treated as a single statement, don't use scalar commas as "junior semicolons". Instead, use a do block and real semicolons:

# Binary chop search...
SEARCH:
for (do{$min=0; $max=$#samples;  $found_target=0;}; $min<=$max; ) {
    # etc, as before
}

print 'Sir ',
      do{ check_name($name); $name; },
      ', KBE';

Or, better still, find a way to factor the sequence of statements out of the expression entirely:

($min, $max, $found_target) = (0, $#samples, 0);

SEARCH:
while ($min<=$max) {
    # [Binary chop implementation as shown earlier]
}

check_name($name);
print "Sir $name, KBE";

Low-Precedence Operators

Don't mix high- and low-precedence booleans.

Perl's low-precedence logical not reads much better than its corresponding high-precedence ! operator. So it's tempting to write:

next CLIENT if not $finished;    # Much nicer than: if !$finished

However, the extremely low precedence of not can lead to problems if that condition is later extended:

next CLIENT if not $finished || $result < $MIN_ACCEPTABLE;

It's likely that at least some readers of your code will mistake the behaviour of that statement and assume that it's equivalent to:

next CLIENT if (not $finished) || $result < $MIN_ACCEPTABLE;

It's not. It actually means:

next CLIENT if not( $finished || $result < $MIN_ACCEPTABLE );

Even if the choice of || was deliberate, and implements the desired test correctly, there is nothing in the code to indicate that the mixing of precedence was intentional. So, while the novice reader is left to wonder about the meaning of the expression, the more experienced reader is left to wonder about its correctness.
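
The difference between the two readings is easy to confirm with concrete values. The following is only a sketch: the values are invented for illustration, and extra parentheses have been added around each test so that it can be captured in a variable:

```perl
use strict;
use warnings;

# Illustrative values only: an unfinished client with a poor result...
my $finished       = 0;
my $result         = 5;
my $MIN_ACCEPTABLE = 10;

# What the statement actually means: not applies to the entire || expression...
my $as_parsed
    = ( not $finished || $result < $MIN_ACCEPTABLE ) ? 'skip' : 'keep';

# What many readers will assume it means...
my $as_misread
    = ( (not $finished) || $result < $MIN_ACCEPTABLE ) ? 'skip' : 'keep';

print "As parsed:  $as_parsed\n";
print "As misread: $as_misread\n";
```

With these values the two interpretations disagree: the statement as Perl parses it would keep the client, whereas the misreading would skip it.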

Replacing the || with an or would solve the precedence problem (if indeed there were one), since or has even lower precedence than not:

next CLIENT if not $finished or $result < $MIN_ACCEPTABLE;

And then adding a pair of parentheses would explicitly indicate whether the intention was:

next CLIENT if not($finished or $result < $MIN_ACCEPTABLE);

or:

next CLIENT if not($finished) or $result < $MIN_ACCEPTABLE;

On the other hand, the high-precedence boolean operators don't seem to invoke the same levels of fear, uncertainty, or doubt, probably because they're used much more frequently. It's safer and more comprehensible to use only high-precedence booleans in conditional expressions:

next CLIENT if !$finished || $result < $MIN_ACCEPTABLE;

and then use parentheses when you need to vary precedence:

next CLIENT if !( $finished || $result < $MIN_ACCEPTABLE);

To maximize the comprehensibility of conditional tests, avoid the low-precedence and and not operators completely, and reserve low-precedence or for specifying "fallback positions" on fallible builtins:

open my $source, '<', $source_file
    or croak "Couldn't access source code: $OS_ERROR";

(but see also "Builtin Failures" in Chapter 13).
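
As a minimal sketch of that fallback idiom in context, the open can be wrapped in a small subroutine. The name first_line_of( ) and the file path are hypothetical, chosen only for illustration; croak comes from the standard Carp module and $OS_ERROR from the standard English module:

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical helper: return the first line of a file, or throw an exception...
sub first_line_of {
    my ($source_file) = @_;

    open my $source, '<', $source_file
        or croak "Couldn't access source code: $OS_ERROR";

    my $first_line = <$source>;

    close $source
        or croak "Couldn't close source code: $OS_ERROR";

    return $first_line;
}

# Usage (the path is assumed not to exist)...
my $line = eval { first_line_of('/no/such/dir/no_such_file.pl') };
print "Failed as expected: $EVAL_ERROR" if !defined $line;
```

The low precedence is what makes this work without parentheses: or applies to the result of the entire open call. A high-precedence || in the same position would bind to $source_file alone, so the croak would never be reached for a non-empty filename.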

Lists

Parenthesize every raw list.

The precedence of the comma operator is so low that, even when it's in a list context, it may not act the way that a casual reader expects. For example, the following assignment:

@todo = 'Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!';

is identical to:

@todo = 'Patent concept of 1 and 0';
'Sue Microsoft and IBM';
'Profit!';

That's because the precedence of the comma is less than that of assignment, so the previous example is really a set of "junior semicolons":

(@todo = 'Patent concept of 1 and 0'), 'Sue Microsoft and IBM', 'Profit!';

For that reason it's a good practice to ensure that comma-separated lists of values are always safely enclosed in parentheses, to boost the precedence of the comma-separators appropriately:

@todo = ('Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!');

But be careful to avoid the all-too-common error of using square brackets instead of parentheses:

@todo = ['Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!'];

This example produces a @todo array with only a single element, which is a reference to an anonymous array containing the three strings.
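
The three variations can be compared by counting the elements each one produces. This is just a sketch: the array names are invented for illustration, and the 'void' warnings are disabled locally only so that the unparenthesized version runs without complaints about useless constants:

```perl
use strict;
use warnings;

# Parenthesized list: three elements...
my @parenthesized
    = ('Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!');

# Square brackets: one element, a reference to a three-element array...
my @bracketed
    = ['Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!'];

# No parentheses: only the first string is assigned...
my @unparenthesized;
{
    no warnings 'void';
    @unparenthesized
        = 'Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!';
}

print 'Parenthesized:   ', scalar @parenthesized,   "\n";
print 'Bracketed:       ', scalar @bracketed,       "\n";
print 'Unparenthesized: ', scalar @unparenthesized, "\n";
```

Under use warnings, the unparenthesized version would normally be reported at compile time, which is one more reason to leave warnings enabled.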

List Membership

Use table-lookup to test for membership in lists of strings; use any( ) for membership of lists of anything else.

Like grep, the any( ) function from List::MoreUtils (see "Utilities" in Chapter 8) takes a block of code followed by a list of values. Like grep, it applies the code block to each value in turn, passing them as $_. But, unlike grep, any( ) returns a true value as soon as any of the values causes its test block to succeed. If none of the values ever makes the block true, any( ) returns false.

This behaviour makes any( ) an efficient general solution for testing list membership, because you can put any kind of equivalence test in the block. For example:

# Is the index number already taken?
if ( any { $requested_slot == $_ } @allocated_slots ) {
    print "Slot $requested_slot is already taken. Please select another: ";
    redo GET_SLOT;
}

or:

# Is the bad guy at the party under an assumed name?
if ( any { $fugitive->also_known_as($_) } @guests ) {
    stay_calm( );
    dial(911);
    do_not_approach($fugitive);
}
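
The short-circuiting that distinguishes any( ) from grep can be observed by counting how many times each test block runs. This sketch makes two assumptions: it imports any( ) from the core List::Util module (version 1.33 or later) rather than from List::MoreUtils, so that it runs without extra installations, and the slot numbers are invented:

```perl
use strict;
use warnings;
use List::Util qw( any );   # Assumption: core List::Util, not List::MoreUtils

my @allocated_slots = (3, 7, 12, 7, 20);
my $requested_slot  = 7;

# any() stops at the first success...
my $any_tests = 0;
my $is_taken  = any { $any_tests++; $requested_slot == $_ } @allocated_slots;

# grep always tests every element...
my $grep_tests  = 0;
my $match_count = grep { $grep_tests++; $requested_slot == $_ } @allocated_slots;

print "any:  slot found after $any_tests tests\n";
print "grep: $match_count matches after $grep_tests tests\n";
```

Here any( ) gives up after the second element, whereas grep tests all five.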

But don't use any( ) if your list membership test uses eq:

Readonly my @EXIT_WORDS => qw(
    q  quit  bye  exit  stop  done  last  finish  aurevoir
);

# and later...

if ( any { $cmd eq $_ } @EXIT_WORDS ) {
    abort_run( );
}

In such cases it's much better to use a look-up table instead:

Readonly my %IS_EXIT_WORD
    => map { ($_ => 1) } qw(
           q  quit  bye  exit  stop  done  last  finish  aurevoir
       );

# and later...

if ( $IS_EXIT_WORD{$cmd} ) {
    abort_run( );
}

The hash access is faster than a linear search through an array, even if that search can short-circuit. The code implementing the test is far more readable as well.
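
The two approaches can be placed side by side to confirm that they give the same answers. In this sketch the subroutine names are hypothetical, Readonly is omitted, and any( ) is taken from the core List::Util module (1.33 or later) rather than List::MoreUtils, purely so that the example depends on nothing outside the standard distribution:

```perl
use strict;
use warnings;
use List::Util qw( any );   # Assumption: core List::Util, not List::MoreUtils

my @EXIT_WORDS   = qw( q  quit  bye  exit  stop  done  last  finish  aurevoir );
my %IS_EXIT_WORD = map { ($_ => 1) } @EXIT_WORDS;

# Linear search: may have to compare against every word...
sub is_exit_word_by_search {
    my ($cmd) = @_;
    return any { $cmd eq $_ } @EXIT_WORDS;
}

# Table look-up: a single hash access...
sub is_exit_word_by_lookup {
    my ($cmd) = @_;
    return $IS_EXIT_WORD{$cmd};
}

for my $cmd (qw( quit resume )) {
    my $by_search = is_exit_word_by_search($cmd) ? 'yes' : 'no';
    my $by_lookup = is_exit_word_by_lookup($cmd) ? 'yes' : 'no';
    print "$cmd: search=$by_search lookup=$by_lookup\n";
}
```

The table costs one pass over the word list when it's built, after which every membership test is a single look-up.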



[18] Note that "interpolation" includes the expansion of character escapes like "\n" and "\t".

[19] Which happens to be: interpolate like a double-quoted string.

[20] Technically speaking, it treats the bareword as a symbolic reference to the current package's symbol table entry, from which the print then extracts the corresponding filehandle object.

[21] A valid Perl identifier is an alphabetic character or underscore, optionally followed by one or more alphanumeric characters or underscores.
