8. Multiple Form Interaction

One of the problems with HTTP is that it is stateless. In other words, the protocol itself provides no way to access data from previous requests.

Imagine an ordering (or “shopping cart”) system on the Web. You present the user with several forms listing the numerous products that can be ordered. The system keeps track of what the user ordered. Finally, it displays all of the user's selections. This type of system needs to somehow store the information--or “state”--so that it can be accessed at a later time.

For example, suppose you ask the user for his or her address in the first form. If you need this information in a later form, you don't want to ask all over again. Instead, you want to find a way for that address to be accessible to a later form, but transparent to the user. This is the most basic problem of using multiple forms--maintaining “state” from one form to another--and thus deserves special attention in this book.

There are several different strategies we'll explore for maintaining state. They include:

  • Hidden fields. Using hidden fields, you can embed information into a form that the user won't see, but which will be sent back to the CGI program when the form is submitted.
  • CGI Side Includes. This is a mechanism by which we embed special tags into the HTML document that pass CGI variables invisibly.
  • Netscape Persistent Cookies. The Netscape browser supplies a method for storing and retrieving information via CGI.

In Chapter 10, Gateways to Internet Information Servers, we also discuss a fourth approach, which is to develop a specialized “cookie server” to maintain information associated with a single user. In this chapter, however, we'll restrict ourselves to the more straightforward mechanisms.

8.1 Hidden Fields

As mentioned in Chapter 4, Forms and CGI, hidden fields allow you to store “hidden” information within a form. These fields are not displayed by the client. However, if the user selects the “View Source” option in the browser, the entire form is visible, including the hidden fields. Hidden fields are therefore not meant for security (since anyone can see them), but just for passing information to and from forms transparently.

Here is an example of two hidden fields that store author information within a form:

<FORM ACTION="/cgi-bin/test.pl" METHOD="POST">
.
.
<INPUT TYPE="hidden" NAME="author"  VALUE="Larry Bird">
<INPUT TYPE="hidden" NAME="company" VALUE="Boston Celtics">
.
.
</FORM>

When the form is submitted, the information within the hidden fields is encoded, as the client passes all the fields to the server in the same exact manner. As far as the CGI program is concerned, there is no difference between hidden fields and regular, visible fields.

One thing to note is that certain browsers may not be able to handle hidden fields correctly.

A simple way to use hidden fields for maintaining state involves writing the information from a form as hidden field information into its successive form. Here is a simple first form:

<FORM ACTION="/cgi-bin/test.pl" METHOD="POST">
Name: <INPUT TYPE="text"  NAME="01 Full Name" SIZE=40>
<BR>
EMail: <INPUT TYPE="text" NAME="02 EMail" SIZE=40>
<BR>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>

When this form is submitted, the program retrieves the information and creates a dynamic second form, based on the first form, like this:

<FORM ACTION="/cgi-bin/test.pl" METHOD="POST">
<INPUT TYPE="hidden" NAME="01 Full Name" VALUE="Shishir Gundavaram">
<INPUT TYPE="hidden" NAME="02 EMail" VALUE="shishir@acs.bu.edu">
What is your favorite WWW browser?
<BR>
Browser: <INPUT TYPE="text" NAME="03 Browser" SIZE=40>
<BR>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>

As you can see, the two fields, along with the user information, are inserted into the second form. The main advantage of such a process is that there is no need for magic cookies and temporary files. The disadvantage is that the form information is appended repeatedly to successive forms, so each form is larger than the one before it. With many forms or large values, this can slow down both transmission and processing.

Let's look at an example using this technique. Here is the first form:

<HTML>
<HEAD><TITLE>Welcome to the CGI Shopping Cart</TITLE></HEAD>
<BODY>
<H1>CGI Shopping Cart</H1>
Welcome! Thanks for stopping by the CGI Shopping Cart. Here is a list
of some of our products. We hope you like them, and please visit again.
<FORM ACTION="/cgi-bin/shopping.pl/catalog.html" METHOD="POST">
<HR>
What is your full name: <BR>
<INPUT TYPE="text" NAME="01 Full Name" SIZE=40>
<P>
What is your e-mail address: <BR>
<INPUT TYPE="text" NAME="02 Email" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Submit and Retrieve Catalog">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
</BODY></HTML>

The most important thing to note here is the extra path information passed to the program. This filename represents the next form to be displayed. The two fields in this form will be “hidden” in /catalog.html. Now, here is the second form:

<HTML>
<HEAD><TITLE>Welcome to the CGI Shopping Cart</TITLE></HEAD>
<BODY>
<H1>CGI Shopping Cart</H1>
Thanks for visiting our server. Here is a catalog of some of our books.
Make your selections and press the submit buttons. Note: multiple
selections are allowed.
<HR>
<FORM ACTION="/cgi-bin/shopping.pl" METHOD="POST">
<H2>Books on Networking</H2>
<SELECT NAME="03 Networking Books" SIZE=3 MULTIPLE>
<OPTION SELECTED>Managing Internet Information Services
<OPTION>TCP/IP Network Administration
<OPTION>Linux Network Administrator's Guide
<OPTION>Managing UUCP and Usenet
<OPTION>The USENET Handbook
</SELECT>
<HR>
<H2>UNIX related Books</H2>
<SELECT NAME="04 UNIX Books" SIZE=3 MULTIPLE>
<OPTION SELECTED>Learning the UNIX Operating System
<OPTION>Learning the Korn Shell
<OPTION>UNIX Power Tools
<OPTION>Learning Perl
<OPTION>Programming Perl
<OPTION>Learning the GNU Emacs
</SELECT>
<INPUT TYPE="submit" VALUE="Submit the selection">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
</BODY></HTML>

The ACTION attribute does not contain extra path information, which indicates that this is the last form in the “shopping cart.” Also note that there is a scrolled list that allows multiple selections. The program displays any form element that allows multiple selections in a unique way.

The program begins as follows:

#!/usr/local/bin/perl
$webmaster = "shishir\@bu\.edu";
$document_root = "/home/shishir/httpd_1.4.2/public";
$request_method = $ENV{'REQUEST_METHOD'};
$form_file = $ENV{'PATH_INFO'};
$full_path = $document_root . $form_file;
$exclusive_lock = 2;
$unlock = 8;
if ($request_method eq "GET") {
    if ($form_file) {
        &display_file ();
    } else {
        &return_error (500, "CGI Shopping Cart Error",
                            "An initial form must be specified.");
    }

If the program was requested with the GET method and extra path information, the display_file subroutine is called to output the form. The program should be accessed with the following URL:

http://your.machine/cgi-bin/shopping.pl/start.html

where /start.html represents the first form. If no path information is specified, an error message is returned.
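Because the extra path information comes directly from the request, it may be worth checking it before appending it to the document root. The following is only a sketch: the helper name safe_form_path and the policy it enforces (no “..” components, a restricted character set, and an .html suffix) are assumptions made for illustration, not part of the program in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): returns the full
# path for a form file, or undef if the extra path information looks unsafe.
sub safe_form_path {
    my ($document_root, $path_info) = @_;

    return undef unless defined $path_info && length $path_info;

    # Reject any attempt to climb out of the document root.
    return undef if $path_info =~ /\.\./;

    # Assumed policy: letters, digits, "_", "-", "." and "/" only,
    # and the name must end in ".html".
    return undef unless $path_info =~ m{^/[\w./-]+\.html\z};

    return $document_root . $path_info;
}

my $path = safe_form_path("/home/shishir/httpd_1.4.2/public", "/start.html");
print defined $path ? "OK: $path\n" : "Rejected\n";
```

A program could call such a check right after reading PATH_INFO and return an error when the result is undefined.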

} elsif ($request_method eq "POST") {
    &parse_form_data (*STATE);
    if ($form_file) {
        &parse_file ();
    } else {
        &thank_you ();
    }

If extra path information is passed to this program with the POST method, the parse_file subroutine is invoked. This subroutine inserts the information from the previous form(s) into the current form as hidden fields. Remember, the form information is stored in the STATE associative array. On the other hand, if no path information is specified, it is the end of the data collection process. The thank_you subroutine displays the information from all the forms.

} else {
    &return_error (500, "Server Error",
                        "Server uses unsupported method");
}
exit (0);

The display_file subroutine simply outputs the first form to standard output.

sub display_file
{
    open (FILE, "<" . $full_path) ||
        &return_error (500, "CGI Shopping Cart Error",
            "Cannot read from the form file [$full_path].");
    flock (FILE, $exclusive_lock);
    print "Content-type: text/html", "\n\n";
    while (<FILE>) {
        print;
    }
    flock (FILE, $unlock);
    close (FILE);
}

The parse_file subroutine inserts information from previous forms into the current form, as hidden fields.

sub parse_file
{
    local ($key, $value);
    open (FILE, "<" . $full_path) ||
        &return_error (500, "CGI Shopping Cart Error",
            "Cannot read from the form file [$full_path].");
    flock (FILE, $exclusive_lock);
    print "Content-type: text/html", "\n\n";
    while (<FILE>) {
        if (/<\s*form\s*.*>/i) {
            print;
            foreach $key (sort (keys %STATE)) {
                $value = $STATE{$key};
                print <<End_of_Hidden;
<INPUT TYPE="hidden" NAME="$key" VALUE="$value">
End_of_Hidden
            }

The file specified by PATH_INFO is opened. The while loop iterates through the file one line at a time. The regular expression checks for the <FORM> tag within the document. If it is found, the line containing the tag is displayed. Also, the foreach construct iterates through all of the key-value form pairs, and outputs a hidden field for each one.

        } else {
            print;
        }
    }

If the <FORM> tag is not found, the line from the file is output verbatim.

    flock (FILE, $unlock);
    close (FILE);
}
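The hidden fields written by parse_file place each value directly inside a VALUE attribute, so a value that contains a double quote would end the attribute early. One possible refinement, sketched here with two hypothetical helpers (escape_html and hidden_fields are names invented for this example), is to escape the special characters first:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special inside an
# HTML attribute value.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, so later entities survive
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build one hidden field per key-value pair, in sorted key order.
sub hidden_fields {
    my (%state) = @_;
    my $html = '';
    foreach my $key (sort keys %state) {
        $html .= sprintf qq(<INPUT TYPE="hidden" NAME="%s" VALUE="%s">\n),
                         escape_html($key), escape_html($state{$key});
    }
    return $html;
}

print hidden_fields('01 Full Name' => 'Larry "Legend" Bird');
```

Browsers decode the entities before submitting the form, so the CGI program should receive the original value unchanged.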

The thank_you subroutine thanks the user and displays the data he or she selected.

sub thank_you
{
    local ($key, $value, @all_values);
    print <<Thanks;
Content-type: text/html

<HTML>
<HEAD><TITLE>Thank You!</TITLE></HEAD>
<BODY>
<H1>Thank You!</H1>
Thank you again for using our service. Here are the items
that you selected:
<HR>
<P>
Thanks

This subroutine formats and displays the information stored in the STATE associative array, which represents the combined data from all the forms.

    foreach $key (sort (keys %STATE)) {
        $value = $STATE{$key};
        $key =~ s/^\d+\s//;
        if ($value =~ /\0/) {
            print "<B>", $key, "</B>", "<BR>", "\n";
            $value =~ s/\0/<BR>\n/g;
            print $value, "<BR>", "\n";

If a particular value contains a null character (“\0”), the field had multiple selections. Each null character is replaced with “<BR>” followed by a newline character. As a result, the multiple values are displayed on separate lines.

        } else {
            print $key, ": ", $value, "<BR>", "\n";
        }
    }
    print "<HR>", "\n";
    print "</BODY></HTML>", "\n";
}
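As a small illustration of the multiple-value convention used by thank_you, the following sketch splits a stored value on the null character. The helper name split_values is hypothetical; the sample titles come from the catalog form.

```perl
use strict;
use warnings;

# Split a stored value into its individual selections. Multiple selections
# for one field are assumed to be joined with "\0", as in parse_form_data.
sub split_values {
    my ($value) = @_;
    return split /\0/, $value;
}

my $stored = join "\0", 'UNIX Power Tools', 'Learning Perl';
foreach my $book (split_values($stored)) {
    print $book, "<BR>\n";
}
```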

The parse_form_data subroutine is similar to the one used in the “survey” program in Section 8.2, except it does not handle any query information.

sub parse_form_data
{
    local (*FORM_DATA) = @_;

    local ($query_string, @key_value_pairs, $key_value, $key, $value);
    read (STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
    @key_value_pairs = split (/&/, $query_string);
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $key   =~ tr/+/ /;
        $value =~ tr/+/ /;
        $key   =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        if (defined($FORM_DATA{$key})) {
            $FORM_DATA{$key} = join ("\0", $FORM_DATA{$key}, $value);
        } else {
            $FORM_DATA{$key} = $value;
        }
    }
}
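To see what this decoding produces, here is a sketch that applies the same steps to a string passed as an argument, so that it can be run without a Web server. The name parse_query and the sample data are assumptions made for this example.

```perl
use strict;
use warnings;

# Decode a URL-encoded string into a hash; repeated keys are joined with "\0".
# This mirrors parse_form_data, but takes the string as an argument instead
# of reading it from standard input.
sub parse_query {
    my ($query_string) = @_;
    my %form_data;
    foreach my $key_value (split /&/, $query_string) {
        my ($key, $value) = split /=/, $key_value, 2;
        $value = '' unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                                    # "+" means space
            s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;   # "%XX" means one byte
        }
        if (defined $form_data{$key}) {
            $form_data{$key} = join "\0", $form_data{$key}, $value;
        } else {
            $form_data{$key} = $value;
        }
    }
    return %form_data;
}

my %state = parse_query(
    '01+Full+Name=Larry+Bird&04+UNIX+Books=Learning+Perl&04+UNIX+Books=Programming+Perl');
print "$_ => ", join(', ', split(/\0/, $state{$_})), "\n" foreach sort keys %state;
```

The two “04 UNIX Books” entries end up in a single value, separated by the null character.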

8.2 CGI Side Includes

Using hidden fields is probably the simplest way to maintain information across multiple CGI instances. But it is far from the most efficient.

In this next example of maintaining state, we embed special codes into HTML documents that resemble Server Side Includes (see Chapter 5, Server Side Includes, for more information). These codes are actually parsed by a CGI program, which uses them to maintain information across several documents. This algorithm is best illustrated by example.

Let's create a multiple survey form system. Here is the first form of the survey:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Welcome to the CGI Network!</H1>
<HR>
In order to better serve you, we would like to know what type of
movies and variety shows you like to watch on TV. Over the last couple
of years, you, the viewers, were directly responsible for the lasting
success of many of our shows. Your comments are extremely valuable to
us, so please take a few moments to fill out a survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

At first glance, the construct in the last line displayed above looks like a Server Side Include. However, it is not! This document first gets parsed by a CGI program that looks for statements like these and replaces them with appropriate information. Let's refer to these statements as CGI Side Includes (CSIs), or “pseudo” Server Side Includes. In this case, the program will insert the current date and time.

You may ask, what is the advantage of such a process? It allows you to insert dynamic information in otherwise static documents. Another alternative to this would be to place the information contained within the document in the program, such as:

print <<End_of_Form;
<HTML>
<HEAD><TITLE>Sample Form</TITLE></HEAD>
<BODY>
<H1>This is a test of a sample form</H1>
The current time is: $date_time
<HR>
.
.
.
</BODY></HTML>
End_of_Form

As you can see, this can be quite cumbersome, especially if the document is large. Now, let's proceed with the rest of the form.

<HR>
<FORM ACTION="/cgi-bin/survey.pl?
                 cgi_cookie=<!--#insert var="COOKIE"-->&
                 cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">

As in other examples in this book, a query is passed to the program as part of the ACTION attribute. Notice the two CSI statements in the <FORM> tag. The first one inserts a random number--also referred to as a magic cookie--for identification purposes, and the second one inserts the form number. A cookie is needed to store the information from the various forms in a unique data file. This cookie is passed to each and every form, so that the form data is appended to the same data file. A form number is needed to keep track of the various forms. We will discuss these statements in detail later in this chapter.

<PRE>
Full Name: <INPUT TYPE="text" NAME="01 Full Name" SIZE=40>
E-Mail:    <INPUT TYPE="text" NAME="02 EMail Address" SIZE=40>

The field names are prefixed with numbers, so that they can be sorted. This makes it possible to store the form data in the order in which it is displayed in the form. Remember, you do not need to encode the field names, as the browser will do so before it submits the information to the server.

</PRE>
<P>
Which survey would you like to fill out: <BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Television" CHECKED>Television<BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Movie">Movies<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The document is passed to the CGI program as extra path information. For example, if you want the program to parse the CSI statements and display the form, the following URL should be used:

http://your.machine/cgi-bin/survey.pl/start_survey.html

where the file “/start_survey.html” contains the first form of the survey. In the context of this example, if the user opts to fill out the “Television” survey, the following two forms are displayed, one after the other:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Television Survey</H1>
<HR>
Welcome! We are glad that you have decided to fill out our
television survey. Please read all questions carefully. When you are finished,
press the Submit button for Part 2 of the survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

The date and time are inserted into the form using CGI side includes.

<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_survey=<!--#insert var="SURVEY"-->&cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">

The variable “SURVEY” inserts the user-selected survey type, either “Television” or “Movie.” The survey type is retrieved from the information submitted by the user in the first form. This ensures that the correct series of forms are displayed.

What is your favorite comedy show?
<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Single Web Dude">Single Web Dude<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Gateway Friends">Gateway Friends<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Mad About CGI" CHECKED>Mad About CGI<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Web Time">Web Time<BR>
<P>
Who is your favorite actor in a comedy show?
<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="John Riser" CHECKED>John Riser<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Jake LeBlanc">Jake LeBlanc<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Mike Cosby">Mike Cosby<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Marc Allen">Marc Allen<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The field names are prefixed with numerical values. Notice the long, descriptive names for the field names and values. This allows us to simply retrieve the names and values, decode them, and print them out.

Now, here is the second, and final, form in the “Television” survey:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Television Survey</H1>
<HR>
Thanks for filling out Part 1 of our TV survey. Here is
Part 2... Again, please read all questions carefully. When you are finished,
press the Submit button to wrap up the survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>
<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_survey=<!--#insert var="SURVEY"-->&cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">
What is your favorite action/drama show?
<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="Masquerade on the Web">Masquerade on the Web<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="Gateway Voyager">Gateway Voyager<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="EH" CHECKED>EH - Emergency HTTP Server<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="W3C Hope">W3C Hope<BR>
<P>
Who is your favorite actor in an action/drama show?
<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Bill Wyle" CHECKED>Bill Wyle<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="John Clooney">John Clooney<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Mike Strauss">Mike Strauss<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Eric Wagner">Eric Wagner<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The two forms for the “Movie” survey are set up in the same manner as the ones illustrated above. Let's look at the program:

#!/usr/local/bin/perl
$exclusive_lock = 2;
$unlock = 8;
$request_method = $ENV{'REQUEST_METHOD'};
$webmaster = "shishir\@bu\.edu";
$document_root = "/home/shishir/httpd_1.4.2/public";
$survey_dir = "/tmp/";

The variable survey_dir contains the directory where the data files are stored. Whenever you are creating temporary files, you should store them in /tmp or /var/tmp, as many systems clean out these directories periodically.

@Television_files = ( "/tv_1.html", "/tv_2.html" );
@Movie_files = ( "/movie_1.html", "/movie_2.html" );

These two arrays store the HTML survey files that must be parsed for CSI statements. The most important thing to note here is the way the variables are labeled. The first part of the variable name--before the “_” character--corresponds to the value of the cgi_survey field in the initial form. The program determines the survey type chosen by the user--either “Television” or “Movie”--and concatenates that string with “_files” and evaluates the total string at run-time to determine the next survey file.

if ($request_method eq "GET") {
    $form_num = 0;
    $type = "start";
    $form_file = $ENV{'PATH_INFO'};

Using the GET method indicates that the user requested the starting form, which will be stored in PATH_INFO. The form_num variable indicates the current form number. In this case, zero indicates the starting form.

The type variable is set to “start”. However, this value is never used because there is no corresponding CSI in the initial form. It is just defined for clarity. Remember, the manner in which the starting form must be accessed is a GET request:

http://your.machine/cgi-bin/survey.pl/start_survey.html

After the first form is submitted, the server will execute this program with a POST request and an additional query. The process is repeated for all the forms in the survey.

    if ($form_file) {
        $cookie = join ("_", $ENV{'REMOTE_HOST'}, time);
        $cookie = &escape($cookie);
        &pseudo_ssi ($form_file, $cookie, $type, $form_num);
    } else {
        &return_error (500, "CGI Network Survey Error",
                        "An initial survey form must be specified.");
    }

Since the starting form was accessed, a new cookie has to be created. This cookie is simply the client's host address concatenated with the current time. Perl's time command returns the current time as the number of seconds since 1970. This makes it very unlikely that two users receive the same cookie, although two requests from the same host within the same second would collide.

The escape subroutine encodes the cookie string for insertion into the form. Finally, the pseudo_ssi subroutine reads and parses the file specified by the variable form_file for CSI statements. The three parameters that are passed to the subroutine are the new cookie, the dummy form type, and the form number. If corresponding CSI statements are found, the values stored in these variables will be inserted appropriately.
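If collisions between requests from the same host are a concern, one hypothetical variation is to include the process ID of the CGI program in the cookie. The sketch below assumes a helper named make_cookie, which is not part of the survey program, and falls back to the string “unknown” when no host name is available.

```perl
use strict;
use warnings;

# Hypothetical variation: add the process ID to the host and time, so that
# two requests from the same host in the same second get different cookies.
sub make_cookie {
    my ($remote_host) = @_;
    $remote_host = 'unknown'
        unless defined $remote_host && length $remote_host;
    return join "_", $remote_host, time, $$;
}

print make_cookie($ENV{'REMOTE_HOST'}), "\n";
```

The result has the form host_time_pid, and would still need to be encoded with the escape subroutine before being inserted into a form.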

} elsif ($request_method eq "POST") {
    &parse_form_data(*STATE);
    $form_num = $STATE{'cgi_form_num'};
    $type = $STATE{'cgi_survey'};
    $cookie = $STATE{'cgi_cookie'};

The form information is retrieved and stored in the STATE associative array. The parse_form_data subroutine is slightly different than the one used in the previous examples; it decodes the form field name, as well as the value.

Once the initial form is submitted, the form_num variable equals zero, type contains either “Television” or “Movie,” and cookie holds a string that uniquely identifies a user. After the initial form, all the other forms will have the same cookie and type information. However, the form_num variable will be incremented.

    if ( ($type eq "Television") || ($type eq "Movie") ) {

This conditional is executed if the user chose to fill out either a television or movie survey. Since one of the values is checked by default on the form, this variable will have to contain either “Television” or “Movie.” However, if someone accesses this program by bypassing the starting form, and specifies something other than these two values, an error message is displayed.

        $limit = eval ("scalar (\@${type}_files)");

This run-time evaluation is very important. It uses Perl's scalar function to determine the number of elements in the array that corresponds to the value stored in the variable type. Here is a simple example of scalar:

@test = (1, 2, 3);
$number = scalar (@test);

The variable number now contains 3, indicating that the array has three elements.

        if ( ($form_num >= 0) && ($form_num <= $limit) ) {
            &write_data_to_file();

If the form number is within the limits, the write_data_to_file subroutine is called to write the form information to a data file. Remember, the same data file is used throughout the whole process. On the other hand, if a user bypasses the forms, and tries to pass a form number that is not within the limits, an error message is displayed.

            if ($form_num == $limit) {
                &survey_over();

If the form is the last one in the survey, the survey_over subroutine is called to display the information stored in the data file. It also deletes the data file.

            } else {
                $form_file = eval("\$${type}_files[$form_num]");
                $form_num++;
                $cookie = &escape($cookie);
                &pseudo_ssi ($form_file, $cookie, $type,
                             $form_num);
            }

Again, a run-time evaluation is performed to retrieve the name of the next file in the survey. If these two run-time evals were not used, then two separate blocks of code would have to be written: one to handle the television survey, and the other to handle the movie survey. It is much more efficient to do it this way.

The form number is incremented, and the cookie value is encoded. The subroutine pseudo_ssi is called to parse the form file.
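The run-time evaluations are one way to select an array by name. An alternative, assuming Perl 5 references are available, is to keep the file lists in a hash keyed by survey type, which avoids building Perl code from a form value. This is a sketch of that design, not the method used by the program; the names survey_files, form_count, and form_file are hypothetical.

```perl
use strict;
use warnings;

# Map each survey type to its list of form files (file names from the text).
my %survey_files = (
    'Television' => [ "/tv_1.html",    "/tv_2.html"    ],
    'Movie'      => [ "/movie_1.html", "/movie_2.html" ],
);

# Return the number of forms for a survey type, or undef for an unknown type.
sub form_count {
    my ($type) = @_;
    return undef unless defined $type && exists $survey_files{$type};
    return scalar @{ $survey_files{$type} };
}

# Return the file for a given (zero-based) form number, or undef.
sub form_file {
    my ($type, $form_num) = @_;
    return undef unless defined $type && exists $survey_files{$type};
    return undef if $form_num < 0;
    return $survey_files{$type}[$form_num];
}

print form_count('Television'), "\n";
print form_file('Movie', 1), "\n";
```

With this layout, an unknown survey type simply fails the lookup, so the validity check and the array selection happen in one step.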

        } else {
                &return_error (500, "CGI Network Survey Error",
                    "You have somehow selected an invalid form!");
        }
    } else {
        &return_error (500, "CGI Network Survey Error",
                "You have selected an invalid survey type!");
    }
} else {
    &return_error (500, "Server Error",
                        "Server uses unsupported method");
}
exit(0);

If the user somehow passed invalid information to the program, error messages are returned.

Now for the subroutines. The pseudo_ssi subroutine parses the CSI statements.

sub pseudo_ssi
{
    local ($file, $id, $kind, $number) = @_;
    local ($command, $argument, $parameter, $line);
    $file = $document_root . $file;
    open (FILE, "<" . $file) ||
        &return_error (500, "CGI Network Survey Error",
            "Cannot open: form [$number], file [$file].");
    flock (FILE, $exclusive_lock);

The subroutine tries to open the specified file. An error message is returned if the operation fails.

    print "Content-type: text/html", "\n\n";
    while (<FILE>) {
        while ( ($command, $argument, $parameter) =
            (/<!--\s*#\s*(\w+)\s+(\w+)\s*=\s*"?(\w+)"?\s*-->/io) ) {

The initial loop iterates through each line in the file, and stores it in the default variable $_. The second loop uses a regular expression to check for a CSI statement within the file. Here is the format for the CSI statement:

<!--#command argument="parameter"-->

Whitespace is ignored, and the quotation marks around the parameter are optional. This is in great contrast to SSI statements, where a strict format is enforced.
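To make the format concrete, the following sketch applies the same regular expression to a loosely formatted statement and returns the three captured pieces. The wrapper name parse_csi is an assumption for this example.

```perl
use strict;
use warnings;

# Return (command, argument, parameter) for the first CSI statement in a
# line, or an empty list if there is none. Same pattern as in pseudo_ssi.
sub parse_csi {
    my ($line) = @_;
    return $line =~ /<!--\s*#\s*(\w+)\s+(\w+)\s*=\s*"?(\w+)"?\s*-->/i;
}

# Extra whitespace and missing quotation marks are both accepted.
my @loose = parse_csi('<!-- # insert   var = DATE_TIME -->');
print join(", ", @loose), "\n";
```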

            if ($command eq "insert") {
                if ($argument eq "var") {
                    if ($parameter eq "COOKIE") {
                        s//$id/;
                    } elsif ($parameter eq "DATE_TIME") {
                        local ($time) = &get_date_time();
                        s//$time/;
                    } elsif ($parameter eq "NUMBER") {
                        s//$number/;
                    } elsif ($parameter eq "SURVEY") {
                        s//$kind/;
                    } else {
                        s///;
                    }
                } else {
                    s///;
                }
            } else {
                s///;
            }
        }

        print;
    }

This block might look very confusing, but it is quite simple. This program only supports the insert command and the var argument. However, four parameters are allowed: COOKIE, DATE_TIME, NUMBER, and SURVEY.

Notice the strange substitute command. The initial string to substitute is not specified. Usually, the format of the substitute command looks like this:

s/initial/replacement/;

Perl will work on the default variable $_. However, if no initial string is specified, Perl automatically uses the last matched regular expression. This just so happens to be the CSI statement that matched earlier. This is a good trick in Perl, because it is very efficient.

The subroutine simply checks to see the parameter of the CSI, and replaces the information appropriately. The get_date_time subroutine is the same as the one used previously. If the command, argument, or parameter specified in the file does not match the ones listed, the substitute command is used to remove the CSI statement. Note the following format:

s///;

Perl replaces the last matched regular expression with a null string. It is very important to remove these unmatched CSI statements, or else the enclosing while loop will run forever. The reason for this is that the loop repeatedly checks for CSI statements.

Finally, the modified line is output. A print command without any parameters outputs the default variable $_.

    flock (FILE, $unlock);
    close (FILE);
}

Before we quit the subroutine, the file is unlocked and closed.
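The nested conditionals and the empty-pattern substitutions in pseudo_ssi can also be expressed as a single global substitution that looks up each parameter in a table. The sketch below is an alternative formulation under that assumption, not the code used by the program; expand_csi and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Replace every CSI statement in a line in one pass. Known parameters are
# looked up in %$values; anything else is replaced with an empty string,
# so no unmatched statement is left behind.
sub expand_csi {
    my ($line, $values) = @_;
    $line =~ s{<!--\s*#\s*(\w+)\s+(\w+)\s*=\s*"?(\w+)"?\s*-->}{
        ($1 eq 'insert' && $2 eq 'var' && exists $values->{$3})
            ? $values->{$3} : ''
    }gie;
    return $line;
}

my %values = ( COOKIE => 'host_812345678', NUMBER => 1 );
print expand_csi(
    'cgi_cookie=<!--#insert var="COOKIE"-->&cgi_form_num=<!--#insert var="NUMBER"-->',
    \%values), "\n";
```

Because the substitution consumes each statement as it goes, there is no inner loop that could run forever on an unrecognized statement.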

The write_data_to_file subroutine opens the data file and incorporates the survey results into it.

sub write_data_to_file
{
    local ($key, $temp_key);
    open (FILE, ">>" . $survey_dir . $cookie) ||
                    &return_error (500, "CGI Network Survey Error",
                        "Cannot write to a data file to store your info.");
    if ($form_num == 0) {
        print FILE $STATE{'cgi_survey'}, " Survey Filled Out", "\n";
    }

The data file is opened in append mode. There is no need to lock the file, because every user has a unique filename. If the form number indicates that it is the initial form, a header is output.
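Since the cookie arrives as part of the query and is used as a file name, a program may want to confirm that it has the expected shape before opening the file. The check below is a hypothetical addition; the name valid_cookie and the accepted pattern (a host name followed by one or more numeric parts separated by “_”) are assumptions based on how the cookie is built in this example.

```perl
use strict;
use warnings;

# Hypothetical check: accept only cookies that look like "host_time"
# (optionally followed by more numeric parts), with no path separators.
sub valid_cookie {
    my ($cookie) = @_;
    return 0 unless defined $cookie;
    return 0 if $cookie =~ /\.\./;
    return $cookie =~ /^[A-Za-z0-9.-]+(?:_\d+)+\z/ ? 1 : 0;
}

foreach my $candidate ('bu.edu_812345678', '../../etc/passwd') {
    print $candidate, ": ", (valid_cookie($candidate) ? "ok" : "rejected"), "\n";
}
```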

    foreach $key (sort (keys %STATE)) {

Let's look at this construct from the innermost parentheses. The keys command returns an array consisting of all the keys of the associative array. The sort function then sorts that array. And foreach iterates through this array, storing each element in key.

Information in an associative array is not stored in any particular order, because it is indexed by strings rather than by position. As a result, the keys command returns the keys in an unpredictable order. Prefixing numerical values to the form field names allows us to sort the information returned by the keys command.

        if ($key !~ /^cgi_/) {

If the key name begins with “cgi_”, it is omitted. Internally used variables are prefixed with “cgi_” to keep them separate from real form data.

            ($temp_key = $key) =~ s/^\d+\s//;

This regular expression is used to remove the numerical value from the key. The modified key is stored in temp_key. The field names in the form were in the format:

"01 Variable Name"

We use the regular expression to search for a string that starts with a numeric value followed by a space.

            print FILE $temp_key, ": ", $STATE{$key}, "\n";
        }
    }
    close (FILE);
}

The new key, along with the form value, is written to the file. If the form contained a scrolling list that allowed the user to make multiple selections, then all of the values are stored in one string, separated by the null character, “\0”. This subroutine does not perform any formatting on such a string. However, the thank_you subroutine of the shopping cart example in Section 8.1 shows how to split and display these values separately.

Note that the associative array is still indexed by the “old” key. The new key was defined just for output purposes. Finally, the file is closed.
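The sorting, filtering, and prefix removal described above can be tried in isolation with a sketch like the following, which returns the lines instead of writing them to a file. The name report_lines and the sample data are assumptions for this example.

```perl
use strict;
use warnings;

# Return "Label: value" lines in form order, skipping internal cgi_ fields.
sub report_lines {
    my (%state) = @_;
    my @lines;
    foreach my $key (sort keys %state) {
        next if $key =~ /^cgi_/;                # internal variables
        (my $label = $key) =~ s/^\d+\s//;       # "01 Full Name" -> "Full Name"
        push @lines, "$label: $state{$key}";    # hash still indexed by old key
    }
    return @lines;
}

print "$_\n" foreach report_lines(
    '02 EMail Address' => 'larry@example.com',
    '01 Full Name'     => 'Larry Bird',
    'cgi_form_num'     => 0,
);
```

Note that the sort is a string sort, so the numeric prefixes need leading zeros (“01”, “02”, ... “10”) to order correctly.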

The survey_over subroutine thanks the user and prints his or her responses.

sub survey_over
{
    local ($file) = $survey_dir . $cookie;
    open (FILE, "<" . $file) ||
                &return_error (500, "CGI Network Survey Error",
                                 "Cannot read the survey data file [$file].");
    print <<Thanks;
Content-type: text/html

<HTML>
<HEAD><TITLE>Thank You!</TITLE></HEAD>
<BODY>
<H1>Thank You!</H1>
Thank you again for filling out our survey. Here is the information
that you selected:
<HR>
<P>
Thanks
    while (<FILE>) {
        print $_, "<BR>";
    }
    print "<HR>";
    print "</BODY></HTML>", "\n";
    close (FILE);
    unlink ($file);
}

The file is opened in read mode, and the information contained in it is displayed to standard output. Finally, the unlink command deletes the file.

The escape subroutine encodes the data. The code is very similar to the program presented at the beginning of this book.

sub escape
{
    local ($string) = @_;
    $string =~ s/(\W)/sprintf("%%%02x", ord($1))/eg;
    return($string);
}
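One way to check an encoder of this kind is to decode its output and compare the result with the original string. The sketch below is an illustration only; the names url_escape and url_unescape are hypothetical. It pads each code to two hexadecimal digits, since the decoding expressions used in this chapter expect exactly two digits after the percent sign.

```perl
use strict;
use warnings;

# Encode every non-word character as %XX (two hex digits, zero-padded).
sub url_escape {
    my ($string) = @_;
    $string =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
    return $string;
}

# Reverse the encoding: each %XX becomes the character with that code.
sub url_unescape {
    my ($string) = @_;
    $string =~ s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;
    return $string;
}

my $original = "01 Full Name\tJ. Doe";
my $encoded  = url_escape($original);
print "$encoded\n";
print "round trip ok\n" if url_unescape($encoded) eq $original;
```

The tab character is the interesting case here: without the zero padding it would be written as a percent sign followed by a single digit, which the decoder would not recognize.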

Finally, the parse_form_data subroutine decodes the form field names (the keys) as well as the form data (the values). That is the only difference between this version of the subroutine and the one presented in the earlier examples.

sub parse_form_data
{
    local (*FORM_DATA) = @_;

    local ($query_string, @key_value_pairs, $key_value, $key, $value);

    read (STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
    if ($ENV{'QUERY_STRING'}) {
            $query_string = join("&", $query_string, $ENV{'QUERY_STRING'});
    }
    @key_value_pairs = split (/&/, $query_string);
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $key   =~ tr/+/ /;
        $value =~ tr/+/ /;

        $key   =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        if (defined($FORM_DATA{$key})) {
            $FORM_DATA{$key} = join ("\0", $FORM_DATA{$key}, $value);
        } else {
            $FORM_DATA{$key} = $value;
        }
    }
}

There are other ways to build an ordering or “shopping cart” system like the one illustrated above, but this approach works well in practice. Its only drawback involves the temporary files that are created.

If a user decides to exit midway through the survey, the temporary file will not be deleted, because there is no way to determine when the user leaves. The only solution to this problem is to delete old files yourself, based on their modification times. See Chapter 10, Gateways to Internet Information Servers, for an ordering system that works by communicating with another network server, specially designed to store and distribute information.
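Deleting files by modification time is easy to automate. Here is a minimal sketch of such a cleanup script, to be run by hand or from a scheduled job. The subroutine name remove_stale_files, the one-day threshold, and the idea of passing the survey directory on the command line are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delete plain files in $dir that have not been modified for more than
# $max_age_days days. Returns the names of the files that were removed.
sub remove_stale_files {
    my ($dir, $max_age_days) = @_;
    my @removed;
    opendir(my $dh, $dir) or die "Cannot open directory [$dir]: $!";
    foreach my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;                  # skip ".", "..", subdirectories
        next unless -M $path > $max_age_days;  # -M: days since last modification
        push @removed, $name if unlink($path);
    }
    closedir($dh);
    return @removed;
}

# Hypothetical invocation: the survey directory is given as the first argument.
if (@ARGV) {
    my @gone = remove_stale_files($ARGV[0], 1);
    print "Removed: $_\n" foreach @gone;
}
```

The threshold should comfortably exceed the time a user might reasonably take to finish the survey, or the script could remove the file of a session that is still in progress.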

CSI Statements and Hidden Fields

The hidden field technique we described earlier allows us to modify the ordering system presented earlier in two ways. The first is to replace the query information in the ACTION attribute of the <FORM> tag with hidden fields. Let's look at the starting form again:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Welcome to the CGI Network!</H1>
<HR>
In order to better serve you, we would like to know what type of
movies and variety shows you like to watch on TV. Over the last couple
of years, you, the viewers, were directly responsible for the lasting
success of many of our shows. Your comments are extremely valuable to
us, so please take a few moments to fill out a survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

If we want the current time to be displayed in the form, we need to keep this statement.

<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_form_num=" METHOD="POST">

This can be modified to:

<FORM ACTION="/cgi-bin/survey.pl" METHOD="POST">
<INPUT TYPE="hidden" NAME="cgi_cookie" VALUE="<!--#insert var="COOKIE"-->">
<INPUT TYPE="hidden" NAME="cgi_form_num" VALUE="<!--#insert var="NUMBER"-->">

The program described above will replace the CSI statements with appropriate information.

<PRE>
Full Name: <INPUT TYPE="text" NAME="01 Full Name" SIZE=40>
E-Mail:    <INPUT TYPE="text" NAME="02 EMail Address" SIZE=40>
</PRE>
<P>
Which survey would you like to fill out: <BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Television" CHECKED>Television<BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Movie">Movies<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

There is really no advantage to using this technique over the original one, as the two are nearly identical. If you use this method, you can remove the following lines from the parse_form_data subroutine:

    if ($ENV{'QUERY_STRING'}) {
        $query_string = join("&", $query_string, $ENV{'QUERY_STRING'});
    }

There is no need to store any query information.
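To make the substitution step concrete, here is a minimal sketch of how a CSI statement in the form above could be replaced with a value. It is not the survey program's own code: the subroutine name expand_csi and the sample values are hypothetical, and the only thing taken from the form is the tag syntax.

```perl
use strict;
use warnings;

# Replace each <!--#insert var="NAME"--> tag with the value stored under NAME.
# Names with no value expand to an empty string.
sub expand_csi {
    my ($html, %vars) = @_;
    $html =~ s/<!--#insert\s+var="(\w+)"\s*-->/defined $vars{$1} ? $vars{$1} : ''/eg;
    return $html;
}

my $line = '<INPUT TYPE="hidden" NAME="cgi_cookie" VALUE="<!--#insert var="COOKIE"-->">';
print expand_csi($line, COOKIE => 'abc123', NUMBER => 2), "\n";
```

Whether the value ends up in the ACTION attribute or in a hidden field makes no difference to the substitution itself, which is why the two techniques are nearly identical.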

8.3 Netscape Persistent Cookies

A third way of maintaining state is to use Netscape persistent cookies. One of the features of the Netscape Navigator browser is the capability to store information on the client side. It does this by accepting a new Set-Cookie header from CGI programs, and passing that information back using an HTTP_COOKIE environment variable. We won't show a complete example, but we'll illustrate the technique briefly.

A program that stores the information on the client side might begin as follows:

#!/usr/local/bin/perl
($key, $value) = split(/=/, $ENV{'QUERY_STRING'});
print "Content-type: text/html", "\n";
print "Set-Cookie: $key=$value; expires=Sat, 26-Aug-95 15:45:30 GMT; path=/; domain=bu.edu", "\n\n";

The cookie header requires the key/value information to be encoded.

.
.
.
exit (0);

The Set-Cookie header sets one cookie on the client side, where a key is equal to a value. The expires attribute allows you to set an expiration date for the cookie. The path attribute specifies the subset of URLs that the cookie is valid for. In this case, the cookie is valid and can be retrieved by any program served from the document root hierarchy. Finally, the domain attribute sets the domain for which the cookie is valid. For example, say a cookie labeled “Parts” is set with a domain attribute of “bu.edu”. If the user accesses a URL in another domain that tries to retrieve the cookie “Parts,” it will be unable to do so. You can also use the attribute secure to instruct the browser to send a cookie only on a secure channel (e.g., Netscape's HTTPS server). All of these attributes are optional.
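Rather than hardcoding the expiration date, a program can compute it. The following is a sketch under stated assumptions: the helper names cookie_date and set_cookie_header are hypothetical, the date layout simply mirrors the one used in the header above, and the key and value are assumed to be encoded already.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as "Wdy, DD-Mon-YY HH:MM:SS GMT".
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d-%s-%02d %02d:%02d:%02d GMT",
                   $DAYS[$wday], $mday, $MONTHS[$mon], $year % 100,
                   $hour, $min, $sec);
}

# Build a Set-Cookie header line; every attribute is optional.
sub set_cookie_header {
    my ($key, $value, %attr) = @_;
    my $header = "Set-Cookie: $key=$value";
    $header .= "; expires=" . cookie_date($attr{expires}) if defined $attr{expires};
    $header .= "; path=$attr{path}"     if defined $attr{path};
    $header .= "; domain=$attr{domain}" if defined $attr{domain};
    $header .= "; secure"               if $attr{secure};
    return $header;
}

# A cookie that expires one day from now and is valid for the whole server.
print set_cookie_header('Parts', 'Rocket%20Launcher',
                        expires => time + 24 * 60 * 60,
                        path    => '/'), "\n";
```

As with any header, the program must still print a blank line after the last header line before sending the document body.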

Now, how does a program access the stored cookies? When a certain document is accessed by the user, the browser will send the cookie information--provided that it is valid to do so--as the environment variable HTTP_COOKIE. For example, if the user requests a document for which the cookie is valid before the cookie expiration date, the following information might be stored in HTTP_COOKIE:

Full%20Name=Shishir%20Gundavaram; Specification=CGI%20Book

Each cookie is separated from the next by the “; ” delimiter (a semicolon followed by a space). To decode this information and place it into an associative array, we can use the following subroutine:

sub parse_client_cookies
{
    local (*COOKIE_DATA) = @_;

    local (@key_value_pairs, $key_value, $key, $value);
    @key_value_pairs = split (/;\s/, $ENV{'HTTP_COOKIE'});
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $key   =~ tr/+/ /;
        $value =~ tr/+/ /;
        $key   =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        if (defined($COOKIE_DATA{$key})) {
            $COOKIE_DATA{$key} = join ("\0", $COOKIE_DATA{$key}, $value);
        } else {
            $COOKIE_DATA{$key} = $value;
        }
    }
}
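The same decoding can be exercised without a Web server by passing the cookie string in directly. This is a self-contained variant written for illustration, not a replacement for the subroutine above: the name parse_cookie_string is hypothetical, and it returns a hash instead of filling in an array passed by the caller.

```perl
use strict;
use warnings;

# Decode a cookie string into a hash. Cookies that share a name are
# joined with the null character, as in the form-decoding subroutine.
sub parse_cookie_string {
    my ($raw) = @_;
    my %cookies;
    foreach my $pair (split(/;\s*/, defined $raw ? $raw : '')) {
        my ($key, $value) = split(/=/, $pair, 2);
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                                      # plus signs become spaces
            s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;     # %XX becomes a character
        }
        $cookies{$key} = exists $cookies{$key}
                       ? join("\0", $cookies{$key}, $value)
                       : $value;
    }
    return %cookies;
}

my %jar = parse_cookie_string('Full%20Name=Shishir%20Gundavaram; Specification=CGI%20Book');
print "$_ = $jar{$_}\n" foreach sort keys %jar;
```

In a CGI program the argument would be $ENV{'HTTP_COOKIE'}, which may be undefined if the browser sent no cookies; the sketch treats that case as an empty string.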

This subroutine is very similar to the one we have been using to decode form information. You can set more than one cookie at a time, for example:

print "Set-Cookie: Computer=SUN; path=/", "\n";
print "Set-Cookie: Computer=AIX; path=/images", "\n";

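Now, if the user requests a URL in the path /images, both cookies are valid for it, and HTTP_COOKIE will contain both, with the cookie that has the more specific path listed first:

Computer=AIX; Computer=SUN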

There are a couple of disadvantages with this client-side approach to storing information. First, the technique only works for Netscape Navigator browsers. Second, there are restrictions placed on the cookie size and number of cookies. The information contained in each cookie cannot exceed 4KB, and only 20 cookies are allowed per domain. A total of 300 cookies can be stored by each user.
