Parsing the Data

Taking a look at exhibit.txt, we can see that it consists of individual company listings separated by blank lines. Within each company’s listing, the same sequence of lines occurs: the first holds the company name, the next holds the booth number, the next holds the street address, and so on. By splitting up the file wherever we see a blank line, we can isolate individual companies’ information. By counting lines within those sections, we should be well on our way to extracting the relevant data from the file. We can then use pattern-matching operators to help us identify the data contained in lines that otherwise would be ambiguous.
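
As a rough illustration of the approach just described, and separate from the book's own script, here is a minimal sketch that splits text on blank lines and then picks out fields by line position. The sample data, the parse_listings subroutine, and the hash keys are all hypothetical; the sketch assumes only the first three lines of each listing (company name, booth number, street address), whereas the real exhibit.txt has more lines per listing.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data in the layout described above. The real
# exhibit.txt has more lines per listing than the three used here.
my $sample = <<'END_SAMPLE';
Acme Widgets
Booth 101
123 Main Street

Globex Corporation
Booth 202
456 Oak Avenue
END_SAMPLE

# Split the text into listings wherever a blank line occurs, then
# identify each field by its position within the listing.
sub parse_listings {
    my ($text) = @_;
    my @records;

    foreach my $block (split /\n\s*\n/, $text) {
        # Keep only the non-blank lines of this listing.
        my @lines = grep { /\S/ } split /\n/, $block;
        next unless @lines;

        # Line 1: company name, line 2: booth, line 3: street address.
        my ($co_name, $booth_line, $address) = @lines;

        # A pattern match pulls the number out of the booth line.
        my $booth = '';
        if (defined $booth_line and $booth_line =~ /(\d+)/) {
            $booth = $1;
        }

        push @records, {
            co_name => $co_name,
            booth   => $booth,
            address => defined $address ? $address : '',
        };
    }
    return @records;
}

# Print one summary line per listing.
foreach my $rec (parse_listings($sample)) {
    print "$rec->{co_name} (booth $rec->{booth}): $rec->{address}\n";
}
```

Splitting on the pattern /\n\s*\n/ rather than on a bare pair of newlines means that a "blank" line containing stray spaces or tabs still counts as a separator.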

Example 5-3 shows our first version of make_exhibit.plx, the script that will do this parsing and create the HTML pages. It uses several Perl features you haven’t seen before, but not to worry; we’ll go through them one by one.

Example 5-3. First version of make_exhibit.plx

#!/usr/bin/perl -w

# make_exhibit.plx

# this script reads a pair of data files, extracts information
# relating to a group of tradeshow exhibitors, and writes
# out a browseable web-based directory of those exhibitors

use strict;

# configuration section:

my $exhibit_file = './exhibit.txt';

# script-wide variable:

my %listing;    # key: company name ($co_name).
                # value: HTML-ized listing for this company.

# read and parse the main exhibitor file

my @listing_lines = ( );    # holds current listing's lines for passing
                            # to the &parse_exhibitor subroutine
...
