Just one more component is needed in our power-editing script: the part where it takes the modified content from the HTML file and writes it back out to disk, replacing the previous version of the file. Before doing that, though, let’s run a test by having the script print out the modified files to our screen, so we can check to make sure the correct changes are being made. Once we’re satisfied with that, we can make the final modifications to the script that will cause it to actually update the files on disk.
Below the while loop that cycles through the reading of each file’s contents, and below the close IN statement but before the right curly brace that ends the larger foreach loop that cycles through each file, we’ll add the following to the script:
# print $content, to check the changes
print "File $file, after changes:\n\n$content\n";
Now you can run the script at the command line, piping its output to more, and page your way through the output, checking to make sure that only the changes you intended are being made:
[jbc@andros testsite]$ fix_links.plx *.html | more
File form_to_email.html, after changes:
<HTML>
<HEAD>
<TITLE>This is the title</TITLE>
</HEAD>
<BODY>
(etc.)
Once you’re happy with the way that output looks, you can comment out that print statement and add the following lines just below it:
open OUT, "> $file" or die "can't open $file for writing: $!";
print OUT $content;
close OUT or die "can't close $file after writing: $!";
These three lines are very important. They represent the most commonly used method for taking data you have in your Perl program and printing it out to a file on disk.
The first line shows how to use the open function to create a new filehandle for writing to a file. The first argument to the open function is the name for the filehandle (by convention, ALL UPPERCASE); the second argument is a string containing the name of the file that you want to open, preceded by the > character.
The > character in the string is the part that tells Perl to open the file for writing. If the filename you specify in that string does not already exist, a new, empty file will be created as a result of the open statement. If the filename specified does already exist, any data it contains will be erased. (Unix folks say that the file’s contents will be clobbered.) This clobbering happens as soon as the open operation is performed, before any data has actually been written to the file. So, obviously, you need to be very careful about opening files in this fashion, especially when those files contain data you care about.
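If you’d like to watch the clobbering happen, here is a minimal sketch. The scratch filename (built from the process ID in $$) and the text written to it are made up for this demonstration; nothing here is part of fix_links.plx. It creates a small file, reopens it with > without printing anything, and checks the file’s size while the filehandle is still open:

```perl
use strict;
use warnings;

# hypothetical scratch file, used only for this demonstration
my $file = "clobber_demo_$$.txt";

# first, create the file with some content in it
open OUT, "> $file" or die "can't open $file for writing: $!";
print OUT "some data we care about\n";
close OUT or die "can't close $file after writing: $!";
my $size_before = (-s $file) || 0;

# now reopen it for writing, but don't print anything to it
open OUT, "> $file" or die "can't open $file for writing: $!";
my $size_after = (-s $file) || 0;    # measured before anything is written
close OUT or die "can't close $file after writing: $!";

print "size before reopening: $size_before\n";
print "size after reopening:  $size_after\n";

# clean up the scratch file
unlink $file or warn "couldn't remove $file: $!";
```

The second size should come back as 0: the open by itself was enough to empty the file.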
Once you’ve opened the file for writing, printing data to the file is easy: you just use the print function, followed by the filehandle name, followed by the data you want to print. Again, as with the printing-to-sendmail example from the previous chapter, you need to remember not to put a comma after the filehandle name when printing to it.
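As a quick way to convince yourself that the data really lands in the file, the following sketch writes a string to a scratch file and then reads it back in for comparison. The filename and the sample content are assumptions made up for this illustration:

```perl
use strict;
use warnings;

# hypothetical scratch file and content, for illustration only
my $file    = "print_demo_$$.txt";
my $content = "<HTML>\n<BODY>\nHello\n</BODY>\n</HTML>\n";

open OUT, "> $file" or die "can't open $file for writing: $!";
print OUT $content;    # no comma between the filehandle and the data
close OUT or die "can't close $file after writing: $!";

# read the file back in to confirm what was written
my $readback = '';
open IN, $file or die "can't open $file for reading: $!";
while (my $line = <IN>) {
    $readback .= $line;
}
close IN;

if ($readback eq $content) {
    print "file contents match what was printed\n";
}

# clean up the scratch file
unlink $file or warn "couldn't remove $file: $!";
```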
Finally, we close the filehandle. Notice how we’re checking for failure and having the script die with an informative error message if either the open or close operation fails.
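To see what such an error message looks like without actually stopping a script, you can wrap the open in an eval block, which traps the die and leaves its message in $@. This is only a sketch: the eval wrapper is an addition for demonstration purposes, and the path is a made-up one inside a directory assumed not to exist.

```perl
use strict;
use warnings;

# hypothetical path inside a directory that is assumed not to exist
my $file = "no_such_dir_$$/page.html";

# eval traps the die, so the message can be examined
# instead of ending the script
my $ok = eval {
    open OUT, "> $file" or die "can't open $file for writing: $!";
    close OUT or die "can't close $file after writing: $!";
    1;
};
my $error = $@;

if ($ok) {
    print "opened $file without trouble\n";
}
else {
    print "caught: $error";
}
```

The $! variable supplies the operating system’s reason for the failure, so the exact wording after the colon will vary from system to system.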
The final version of this script is given in Example 4-2.
Example 4-2. A script for modifying HREF attributes
#!/usr/bin/perl -w

# fix_links.plx

# this script processes all the *.html files whose names are supplied
# to it on the command line, replacing all HREF attributes
# that point to local resources in the current directory
# with rewritten versions that have:
#
# 1) '.htm' extensions changed to '.html', and
# 2) VaRiEnT capitalization uniformly downcased.

foreach $file (@ARGV) {

    unless (-f $file) {
        warn "$file is not a plain file. Skipping...\n";
        next;
    }
    unless ($file =~ /\.html$/) {
        warn "$file doesn't end in .html. Skipping...\n";
        next;
    }
    if ($file =~ m{/}) {
        warn "$file contains a slash. Skipping...\n";
        next;
    }

    $content = '';

    open IN, $file or die "can't open $file for reading: $!";
    while ($line = <IN>) {

        # for HREF attributes pointing to the current directory,
        # downcase attribute, and rename '.htm' to '.html'

        $line =~ s/HREF="([^"\/]+\.htm)"/HREF="\L$1\El"/gi;
        $content .= $line;
    }
    close IN;

    # print $content, to check the changes
    # print "File $file, after changes:\n\n$content\n";

    open OUT, "> $file" or die "can't open $file for writing: $!";
    print OUT $content;
    close OUT or die "can't close $file after writing: $!";
}
And that’s it! If you run this script in the directory containing the files to be modified, using a command-line argument of *.html, it will process each file in that directory whose name ends with .html, checking all the HREF attributes and converting those that need it to use lowercase filenames ending in .html.
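If you want to try the substitution by itself before turning it loose on real files, a small sketch like the following applies the same expression to a few sample lines. The sample links are invented for illustration; only the s/// expression comes from the script.

```perl
use strict;
use warnings;

# made-up sample lines, for illustration only
my @lines = (
    qq{<A HREF="Contact.HTM">Contact</A>\n},
    qq{<A HREF="products/List.htm">Products</A>\n},
    qq{<A HREF="http://www.example.com/Index.htm">Elsewhere</A>\n},
);

my @fixed;
foreach my $line (@lines) {
    # work on a copy, so the original sample line is left alone
    (my $copy = $line) =~ s/HREF="([^"\/]+\.htm)"/HREF="\L$1\El"/gi;
    push @fixed, $copy;
    print $copy;
}
```

Only the first line should change, to HREF="contact.html". The other two contain a slash inside the quoted value, so the [^"\/]+ part of the pattern can’t match them and they pass through untouched.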