Writing the Modified Files Back to Disk

Just one more component is needed in our power-editing script: the part that takes the modified content of each HTML file and writes it back out to disk, replacing the previous version of the file. Before doing that, though, let’s run a test by having the script print the modified files to the screen, so we can make sure the correct changes are being made. Once we’re satisfied with that, we can make the final modifications that will cause the script to actually update the files on disk.

Below the while loop that reads each file’s contents line by line, after the close IN statement but before the right curly brace that ends the outer foreach loop (the one that cycles through each file), we’ll add the following to the script:

# print $content, to check the changes
print "File $file, after changes:\n\n$content\n";

Now you can run the script at the command line, piping its output to more, and page your way through the output, checking to make sure that only the changes you intended are being made:

[jbc@andros testsite]$ fix_links.plx *.html | more
File form_to_email.html, after changes:

<HTML>
<HEAD>
<TITLE>This is the title</TITLE>
</HEAD>
<BODY>
(etc.)

Once you’re happy with the way that output looks, you can comment out that print statement and add the following lines just below it:

open  OUT, "> $file" or die "can't open $file for writing: $!";
print OUT $content;
close OUT            or die "can't close $file after writing: $!";

These three lines are very important. They represent the most commonly used method for taking data you have in your Perl program and printing it out to a file on disk.

The first line shows how to use the open function to create a new filehandle for writing to a file. The first argument to the open function is the name for the filehandle (by convention, ALL UPPERCASE); the second argument is a string containing the name of the file that you want to open, preceded by the > character.
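To see the whole idiom work end to end, here is a small self-contained sketch that writes a string to a file using the same two-argument open with a bareword filehandle, then reads the file back to confirm what landed on disk. The scratch directory (from the standard File::Temp module) and the file name sample.html are assumptions made for the demonstration, so no real files are touched.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Scratch directory, deleted automatically when the program exits.
my $dir     = tempdir(CLEANUP => 1);
my $file    = "$dir/sample.html";          # hypothetical file name
my $content = "<HTML>\n<BODY>Hello</BODY>\n</HTML>\n";

# Same pattern as fix_links.plx: filehandle name, then "> filename".
open  OUT, "> $file" or die "can't open $file for writing: $!";
print OUT $content;
close OUT            or die "can't close $file after writing: $!";

# Read the file back in to check that the data was written intact.
open IN, $file or die "can't open $file for reading: $!";
my $read_back = join '', <IN>;
close IN;

print $read_back eq $content ? "round trip OK\n" : "mismatch!\n";
```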

The > character in the string is the part that tells Perl to open the file for writing. If the filename you specify in that string does not already exist, a new, empty file will be created as a result of the open statement. If the filename specified does already exist, any data it contains will be erased. (Unix folks say that the file’s contents will be clobbered.) This clobbering happens as soon as the open operation is performed, before any data has actually been written to the file. So, obviously, you need to be very careful about opening files in this fashion, especially when those files contain data you care about.
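The following sketch makes the clobbering visible. It creates a file holding some data, reopens it with >, and checks the file’s size before printing anything at all. The scratch directory and the file name precious.txt are hypothetical; the point is that the -z (zero size) file test is already true right after the open.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/precious.txt";   # hypothetical file holding data we care about

# Put some data in the file first.
open OUT, "> $file" or die "can't open $file for writing: $!";
print OUT "data you care about\n";
close OUT or die "can't close $file after writing: $!";
my $size_before = -s $file;

# Reopen for writing, but don't print anything yet.
open OUT, "> $file" or die "can't open $file for writing: $!";
my $empty_after_open = -z $file;   # true: the old contents are already gone
close OUT or die "can't close $file: $!";

print "size before reopening: $size_before bytes\n";
print "empty right after open: ", ($empty_after_open ? "yes" : "no"), "\n";
```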

Once you’ve opened the file for writing, printing data to the file is easy: you just use the print function, followed by the filehandle name, followed by the data you want to print. Again, as with the printing-to-sendmail example from the previous chapter, you need to remember not to put a comma after the filehandle name when printing to it.
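A short sketch of the comma rule, assuming a hypothetical scratch file out.txt: the filehandle is followed by a space, and commas appear only between the data items that follow it, just as they would in an ordinary print to the screen.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";   # hypothetical output file

open OUT, "> $file" or die "can't open $file for writing: $!";

# Filehandle, a space, then the data: no comma after OUT.
print OUT "first line\n";

# Commas separate the data items, never the filehandle from the data.
print OUT "second ", "line", "\n";

close OUT or die "can't close $file after writing: $!";

# Read the lines back to show what was written.
open IN, $file or die "can't open $file for reading: $!";
my @lines = <IN>;
close IN;
print @lines;
```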

Finally, we close the filehandle. Notice how we’re checking for failure and having the script die with an informative error message if either the open or close operation fails.
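To see the error check in action, the sketch below tries to open a file inside a directory that does not exist, so the open fails and the die fires. The path is hypothetical, and the eval block is there only so we can inspect the message in $@ instead of letting the program exit; the exact text Perl puts in $! varies by operating system.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
# Hypothetical path in a directory that doesn't exist, so open must fail.
my $file = "$dir/no_such_dir/page.html";

# eval traps the die so we can look at the message rather than exit.
eval {
    open OUT, "> $file" or die "can't open $file for writing: $!";
    close OUT           or die "can't close $file after writing: $!";
};
print "caught: $@" if $@;
```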

The final version of this script is given in Example 4-2.

Example 4-2. A script for modifying HREF attributes

#!/usr/bin/perl -w

# fix_links.plx

# this script processes all the *.html files whose names are supplied
# to it on the command line, replacing all HREF attributes
# that point to local resources in the current directory
# with rewritten versions that have:
#
# 1) '.htm' extensions changed to '.html', and 
# 2) VaRiEnT capitalization uniformly downcased.

foreach $file (@ARGV) {

    unless (-f $file) {
        warn "$file is not a plain file. Skipping...\n";
        next;
    }
    
    unless ($file =~ /\.html$/) {
        warn "$file doesn't end in .html. Skipping...\n";
        next;
    }

    if ($file =~ m{/}) {
        warn "$file contains a slash. Skipping...\n";
        next;
    }
    
    $content = '';
    open IN, $file or die "can't open $file for reading: $!";

    while ($line = <IN>) {
        
        # for HREF attributes pointing to the current directory,
        # downcase the linked filename, and rename '.htm' to '.html'
        
        $line =~ s/HREF="([^"\/]+\.htm)"/HREF="\L$1\El"/gi;

        $content .= $line;
    }

    close IN;
    
    # print $content, to check the changes
    # print "File $file, after changes:\n\n$content\n";

    open OUT, "> $file" or die "can't open $file for writing: $!";
    print OUT $content;
    close OUT           or die "can't close $file after writing: $!";
}
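If you want to see exactly which links the substitution touches before running it on real files, you can feed it a few sample lines. The sample HTML below is hypothetical; the substitution itself is copied unchanged from the script. Note that a link containing a slash, or one that already ends in .html, is left alone, and that the rewritten attribute name always comes out as uppercase HREF, because that is what the replacement string contains.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample lines to exercise the substitution from fix_links.plx.
my @samples = (
    '<A HREF="Contact.HTM">Contact</A>',   # mixed case .htm: rewritten
    '<a href="About.htm">About</a>',       # lowercase href: rewritten as HREF
    '<a href="news.html">News</a>',        # already .html: unchanged
    '<a href="docs/Intro.htm">Intro</a>',  # contains a slash: unchanged
);

# foreach aliases $line to each array element, so @samples is updated in place.
foreach my $line (@samples) {
    my $before = $line;
    $line =~ s/HREF="([^"\/]+\.htm)"/HREF="\L$1\El"/gi;
    print "$before\n  => $line\n";
}
```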

And that’s it! If you run this script in the directory containing the files to be modified, giving it a command-line argument of *.html, it will process every file in that directory whose name ends in .html, checking each HREF attribute and rewriting those that point to .htm files in the current directory so that they use lowercase filenames ending in .html.
