Untainting with Backreferences

Now that we’ve removed all the < characters from the submitted guestbook data, we’re ready to untaint that data so that Perl’s tainting mechanism won’t cause the script to die when we write the new entry out to the guestbook data file. Here, again, is the chunk of code that does that untainting:

if ($sub{$_} =~ /^([^<]*)$/) {
    $sub{$_} = $1;             # value is untainted now
}

Looking carefully at that regular expression, the /^([^<]*)$/ search pattern says “Try to do a match in which we start at the very beginning of the string, match a whole bunch of characters that are anything except <, and end up at the end of the string. And while we’re at it, let’s save whatever gets matched in $1 for later backreferencing.” Or, to put it another way, this expression says “Match the whole string, but only if the string has no < characters in it. If it has any < characters, don’t match anything.”

We can be reasonably sure this expression will match because we previously used the substitution expression to replace all the < characters with &lt;. Now we just take the captured string in $1 and assign it back to $sub{$_}, and voilà, we’ve laundered that particular hash value, and Perl’s tainting mechanism no longer cares what we do with it.
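To see the two steps working together, here is a minimal sketch that runs the substitution and then the capturing match over a small hash. The field names and sample values in %sub are made up for illustration, and the else branch with die is an assumption added here so that a failed match is not silently ignored; it is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical submitted form data; field names and values are illustrative only.
my %sub = (
    name    => 'Jane <b>Doe</b>',
    comment => "Nice site!\nI'll be back.",
);

for (keys %sub) {
    # Step 1: replace every < with &lt; so the match below can succeed.
    $sub{$_} =~ s/</&lt;/g;

    # Step 2: match the whole string, capturing it in $1, but only if no < remains.
    if ($sub{$_} =~ /^([^<]*)$/) {
        $sub{$_} = $1;    # under taint mode, the captured copy is untainted
    } else {
        # Assumed safety net: stop rather than carry on with an unchecked value.
        die "Unexpected < in field '$_'";
    }
}

print "$_: $sub{$_}\n" for sort keys %sub;
```

Because [^<] also matches newline characters, a multi-line value such as the comment above is captured in full. Run without the -T switch, this only shows the matching behavior; the untainting itself matters when the script runs in taint mode.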

Tip

You’ll notice that Perl’s untainting mechanism doesn’t actually stop us from doing insecure things. We could always use an all-inclusive pattern like /^(.*)$/ to match a piece of tainted data, then assign whatever the old value was ...
