Searching with Signatures

Most signatures can be represented as regular expressions that can be used for searching through mail files or directories of web pages. The Unix grep command can be used to scan both of these, as long as care is taken to escape any characters that have special meaning to this command. It is a very efficient way to identify files that contain a match and can report the lines and line numbers where the matches are found. But in the case of email files, what you really need is a way to extract the individual messages that match, and grep cannot do this for you.
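The escaping step can be sketched in Perl, the language used for the script in this section. This is a minimal illustration, not part of the original text: `quotemeta` is a Perl builtin, but the helper name `grep_lines` and the sample lines are hypothetical. It reports matching lines with their line numbers, much as `grep -n` would.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [line number, line] pairs for every line
# that contains $signature as a literal string.
sub grep_lines {
    my ($signature, @lines) = @_;
    # quotemeta escapes every regex metacharacter in the signature
    my $pattern = quotemeta($signature);
    my @hits;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        push @hits, [$n, $line] if $line =~ /$pattern/;
    }
    return @hits;
}

# Illustrative data only; the signature contains '.', '?' and '='
my @sample = (
    "Visit http://example.com/login.php?id=42 today",
    "Visit http://example.com/loginXphid=42 today",
);
for my $hit (grep_lines('login.php?id=', @sample)) {
    print "$hit->[0]: $hit->[1]\n";
}
```

If the signature were used as a pattern without `quotemeta`, the second sample line would match as well, because `.` would match any character and `p?` would make the final `p` optional.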

Most email client programs allow you to search the content of messages, but these can be laborious to use and may not offer the flexibility that you need. The Perl script shown in Example 10-1 steps through each message in a mail file in standard MBOX format and outputs those that contain one or more matches to a user-specified pattern.

Example 10-1. extract_match_string.pl

#!/usr/bin/perl -w

if(@ARGV == 0 or @ARGV > 2) {
    die "Usage: $0 <pattern> [<mail file>]\n";
} elsif(@ARGV == 1) {
    $ARGV[1] = '-';
}

my $pattern = $ARGV[0];
my $flag = 0;
my $separator = 0;
my $text = '';

open INPUT, "< $ARGV[1]" or die "$0: Unable to open file $ARGV[1]\n";

while(<INPUT>) {
    if(/^From\s.*200\d$/ and $separator == 1) {
        $separator = 0;
        if($flag) {
            # print previous message if it matched
            print $text;
            $flag = 0;
        }
        $text = '';
    } elsif(/^\s*$/) {
        $separator = 1;
    } else {
        $separator = 0;
        if(/$pattern/) {
            $flag++;
        }
    }
    ...
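Because the listing above is cut off, the following is a separate, self-contained sketch of the same idea. It is not a reconstruction of the elided code. The helper name `matching_messages` and the in-memory sample mailbox are hypothetical. It also makes two assumptions that differ from Example 10-1: any line beginning with `From ` that follows a blank line (or starts the file) is treated as a message separator, with no check on the year, and the separator line itself is tested against the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not the book's elided code: read an MBOX stream
# from $fh and return the messages containing a match to $pattern.
sub matching_messages {
    my ($fh, $pattern) = @_;
    my $re = qr/$pattern/;
    my @matches;
    my $text = '';
    my $matched = 0;
    my $after_blank = 1;   # start of file counts as a message boundary

    while (my $line = <$fh>) {
        # Assumption: any "From " line after a blank line starts a message
        if ($after_blank and $line =~ /^From\s/) {
            push @matches, $text if $matched;
            $text = '';
            $matched = 0;
        }
        $after_blank = ($line =~ /^\s*$/) ? 1 : 0;
        $matched = 1 if $line =~ $re;
        $text .= $line;
    }
    # The last message has no following separator, so check it here
    push @matches, $text if $matched;
    return @matches;
}

# Illustrative two-message mailbox held in memory
my $mbox = join '',
    "From alice\@example.com Mon Jan  3 10:00:00 2005\n",
    "Subject: hello\n",
    "\n",
    "Nothing of interest here.\n",
    "\n",
    "From bob\@example.com Tue Jan  4 11:00:00 2005\n",
    "Subject: account update\n",
    "\n",
    "Please verify your account at once.\n";

open my $in, '<', \$mbox or die "Unable to open in-memory mailbox\n";
print for matching_messages($in, 'verify your account');
close $in;
```

Run as written, this should print only the second message, including its `From ` separator line. Returning the messages as a list rather than printing them inside the loop is a design choice for the sketch: it keeps the parsing step separate from the output step, so the same helper could feed a counter or a further filter.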
