Problems with Simple Signatures

There are two problems with using simple regular expressions to identify and link different email messages and web pages. First, you have to come up with a good signature pattern. If you are starting out with a single email message, for example, you need to define a number of patterns and try each one to see which, if any, match related messages without producing false positives. This is a process of trial and error that can be quite time-consuming.
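Part of this trial-and-error loop can be automated. The following Python sketch is one possible approach, not the book's own tool. It assumes you have collected a small set of messages you believe are related and a set you know are unrelated. It then scores each candidate regular expression by how many related messages it matches and how many unrelated ones it wrongly matches. The function names `evaluate_signature` and `rank_signatures` and all of the sample messages are hypothetical.

```python
import re

def evaluate_signature(pattern, related, unrelated):
    """Return (hits, false_positives) for one candidate pattern.

    hits: number of related messages the pattern matches.
    false_positives: number of unrelated messages it also matches.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = sum(1 for text in related if regex.search(text))
    false_positives = sum(1 for text in unrelated if regex.search(text))
    return hits, false_positives

def rank_signatures(patterns, related, unrelated):
    """Score every candidate and sort: fewest false positives first,
    then most related messages matched."""
    scored = []
    for pattern in patterns:
        hits, fps = evaluate_signature(pattern, related, unrelated)
        scored.append((pattern, hits, fps))
    return sorted(scored, key=lambda row: (row[2], -row[1]))

# Hypothetical sample data, invented for illustration only.
related = [
    "Dear member, please confirm your account details now.",
    "Dear member, please confirm your account details today.",
]
unrelated = [
    "Your order has shipped and should arrive on Friday.",
    "Please confirm your attendance at Monday's meeting.",
]
candidates = [r"confirm your account details", r"please confirm", r"dear member"]

ranked = rank_signatures(candidates, related, unrelated)
for pattern, hits, fps in ranked:
    print(f"{pattern!r}: {hits}/{len(related)} related, {fps} false positives")
```

Here `please confirm` matches both related messages but also one unrelated message, so it ranks last. The two more specific patterns match every related message with no false positives. Sorting on false positives first reflects the goal stated above: a signature that links unrelated messages is worse than one that misses a few related ones.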

On top of that, you have to deal with the variations that spammers introduce into otherwise similar messages in order to circumvent antispam filters. In many ways, these filters are trying to do the same job as you: they look for unique patterns that mark a message as spam so they can divert it from your Inbox. The spammers know this, and they know a lot about the methods these filters use. To keep their spam flowing, they continually introduce variation into their messages in the hope of disrupting whatever patterns are being scanned for.

These variations may take the form of random words added to the end of a message, altered spellings of recognizable words, or message headers that change from one batch of mail to the next.
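To see why this kind of variation works, the following Python sketch applies one fixed signature to several variants of a sentence. The signature phrase and the sample text are hypothetical, invented only to mimic the kinds of changes just described; they are not taken from real spam.

```python
import re

# Hypothetical signature: an exact phrase copied from a first message.
signature = re.compile(r"verify your account information", re.IGNORECASE)

# Invented variants mimicking the changes spammers make between batches.
variants = {
    "original": "Please verify your account information within 24 hours.",
    "inserted word": "Please verify the your account information within 24 hours.",
    "spelling change": "Please verify your acount information within 24 hours.",
    "random words appended": (
        "Please verify your account information within 24 hours. "
        "apple river lantern"
    ),
}

# True if the signature still finds its phrase in the variant.
results = {label: bool(signature.search(text)) for label, text in variants.items()}
for label, matched in results.items():
    print(f"{label}: {'match' if matched else 'MISSED'}")
```

In this toy case, the inserted word and the misspelling both break the match, while the appended words do not, because they fall outside the signature phrase. Appended random text would instead defeat a signature that covers the whole message, such as an exact hash of the message body.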

Consider the following very similar blocks of text taken from two phishing emails that targeted eBay users. In order to get around spam filters, the author has inserted three words (and, the, then) into the second version and changed ...
