Message Content

From a forensics perspective, the content of a message is actually the least interesting part. If the message carries a virus or spyware, then the payload will be contained in the attachment. If it is a phishing attempt, then the web site it links to is where your interest will lie.

The experts in spam analysis and filtering can do a far better job than I at describing the techniques they use to classify messages and decide if they represent spam or not. This is a fascinating area that combines advanced computer science, with its statistical and pattern recognition algorithms, and practical software engineering that builds and deploys tools in an ongoing battle with the spammers.

There are three main approaches to dealing with spam. Here are resources on each of these that you might find useful.

Rule-based filtering looks for specific strings and signatures within a message and assigns a score based on the matches it finds. SpamAssassin is a leading open source tool that uses this approach (http://spamassassin.apache.org/).

Statistical filtering, using Bayesian analysis, looks at things like word frequencies in sets of messages that have been manually classified as spam or not, typically by the end user. As such, it reflects that user's personal interests and can adapt to changes in the types of email that the individual receives. This is the approach taken in the Thunderbird email client, among others. A good introduction to Bayesian filtering is this paper by Paul Graham: ...
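To make the two approaches described above more concrete, here is a minimal Python sketch of each. It is not how SpamAssassin or Thunderbird is implemented. The rule patterns, scores, and threshold are invented for illustration, and the names `rule_score` and `BayesFilter` are hypothetical. The statistical part is a simple naive Bayes classifier with add-one smoothing, which is one common way to turn word frequencies into a spam probability.

```python
import math
import re
from collections import Counter

# Hypothetical rules: (pattern, score) pairs invented for this example.
RULES = [
    (re.compile(r"free money", re.I), 2.5),
    (re.compile(r"click here", re.I), 1.5),
    (re.compile(r"act now", re.I), 1.0),
]
THRESHOLD = 3.0  # assumed cut-off, not the default of any real tool


def rule_score(text):
    """Rule-based filtering: add up the score of every rule that matches."""
    return sum(score for pattern, score in RULES if pattern.search(text))


def tokenize(text):
    """Split a message into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


class BayesFilter:
    """Statistical filtering: naive Bayes over word counts from labeled mail."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message classified as 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text); needs at least one message of each class."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values()) + len(vocab)
        ham_total = sum(self.counts["ham"].values()) + len(vocab)
        # Start from the prior odds, then add evidence from each known word.
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.counts["spam"][word] + 1) / spam_total
            p_ham = (self.counts["ham"][word] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = BayesFilter()
bayes.train("win free money now", "spam")
bayes.train("free prize click now", "spam")
bayes.train("meeting agenda for monday", "ham")
bayes.train("lunch on monday", "ham")

print(rule_score("Click here for FREE MONEY") >= THRESHOLD)
print(bayes.spam_probability("free money") > 0.5)
```

Because the statistical filter learns only from the messages it is given, retraining it on a different user's mail would yield different probabilities for the same text, which is the adaptive behavior described above. The rule-based score, in contrast, changes only when someone edits the rules.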
