Chapter 8. File Contents

Internet forensics is not just about spam and fake web sites. This chapter shows how you can uncover information hidden in the files that you work with every day. Microsoft Word and Adobe Portable Document Format (PDF) are two of the most common formats used to create and encapsulate important documents. Both formats are extremely rich: they can contain text with complex fonts and styling, images, hyperlinks, form elements, and a slew of other data types. These features come at a cost, however. The formats are so complex, and the files so large, that the only practical way to access them is through a specific application such as Word or Adobe Acrobat. Opening the file in a plain text editor and reading the contents is simply not feasible in these cases.

That complexity becomes a liability when the applications store information that is hidden from the casual user. As the dramatic examples in this chapter show, it is all too easy to reveal more information than you realize. For those of us with an inquisitive eye, these documents are ideal subjects for our forensic attention.
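
Although these files cannot be read comfortably in a text editor, runs of readable text embedded in the binary data can still be pulled out programmatically. The following is a minimal Python sketch of that idea, similar in spirit to the Unix strings utility. The function name extract_strings, the minimum run length of 4, and the sample bytes are illustrative assumptions rather than anything taken from this chapter. The sketch finds only runs of printable ASCII, so text stored in a 16-bit encoding would be missed.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII characters at least min_len long.

    Hypothetical helper for illustration: it scans raw bytes for
    consecutive characters in the range 0x20 (space) to 0x7e (tilde).
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up sample bytes standing in for the contents of a binary document.
    sample = b"\x00\x01Author: Jane\x00\xff\x02ab\x00C:\\docs\\draft.doc\x00"
    for text in extract_strings(sample):
        print(text)
```

To try this on a real file, read it in binary mode, for example with open(path, "rb"), and pass the resulting bytes to the function. Raising min_len reduces the number of short, meaningless matches.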

Word Document Metadata

Microsoft Word is probably the most widely used word-processing software in the world. Although the vast majority of people use only its basic functions, it has many advanced capabilities. One of the better known of these is Track Changes, a set of reviewing tools that allow multiple people to modify ...
