Redaction of Sensitive Information

Information hidden in a document can be the unintentional consequence of using complex software. From a forensics point of view, it represents a bonus—something we didn’t expect to find. But information can also be intentionally hidden, disguised, or removed. Uncovering what someone does not want you to know represents a challenge.

In a variety of circumstances, government agencies, the courts, and others need to publish documents that contain sensitive information. Within these documents, they may need to remove or obscure the names of individuals, identification numbers such as Social Security numbers, or colorful expletives that are deemed inappropriate for publication. A good example would be a government intelligence briefing that names a foreign informant, published as part of a congressional hearing. The name for this selective editing is redaction. It is a polite name for censorship.

In the past, redaction has meant obscuring the relevant text on a piece of paper with a black marker. Any subsequent photocopies of the paper would retain the blacked-out region and there would be no way for anyone to read the underlying words. That has proven to be a simple, cheap, and extremely effective way of hiding information. But these days, most of the documents that we deal with are in electronic form. The PDF file format, in particular, is a very convenient way to distribute documents, including those scanned from handwritten or ...
