Chapter 12. Metadata

This chapter will explore the various ways in which metadata can be incorporated into a PDF file, from the simplest document-level strings to rich XML attached to individual objects.

The Document Information Dictionary

Even with the original 1.0 version of PDF, it was clear that metadata was a requirement for any file format, and certainly for one that would represent documents for electronic distribution and storage. For this purpose, the document information dictionary (also called the info dictionary, or just the info dict) was created (see Example 12-1).

As the name implies, the info dictionary is a standard PDF dictionary object. However, unlike every other object you’ve encountered so far, it is referenced not from the catalog but from the trailer, through the trailer’s Info key. The original PDF 1.0 specification documented four optional keys for this dictionary, each of which allows only a string value encoded in PDFDocEncoding.
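To make the trailer connection concrete, here is a minimal Python sketch that serializes an info dictionary as an indirect object, plus a trailer dictionary that points to it through its Info key. It assumes plain ASCII values (where ASCII and PDFDocEncoding coincide), key names that need no escaping, and generation number 0. The helper names pdf_literal, info_object, and trailer are hypothetical, and a complete file would also need a cross-reference table and a startxref line, which are omitted here.

```python
from typing import Mapping


def pdf_literal(text: str) -> str:
    """Write text as a PDF literal string, escaping backslashes and parentheses."""
    # Assumption: ASCII-only input, where ASCII and PDFDocEncoding agree.
    text.encode("ascii")  # raises UnicodeEncodeError for anything else
    # Escape backslashes first so the escapes added for parentheses stay intact.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def info_object(obj_num: int, entries: Mapping[str, str]) -> str:
    """Serialize the info dictionary as indirect object `obj_num 0 obj`."""
    # Assumption: keys are plain names such as Author or Producer.
    body = " ".join(f"/{key} {pdf_literal(value)}" for key, value in entries.items())
    return f"{obj_num} 0 obj\n<< {body} >>\nendobj\n"


def trailer(size: int, root_num: int, info_num: int) -> str:
    """Trailer dictionary: /Root points at the catalog, /Info at the info dictionary."""
    return f"trailer\n<< /Size {size} /Root {root_num} 0 R /Info {info_num} 0 R >>\n"


if __name__ == "__main__":
    print(info_object(5, {"Author": "Jane Doe", "Producer": "ExampleWriter 1.0"}), end="")
    print(trailer(6, 1, 5), end="")
```

Note that the catalog (object 1 in this sketch) never mentions the info dictionary; a reader finds it only by way of the trailer.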

Author
The name of the person(s) who created the document.
CreationDate
The date and time the document was created, formatted as a date.

Note

Dates, as a type of string, were added to PDF in version 1.1, so very early PDF files may have the value of this key as a simple string.

Creator
The software used to author the original document that was used as the basis for conversion to PDF. If the PDF was created directly, the value may be left blank or may be the same as the Producer.
Producer
The name of the product that created the PDF. ...
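Because CreationDate must hold a date string rather than free text, a writer has to format its timestamps itself. The following Python sketch produces the form described in Adobe’s PDF Reference, D:YYYYMMDDHHmmSSOHH'mm', where O is +, -, or Z (this sketch emits a bare Z for UTC); check the exact form against the edition of the specification you target. The functions pdf_date and pdf10_info_entries are hypothetical helpers, and the sketch assumes a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        # Hypothetical policy: refuse naive datetimes rather than guess a zone.
        raise ValueError("expected a timezone-aware datetime")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # local time equals UT
    sign = "+" if offset > timedelta(0) else "-"
    minutes = abs(int(offset.total_seconds())) // 60
    # Offset hours and minutes are each followed by an apostrophe.
    return f"{stamp}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


def pdf10_info_entries(author: str, creator: str, producer: str,
                       created: datetime) -> dict:
    """Collect the four keys documented for the PDF 1.0 info dictionary."""
    return {
        "Author": author,
        "CreationDate": pdf_date(created),
        "Creator": creator,
        "Producer": producer,
    }


if __name__ == "__main__":
    pacific = timezone(timedelta(hours=-8))
    entries = pdf10_info_entries("Jane Doe", "ExampleWordProcessor", "ExampleDistiller",
                                 datetime(1998, 12, 23, 19, 52, tzinfo=pacific))
    print(entries["CreationDate"])  # D:19981223195200-08'00'
```

Rejecting naive datetimes is a deliberate choice in this sketch: a date written without its offset is ambiguous to anyone reading the file in another time zone.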