In this chapter, we describe the layout and content of the four main sections of a PDF file, and the syntax of the objects that make up each one. We also outline the process of reading a PDF file into a high-level data structure, and the converse operation of writing that structure back out to a PDF file.
A simple valid PDF file has four parts, in order:
The header, which gives the PDF version number.
The body, containing the pages, graphical content, and much of the ancillary information, all encoded as a series of objects.
The cross-reference table, which lists the byte offset of each object within the file, to facilitate random access.
The trailer, including the trailer dictionary, which helps a reader locate each part of the file and lists various pieces of metadata that can be read without processing the whole file.
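As a rough illustration of how a reader might find these parts, here is a minimal Python sketch. It assumes the header sits at byte 0, and it relies on the PDF specification's rule that a file ends with the keyword startxref, the byte offset of the cross-reference section, and %%EOF. The function name locate_parts is hypothetical. It searches from the end of the file, because a file that has been incrementally updated can contain more than one such section, and the last one is the current one.

```python
import re


def locate_parts(data: bytes) -> dict:
    """Find the header version, the cross-reference offset and the trailer.

    Hypothetical helper: assumes the %PDF- header starts at byte 0.
    """
    # Header: the first line, of the form %PDF-<major>.<minor>
    header = re.match(rb"%PDF-(\d+)\.(\d+)", data)
    if header is None:
        raise ValueError("no %PDF- header at start of file")
    version = (int(header.group(1)), int(header.group(2)))

    # The file ends with: startxref <offset> %%EOF.
    # Search backwards so that the last (current) section wins.
    pos = data.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword found")
    m = re.match(rb"startxref\s+(\d+)", data[pos:])
    if m is None:
        raise ValueError("malformed startxref line")
    xref_offset = int(m.group(1))

    # The trailer keyword precedes startxref; -1 means it was not found.
    trailer_offset = data.rfind(b"trailer", 0, pos)

    return {
        "version": version,
        "xref_offset": xref_offset,
        "trailer_offset": trailer_offset,
    }
```

Once xref_offset is known, a reader can jump straight to the cross-reference table and, from there, to any object, without parsing the body in sequence.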
Example 3-1. A small PDF file
%PDF-1.0                          ← Header starts here
%âãÏÓ
1 0 obj                           ← Body starts here
<< /Kids [2 0 R] /Count 1 /Type /Pages >>
endobj
2 0 obj
<<
  /Rotate 0
  /Parent 1 0 R
  /Resources 3 0 R
  /MediaBox [0 0 612 792]
  /Contents [4 0 R]
  /Type /Page
>>
endobj
3 0 obj
<<
  /Font
  <<
    /F0 << /BaseFont /Times-Italic /Subtype /Type1 /Type /Font >>
  >>
>>
endobj
4 0 obj
<< /Length 65 >>
stream
1. 0. 0. 1. 50. 700. cm
BT
/F0 36. Tf
(Hello, World!) Tj
ET
endstream
endobj
5 0 obj
<< /Pages 1 0 R /Type /Catalog >>
endobj
xref                              ← Cross-reference ...
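To show how the byte offsets in a cross-reference table relate to the body of Example 3-1, here is a minimal Python sketch that serializes the same five objects and computes the table and trailer itself. The helper build_pdf is hypothetical, not part of any library. It assumes single LF line endings and generation number 0 for every object, and it computes the stream's /Length from the bytes actually written (59 here), so that value need not match the 65 in the listing.

```python
def build_pdf(objects: list[bytes], root: int, version: str = "1.0") -> bytes:
    """Serialize objects numbered 1..n (generation 0) into a complete file:
    header, body, cross-reference table and trailer. Hypothetical helper."""
    out = bytearray(f"%PDF-{version}\n".encode("ascii"))
    # Comment of bytes above 127 (the %âãÏÓ line) marks the file as binary.
    out += b"%\xe2\xe3\xcf\xd3\n"

    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "n 0 obj" begins
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_offset = len(out)
    size = len(objects) + 1  # entry 0 heads the list of free objects
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        # Each entry is exactly 20 bytes: 10-digit offset, space,
        # 5-digit generation, space, type 'n', then space + LF.
        out += f"{off:010d} 00000 n \n".encode("ascii")

    out += f"trailer\n<< /Size {size} /Root {root} 0 R >>\n".encode("ascii")
    out += f"startxref\n{xref_offset}\n%%EOF\n".encode("ascii")
    return bytes(out)


# The page content stream from Example 3-1, one operator per line.
content = b"1. 0. 0. 1. 50. 700. cm\nBT\n/F0 36. Tf\n(Hello, World!) Tj\nET"
stream_obj = b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"

objects = [
    b"<< /Kids [2 0 R] /Count 1 /Type /Pages >>",
    b"<< /Rotate 0 /Parent 1 0 R /Resources 3 0 R /MediaBox [0 0 612 792]"
    b" /Contents [4 0 R] /Type /Page >>",
    b"<< /Font << /F0 << /BaseFont /Times-Italic /Subtype /Type1 /Type /Font >> >> >>",
    stream_obj,
    b"<< /Pages 1 0 R /Type /Catalog >>",
]
pdf = build_pdf(objects, root=5)
```

To inspect the result, write it out with open("hello.pdf", "wb").write(pdf). Recording each offset at the moment an object is written, rather than scanning the finished file, is what keeps the table exact: the offset is simply the length of the output so far.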