Technically, PDF is a complex language. The specification is 400 pages long. If you don’t want to know the details, skip to the section Putting It Together: A High-Volume Invoicing System. If you do, it’d be a good idea to open one of the sample PDF files provided with this chapter; unlike most you will find on the Web, they are uncompressed and numbered in a sensible order. We’ve provided a brief roadmap to the PDF format as we feel that it offers many benefits, and you might want to add your own extensions in the future.
The outer layer of the PDF format provides overall document structure, specifying pages, fonts used, and advanced features such as tables of contents, special effects, and so on. Each page is a separate object and contains a stream of page-marking operators, which are basically highly abbreviated PostScript commands. The snippet of PostScript you saw earlier would end up like this:
72 720 m 72 72 l /F5 24 Tf 42 TL 80 720 Td ('Hello World') Tj
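To see how such a stream reads, here is a minimal Python sketch that splits the snippet into commands. It relies on the fact that operands come first and the operator comes last, as in PostScript. The decode function and the OPERATORS table are our own illustrative names, not part of any PDF library. The table covers only the six operators in the snippet, and the tokenizer is an assumption-laden simplification that ignores nested parentheses, hex strings, arrays, and dictionaries.

```python
import re

# Readable names for the six operators used in the snippet (illustrative only).
OPERATORS = {
    "m": "move to",
    "l": "line to",
    "Tf": "set font and size",
    "TL": "set text leading",
    "Td": "move text position",
    "Tj": "show text",
}

# A token is either a (...) string without nested parentheses,
# or a run of non-whitespace characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|\S+")


def decode(stream):
    """Group a content stream into (description, operands) pairs."""
    commands, operands = [], []
    for token in TOKEN.findall(stream):
        if token in OPERATORS:
            # Operators are postfix: they consume the operands collected so far.
            commands.append((OPERATORS[token], operands))
            operands = []
        else:
            operands.append(token)
    return commands


snippet = "72 720 m 72 72 l /F5 24 Tf 42 TL 80 720 Td ('Hello World') Tj"
for description, operands in decode(snippet):
    print(f"{description:20} {' '.join(operands)}")
```

Each printed line pairs an operator's meaning with its operands, so the first command appears as "move to" followed by 72 720. A real page stream would need a fuller tokenizer and a much longer operator table.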
Unfortunately, this page-marking code, which can at least be decoded given time and a knowledge of where to look, can be compressed into a binary form and is buried inside an outer layer that’s quite complex. The outer layer consists of a series of numbered objects (don’t you love that word?) including pages, outlines, clickable links, font resources, and many other elements. These are delimited by the keywords obj and endobj and numbered within the file. Here’s the catalog object, which sits at the top of PDF’s object model:
1 0 obj << ...