Preface

The Portable Document Format (PDF) is the format in which most documents are produced for distribution, collaboration, and archiving worldwide. It has been standardized by the International Organization for Standardization (ISO) and adopted by governments in over 75 countries as the format of choice for their documentation. The printing industry requires PDF for any professional printing job. With billions of publicly available documents and an untold number of documents living in private repositories, no other file format has the wide reach and ubiquity that PDF does.

However, even with those billions of documents in circulation, the PDF format remains poorly understood by users and developers alike, because little documentation exists beyond ISO 32000-1, the PDF standard itself. And while the standard is an excellent technical document, its size, complexity, and dry style make it unapproachable for many.

The goal of this book is to provide an approachable reference to PDF. It covers key topics from the standard in a way that will enable the technically minded to understand what is inside a PDF. Those who simply need to examine the internals of a PDF to diagnose problems will find the tools they need here, and those who want to construct their own valid and well-formed documents will find out how to do so.

Who Should Read This Book

While this book goes into some fairly deep technical topics, I’ve tried to present them in such a way that any technically minded individual should find the material approachable and understandable.

This book is suitable for:

  • Users of PDF software, such as Adobe Acrobat, who want to understand what is going on “under the hood” of the various features in those products (features like inserting and deleting pages or converting images).

  • Industry professionals in areas such as electronic publishing and printing who want to better understand PDF in order to improve their systems, or who need to diagnose issues in their PDF processing.

  • Programmers writing code to read, edit, or create PDF files.

Organization of Content

Chapter 1

We begin by looking at the various objects that make up a PDF file and how they are combined into a cohesive whole.

Chapter 2

In this chapter we look at the core aspect of PDF—its imaging model. We learn how to create a page and draw some graphics on it.

Chapter 3

Continuing on from our discussion of the core imaging model, in this chapter we explore how to incorporate raster images into your PDF content.

Chapter 4

Next, we learn how to incorporate the last of the common types of PDF content—text. Of course, a discussion of text in PDF wouldn’t be complete without an understanding of fonts and glyphs.

Chapter 5

PDF isn’t just about static content. This chapter will introduce various ways in which a PDF can gain interactivity, specifically around enabling navigation within and between documents.

Chapter 6

This chapter explores annotations, special objects that are drawn on top of the regular content to enable everything from interactive links to 3D to video and audio.

Chapter 7

Next, we look at how interactive forms are provided for in the PDF language.

Chapter 8

This chapter demonstrates how a PDF can be used in a way similar to a ZIP archive by embedding files inside it.

Chapter 9

This chapter explains how video and audio content can be referenced in or embedded into a PDF for playing as part of rich content.

Chapter 10

This chapter introduces optional content, which appears only under certain conditions, such as on the screen but not when printed, or only for certain users.

Chapter 11

This chapter looks at how to add semantic richness to your content by tagging it with HTML-like structures such as paragraphs and tables.

Chapter 12

This chapter explores the various ways in which metadata can be incorporated into a PDF file, from the simplest document-level strings to rich XML attached to individual objects.

Chapter 13

Finally, this chapter introduces the various open international standards based on PDF, including the full PDF standard itself (ISO 32000-1), the various subsets (such as PDF/A and PDF/X), as well as related work (such as PAdES).

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, file and path names, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, operators and operands, HTML elements, and keys and their values.

Tip

This icon signifies a tip, suggestion, or general note.

Warning

This icon indicates a warning or caution.

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/developing-with-pdf.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

This book wouldn’t exist were it not for the love and support of my באַשערט (bashert), Marla Rosenthol.

Dr. James King and Dr. Matthew Hardy of Adobe Systems and Olaf Drümmer of Callas Software took time out of their normal jobs to do technical reviews of the material in this book. Thanks, guys!

I would also like to thank my editors, Simon St. Laurent and Meghan Blanchette.
