32

Extracting Text from PDF Files

Pulling text and graphics out of PDF files is possible, provided the files have not been encrypted and locked. Even then, it may be possible to grab screenshots.

Third-party PostScript tools are useful for processing PDF files, whose format is derived from PostScript. Ghostscript and the PDF Kit framework embedded within Mac OS X are good examples. This is not for the fainthearted, but developing applications around their APIs in a compiled language like C or even Java may be the route to binding PDF documents into your workflow.
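Before committing to a compiled-language integration, it can help to prototype by driving the Ghostscript command-line program from a script. The following is a minimal sketch, not a definitive implementation. It assumes Ghostscript is installed and reachable as gs on the search path, and that the PDF is not encrypted. The function names build_gs_command and extract_text are hypothetical helpers introduced here for illustration. The text is produced by Ghostscript's txtwrite output device.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, gs_binary="gs"):
    """Return the argument list for a Ghostscript text-extraction run.

    Assumption: the executable is called "gs" (on Windows it is usually
    gswin64c, so pass that name as gs_binary instead).
    """
    return [
        gs_binary,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not prompt between pages
        "-dBATCH",            # exit once the input has been processed
        "-dSAFER",            # restrict file system access
        "-sDEVICE=txtwrite",  # select the plain-text output device
        "-sOutputFile=-",     # send the extracted text to standard output
        str(pdf_path),
    ]


def extract_text(pdf_path, gs_binary="gs"):
    """Run Ghostscript on pdf_path and return the extracted text."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_binary}")
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")
    result = subprocess.run(
        build_gs_command(pdf_path, gs_binary),
        capture_output=True,
        check=True,  # raise CalledProcessError if Ghostscript reports failure
    )
    # Decode leniently so an unexpected byte does not abort the whole run.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf") (the file name is a placeholder) returns the document text as a single string, which can then be passed to whatever metadata processing the workflow needs. An encrypted or locked file will typically cause Ghostscript to fail, which surfaces here as a subprocess.CalledProcessError.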

You can obtain some developer support from Adobe, but you need to subscribe for a fee. An SDK for Acrobat is available for ...
