Document File Formats

The document file formats covered in this book are PostScript and PDF. PostScript is a programming language for representing two-dimensional graphics. It was one of the many innovations of the late ’70s to come out of the Xerox Palo Alto Research Center (PARC) and have a significant impact on the way people think about and use computers today. PostScript was the brainchild of John Warnock, who used the language to research graphic arts applications of computers. In 1982, Warnock and Chuck Geschke formed Adobe Systems, and the language that they developed at PARC was redesigned and packaged as PostScript.

The PDF format came along in the early ’90s, also from Adobe. PDF builds on the capabilities of PostScript but is aimed at becoming a truly portable platform for the electronic interchange of files. PostScript and PDF are described in detail in Chapter 10 through Chapter 12.

PostScript: A Language for Page Representation

PostScript has become the standard programming language for printing.[4] Over the course of a couple of decades, it has gone through several revisions, referred to as PostScript Level 1, Level 2, and Level 3 (the last of which Adobe markets as PostScript 3). There are several other page description formats in the PostScript family, each with its own application niche:

Encapsulated PostScript (EPS)

Encapsulated PostScript is a standard format for including a PostScript page description in other page descriptions. An EPS file is simply a one-page PostScript file (representing any combination of text and graphics) that strictly follows the Document Structuring Conventions (see Chapter 10) and is self-contained to the point that it does not depend on the existence of external graphics states.

EPSI and EPSF

An EPSI (Encapsulated PostScript Interchange) file is simply an EPS file that is bundled with a bitmapped preview image. An EPSF is an EPSI formatted for older versions of the Macintosh operating system, where the PostScript code is stored in the data fork of the file and a PICT format preview image is stored in the resource fork.

Display PostScript

Display PostScript is a variant of PostScript intended for drawing graphics on raster displays.
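To make the EPS description above more concrete, here is a minimal sketch that assembles a one-page EPS file as a string from Perl. The helper name make_eps and the drawing commands are illustrative assumptions and are not part of any module discussed in this book; the header lines are the Document Structuring Conventions comments that an EPS file is expected to carry, including a bounding box given in points.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: wrap PostScript drawing code in a minimal EPS envelope.
# The bounding box is given in points as (llx, lly, urx, ury).
sub make_eps {
    my ($body, @bbox) = @_;
    return join "\n",
        '%!PS-Adobe-3.0 EPSF-3.0',
        "%%BoundingBox: @bbox",
        '%%EndComments',
        $body,
        '%%EOF',
        '';
}

# Outline a 100-point square; the bounding box leaves room for the stroke
my $eps = make_eps(
    'newpath 10 10 moveto 100 0 rlineto 0 100 rlineto -100 0 rlineto closepath stroke',
    9, 9, 111, 111
);
print $eps;
```

Because the file sets up everything it draws and declares its own extent, a containing document can scale and place it without knowing anything else about its contents.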

Chapter 11 presents a new module for generating PostScript text blocks, drawing primitives, and documents from Perl. The PostScript module presents an easy-to-use interface to place blocks of text on a page:

#!/usr/local/bin/perl -w

use strict;
use PostScript::TextBlock;

my $tb = PostScript::TextBlock->new;

$tb->addText( text => "The Culinary Dostoevski\n",
              font => 'CenturySchL-Ital',
              size => 24,
              leading => 100
             );
$tb->addText( text => "by Ms. Charles Fine Adams\n",
              font => 'URWGothicL-Demi',
              size => 18,
              leading => 36
             );

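# Read the whole body text into a single string
open my $in, '<', 'example.txt' or die "Can't open example.txt: $!";
my $text = '';
while (my $line = <$in>) {
    $text .= $line;
}
close $in;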
$tb->addText( text => $text,
              font => 'URWGothicL-Demi',
              size => 14,
              leading => 24
             );

open OUT, '>culinarydostoevski.ps' or die "Can't write culinarydostoevski.ps: $!";
my $pages = 1;

# create the first page

my ($code, $remainder) = $tb->Write(572, 752, 20, 772);
print OUT "%%Page: $pages $pages\n";
print OUT $code;
print OUT "showpage\n";

# Print the rest of the pages, if any

while ($remainder->numElements) {
    $pages++;
    print OUT "%%Page: $pages $pages\n";
    ($code, $remainder) = $remainder->Write(572, 752, 20, 772);
    print OUT $code;
    print OUT "showpage\n";
}
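The four numbers passed to Write() in the listing above are consistent with a US Letter page (612 x 792 points) and a 20-point margin, assuming the arguments are the block's width and height followed by the x and y coordinates of its upper left corner. The sketch below shows that arithmetic; block_geometry is a hypothetical helper, not a method of the PostScript module.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: given a page size and a uniform margin (all in points),
# return (width, height, x offset, y offset) for a text block that fills
# the page inside the margins, with (x, y) at its upper left corner.
sub block_geometry {
    my ($page_w, $page_h, $margin) = @_;
    return ($page_w - 2 * $margin,
            $page_h - 2 * $margin,
            $margin,
            $page_h - $margin);
}

# US Letter is 612 x 792 points
my @geometry = block_geometry(612, 792, 20);
print "@geometry\n";    # prints "572 752 20 772"
```

The y offset is measured from the bottom of the page because PostScript places its origin at the lower left corner.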

The Image::Magick module relies on the free Ghostscript interpreter to read PostScript files. With Ghostscript installed, Image::Magick does a nice job rasterizing specific pages of PostScript documents:

#!/usr/bin/perl -w

use strict;
use Image::Magick;

my $image = Image::Magick->new;

# Rasterize the first two pages

$image->Set(density => "300x300");   # Default is 72x72

my $status = $image->Read('document.ps[0,1]');
die "$status\n" if $status;

$image->Write('png:document.png');

undef $image;
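The bracketed suffix on the filename passed to Read() selects pages by index, and the indices start at zero, so [0,1] means the first two pages. As a small sketch, the hypothetical helper below (page_spec is not part of Image::Magick) builds such a string from ordinary one-based page numbers.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: build a filename with a page selector such as
# 'document.ps[0,1]' from a list of one-based page numbers.
sub page_spec {
    my ($file, @pages) = @_;
    my @indices = map { $_ - 1 } @pages;    # selector indices start at 0
    return $file . '[' . join(',', @indices) . ']';
}

print page_spec('document.ps', 1, 2), "\n";    # prints "document.ps[0,1]"
```

The returned string could then be handed to Read() in place of the literal filename used in the listing.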

PDF: Toward a Truly Portable Document

PDF files have become a common format for the electronic distribution of documents originally created as print documents in PostScript. PDF files are meant to be printed or viewed on a screen with a viewer such as Adobe’s Acrobat Reader. PDF seems to have a bright future as the document storage format of choice; printers and graphic artists have found that the PDF format offers everything needed by a professional service bureau. On the raster display side of things, PDF is being used in such applications as Quartz, the Mac OS X rendering engine, which is based on the PDF imaging model.

The PDF::API2 module described in Chapter 12 is useful for generating PDF documents. This example creates a 20-page PDF with three lines of text in the upper left corner of each page:

#!/usr/bin/perl -w
use strict;
use PDF::API2;

my $pdf=PDF::API2->new( );
my $font = $pdf->corefont("Times-Roman", 0);

foreach my $p (1..20) {
    my $page = $pdf->page( );
    my $text = $page->text( );
    $text->font($font, 72);
    $text->translate(20,700);
    $text->text("Page $p, line 1");
    $text->cr(-80);
    $text->text("Page $p, line 2");
    $text->cr(-80);
    $text->text("Page $p, line 3");
}

# The PDF is binary data, so keep STDOUT from translating line endings
binmode STDOUT;
print $pdf->stringify( );
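The placement of the three lines follows from simple arithmetic, assuming that translate(20,700) sets the first baseline 700 points above the bottom of the page and that each cr(-80) starts a new line 80 points lower. The sketch below works out the resulting baselines; baselines is a hypothetical helper, not part of PDF::API2.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: list the baseline y coordinates of successive lines,
# given the starting y, the vertical step per line, and the line count.
sub baselines {
    my ($start_y, $step, $count) = @_;
    return map { $start_y + $_ * $step } 0 .. $count - 1;
}

my @y = baselines(700, -80, 3);
print "@y\n";    # prints "700 620 540"
```

With a 72-point font, an 80-point step leaves a little space between lines, and all three lines stay in the upper part of a 792-point-tall page.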


[4] Arguably, the one exception to this statement is in the field of mathematics publishing, where TeX still dominates the serious journals.
