Preface

This book marks the intersection of two essential technologies for the Web and information services. XML, the latest and best markup language for self-describing data, is becoming the generic data packaging format of choice. Perl, which web masters have long relied on to stitch up disparate components and generate dynamic content, is a natural choice for processing XML. The shrink-wrap of the Internet meets the duct tape of the Internet.

More powerful than HTML, yet less demanding than SGML, XML is a perfect solution for many developers. It has the flexibility to encode everything from web pages to legal contracts to books, and the precision to format data for services like SOAP and XML-RPC. It supports world-class standards like Unicode while being backwards-compatible with plain old ASCII. Yet for all its power, XML is surprisingly easy to work with, and many developers consider it a breeze to adapt to their programs.

As the Perl programming language was tailor-made for manipulating text, Perl and XML are perfectly suited for one another. The only question is, “What’s the best way to pair them?” That’s where this book comes in.

Assumptions

This book was written for programmers who are interested in using Perl to process XML documents. We assume that you already know Perl; if not, please pick up O’Reilly’s Learning Perl (or its equivalent) before reading this book. It will save you much frustration and head scratching.

We do not assume that you have much experience with XML. However, it helps if you are familiar with markup languages such as HTML.

We assume that you have access to the Internet, and specifically to the Comprehensive Perl Archive Network (CPAN), as most of this book depends on your ability to download modules from CPAN.

Most of all, we assume that you’ve rolled up your sleeves and are ready to start programming with Perl and XML. There’s a lot of ground to cover in this little book, and we’re eager to get started.

How This Book Is Organized

This book is broken up into ten chapters, as follows:

Chapter 1 introduces our two heroes. We also give an XML::Simple example for the impatient reader.

Chapter 2 is for the readers who say they know XML but suspect they really don’t. We give a quick summary of where XML came from and how it’s structured. If you really do know XML, you are free to skip this chapter, but don’t complain later that you don’t know a namespace from an en-dash.

Chapter 3 shows how to get information from an XML document and write it back in. Of course, all the interesting stuff happens in between these steps, but you still need to know how to read and write the stuff.

Chapter 4 explains event streams, the efficient core of most XML processing.

Chapter 5 introduces the Simple API for XML (SAX), a standard interface to event streams.

Chapter 6 is about . . . well, processing trees, the basic structure of all XML documents. We start with simple structures of built-in types and finish with advanced, object-oriented tree models.

Chapter 7 covers the Document Object Model (DOM), another standard interface of importance. We give examples showing how DOM will make you nimble as a squirrel in any XML tree.

Chapter 8 covers advanced tree processing, including event-tree hybrids and transformation scripts.

Chapter 9 shows existing real-life applications using Perl and XML.

Chapter 10 wraps everything up. Now that you are familiar with the modules, we’ll tell you which to use, why to use them, and what gotchas to avoid.

Resources

While this book aims to cover everything you’ll need to start programming with Perl and XML, modules change, new standards emerge, and you may think of some oddball situation that we haven’t anticipated. Here are two other resources you can pursue.

The perl-xml Mailing List

The perl-xml mailing list is the first place to go for finding fellow programmers suffering from the same issues as you. In fact, if you plan to work with Perl and XML in any nontrivial way, you should first subscribe to this list. To subscribe to the list or browse archives of past discussions, visit http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml.

You might also want to check out http://www.xmlperl.com, a fairly new web site devoted to the Perl/XML community.

CPAN

Most modules discussed in this book are not distributed with Perl and need to be downloaded from CPAN.

If you’ve worked in Perl at all, you’re familiar with CPAN and how to download and install modules. If you aren’t, head over to http://www.cpan.org. Check out the FAQ first. Get the CPAN module if you don’t already have it (it probably came with your standard Perl distribution).

Font Conventions

Italic is used for URLs, filenames, commands, hostnames, and emphasized words.

Constant width is used for function names, module names, and text that is typed literally.

Constant-width bold is used for user input.

Constant-width italic is used for replaceable text.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly & Associates, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

There is a web page for this book, which lists errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/perlxml

To comment or ask technical questions about this book, send email to:

For more information about books, conferences, Resource Centers, and the O’Reilly Network, see the O’Reilly web site at:

http://www.oreilly.com/

Acknowledgments

Both authors are grateful for the expert guidance from Paula Ferguson, Andy Oram, Jon Orwant, Michel Rodriguez, Simon St.Laurent, Matt Sergeant, Ilya Sterin, Mike Stok, Nat Torkington, and their editor, Linda Mui.

Erik would like to thank his wife Jeannine; his family (Birgit, Helen, Ed, Elton, Al, Jon-Paul, John and Michelle, John and Dolores, Jim and Joanne, Gene and Margaret, Liane, Tim and Donna, Theresa, Christopher, Mary-Anne, Anna, Tony, Paul and Sherry, Lillian, Bob, Joe and Pam, Elaine and Steve, Jennifer, and Marion); his excellent friends Derrick Arnelle, Stacy Chandler, J. D. Curran, Sarah Demb, Ryan Frasier, Chris Gernon, John Grigsby, Andy Grosser, Lisa Musiker, Benn Salter, Caroline Senay, Greg Travis, and Barbara Young; and his coworkers Lenny, Mela, Neil, Mike, and Sheryl.

Jason would like to thank Julia for her encouragement throughout this project; Looney Labs games (http://www.looneylabs.com) and the Boston Warren for maintaining his sanity by reminding him to play; Josh and the Ottoman Empire for letting him escape reality every now and again; the Diesel Cafe in Somerville, Massachusetts and the 1369 Coffee House in Cambridge for unwittingly acting as his alternate offices; housemates Charles, Carla, and Film Series: The Cat; Apple Computer for its fine iBook and Mac OS X, upon which most writing/hacking was accomplished; and, of course, Larry Wall and all the strange and wonderful people who brought (and continue to bring) us Perl.
