XSLT Cookbook, 2nd Edition

Preface

Extensible Stylesheet Language Transformations (XSLT) is a powerful technology for transforming XML documents into other useful forms, but it is sometimes considered difficult to learn. Its template-based approach makes it a prime candidate for learning by example, and XSLT examples are often easily repurposed. XSLT 2.0 greatly increases the power and elegance of XSLT but also increases its complexity.

When I first began working with XSLT (and again when learning XSLT 2.0), I longed for a cookbook that would accelerate my productivity by providing ready-made solutions to the challenges I faced. My first experience with such a book was O’Reilly’s Perl Cookbook. This book was more influential to my reluctant learning and ultimate appreciation of Perl than the original camel book (Programming Perl) by Larry Wall. I believe cookbooks are important because most software developers are not satisfied with simply figuring out how to make something work: they are interested in mastering the technology and using the best-known techniques, and they want answers fast. There is no better way to master a subject than by borrowing from those who have already discovered better ways to do things.

Longing for a cookbook soon turned into a desire to write one, especially since I had collected several useful recipes, some developed by others and some that I created. However, I did not want to write an XSLT book simply packaged in an alternate form; I wanted to provide a useful resource that also highlighted some less-obvious ways to apply XSLT. In the process, I hoped to attract XML developers who have not yet been motivated to learn XSLT and who, in my opinion, are missing out on one of XML’s best productivity tools. If you are one of these folks who has not yet experienced XSLT, please bear with me for a few more paragraphs while I pitch the value of XSLT and the role of this book in helping you realize its potential.

XSLT is a language that lives simultaneously on the fringes and in the mainstream of current software-development technology. While working on the first edition of this project, I often found myself explaining to friends what XSLT was and why it was important enough to spend time writing a whole book about it. These same friends had heard of Java, Perl, and even XML, but not XSLT. I also observed an increasing number of requests for XSLT assistance on XSLT mailing lists and more industry attention in the form of books, articles, and sophisticated XSLT development tools. The XSLT user base is clearly growing daily; however, many software professionals and technology enthusiasts do not understand what it is and why it is important. With the release of new XSLT 2.0 implementations, I hope adoption of XSLT will accelerate, but this is not certain, partly due to competition from XQuery 1.0 and other XML manipulation methodologies. One thing is certain: mastering XSLT 2.0 is a worthwhile endeavor because its use will certainly increase, even if it never explodes. Further, learning XSLT will give you a deeper insight into XML processing even if you favor an alternative solution.

Although XSLT 1.0 is a mature language and XSLT 2.0 is not far behind, I would still guess that more than half of all companies and individuals working with XML do not use XSLT. Not so long ago, a colleague who is otherwise well-versed in all the latest technologies described XSLT as just another styling language. This misunderstanding is forgivable because XSLT advertises itself through the first three words in its name (Extensible Stylesheet Language) and with the keyword that begins most XSLT programs (xsl:stylesheet). However, the last word in the XSLT acronym, Transformations, is what makes XSLT so important and is what drew me to the language in the first place. One of my goals in writing this book is to show how XSLT is relevant to a wide variety of problems. I also want to provide both novice and intermediate users of XSLT a one-stop shopping place for some of the most commonly requested XSLT techniques. Finally, I want to push the envelope of what one can do with XSLT so current users can go even further and the unconvinced can join the fold of highly productive XML transformers.

Over the years, I have heard many sweeping statements about computer science. Opinions like, “All computation is simply fancy bit manipulation,” “Computers are really just sophisticated number crunchers,” or “Everything a computer does can be understood in terms of symbol manipulation” are true to some extent. However, I would like to make a sweeping generalization of my own: “Every problem we solve with software can be understood in terms of transformations.” Mastery of computer science is mastery of transformation. Transformation is what CPUs do, it is what algorithms do, and it is what software developers do. And transformation is what XSLT does, at least when the input is XML (and sometimes when it is not). Of course, XSLT is not the only transformational game in town, and as with the thousands of languages that came before it, it is unclear whether it will evolve as an independent language or be absorbed into the next “big thing.” What is clear is that the ideas behind XSLT will not go away because many of these ideas are as old as computer science itself. This book helps the reader master and apply these ideas to specific problems.

Structure of This Book

To make this book useful to the broadest possible audience, I have retained most of the XSLT 1.0 solutions presented in the first edition. To these, I have added XSLT 2.0 solutions when 2.0 provided a significantly simpler or more elegant solution to the same problem. I also occasionally show 2.0 solutions to problems that would have been next to impossible to solve in 1.0. I have used separate subheadings to distinguish the 1.0 solution from the 2.0 solution. I hope this makes it easy for readers interested in one or the other to find what they are looking for. In a number of recipes, I do not provide a special 2.0 solution. Most of the time, this was because I felt the 1.0 solution would work as well in 2.0 or because I felt a 2.0 solution was obvious or would add very little value. I sincerely hope my desire to save trees and time does not overly frustrate the reader in this regard.

Both XSLT 1.0 and 2.0 rest firmly on the foundation provided by XPath 1.0 and 2.0, respectively. Some readers of the first edition took me to task for my lack of direct coverage of XPath. Chapter 1 was created partly to appease them and partly in response to the greater sophistication and complexity of XPath 2.0.

One of transformation’s most primitive forms is the processing of character sequences, otherwise known as strings. Unlike the ancient language SNOBOL or the relatively modern Perl, XSLT was not specifically designed with string manipulation in mind. However, Chapter 2 shows that almost anything one wants to do with strings can be done within the confines of XSLT and then shows how the new features of 2.0 make it that much easier.

Numerical transformation (commonly referred to as mathematics) is another crucial form of low-level transformation that pervades all software development simply because measurement and counting pervade life itself. Chapter 3 shows how to push the limits of XSLT’s mathematical capabilities, even though XSLT was not designed to be the next great Fortran replacement.

Manipulating dates and times is a quintessentially human activity, and a large part of our technological progress has been driven by an obsession with clocks, calendars, and accurate forecasting. Chapter 4 contains date and time recipes that augment an area standard XSLT 1.0 currently lacks. It also has in-depth coverage of the most welcome date and time functions added to XSLT 2.0. This chapter presents fascinating and difficult problems arising in date conversion and transformation, ready-made XSLT solutions, and important links to external date- and calendar-related resources.

All transformations begin by identifying the target you want to transform. If that target is a compound object, you need to traverse the object’s constituent parts as the transformation proceeds. Chapter 5 covers these topics and explores the problems XSLT was specifically designed to solve. This chapter describes XML as a tree and shows how XSLT can manipulate such trees. It also provides pointers for getting the best performance out of XML processing tasks.

Chapter 6 is brand new to the second edition and is dedicated entirely to XSLT 2.0. Readers who are primarily interested in getting up to speed in 2.0 are advised to read Chapters 1 and 6 first, and then peruse the rest of the XSLT 2.0 solutions sprinkled through the remaining chapters to gain a more solid foundation.

Before there were word processors, HTML, PDF, or other forms of sophisticated textual presentation, there was plain old text. The problem of transforming data used for computer consumption to data organized for human consumption is important. When the source data is XML, the problem is perfect for XSLT. Chapter 7 provides recipes that control how text extracted from XML is rendered for layout on a terminal, in a text editor, or for import to programs that require delimited data, such as comma-separated values.

XML is quickly becoming the universal syntax for information transfer, and there is every indication that this trend will accelerate rather than abate. Therefore, a vast amount of XML transformation has XML as the destination as well as the source. Chapter 8 covers these types of transformations. It shows how XML documents can be split, merged, flattened, cleaned up, and otherwise reorganized with relatively little XSLT code.

Many transformations simply extract information from raw data to answer questions. Chapter 9 presents a treasure trove of recipes that demonstrate XSLT as a query language. It provides solutions to a wide variety of query-use cases that will probably resemble queries you’ll need to ask of your own XML data.

HTML is an important target of XSLT transformation. Chapter 10 demonstrates solutions to problems that arise when generating web content, including links, tables, frames, forms, and other client-side transformation issues.

Graphics programming transforms data to the visual domain. You would not think of XSLT as a graphics programming language, and it is not. However, when Scalable Vector Graphics (SVG) is the target of the transformation, XSLT can achieve impressive results. Chapter 11 describes the transformation of raw data into bar charts, pie charts, line plots, and other graphical components. It also covers the transformation of XML to a hierarchical tree diagram. This chapter emphasizes how transformations are structures that can be mixed and matched to create many different outputs.

Generating code is an automation task that I have always been interested in. Of all the transformations, humans still do this one best (lucky for us who make a living at it). However, sometimes it is better to write a program that generates code rather than write the code ourselves. Chapter 12 shows the advantage gained from representing the data that drives code generation in XML and illustrates how XSLT is ideal for writing code generators for C++, Java, and XSLT itself. The chapter also includes a code-generation recipe taken from a design pattern represented in UML via XMI.

XSLT can enable some sophisticated applications. Chapter 13 includes some advanced uses of XSLT. The chapter is an eclectic mix that includes Visio VDX to SVG conversion, Microsoft Excel XML transformation, topic maps, and WSDL processing.

Although XSLT is powerful in its own right, we can really do some wicked things with extensions or by embedding XSLT in programs written in other languages. Chapter 14 provides extensive coverage of XSLT extensibility using Java and JavaScript. It also shows how XSLT can be used within Perl and Java programs.

Testing and debugging are essential to any software development effort, and XSLT development is no exception. Chapter 15 demonstrates useful techniques that can help you transform misbehaved XSLT programs into functional ones, even if you don’t have a native XSLT debugger handy.

Chapter 16 pushes the XSLT envelope to show how XSLT is far more than just another styling language. This chapter focuses on using XSLT as a generic and functional programming language. If nothing else, this chapter will open your eyes and stimulate your thoughts on the power of XSLT and how it can be used to create generic solutions.

Conventions Used in This Book

The following font conventions are used in this book:

Italic is used for:

Pathnames, filenames, and program names
Internet addresses, such as domain names and URLs
New terms where they are defined

Constant width is used for:

Command lines and options that should be typed verbatim
Names and keywords in programs, including method names, variable names, and class names
XML element tags

Constant-width bold is used for emphasis in program code lines.

Constant-width italic is used for replaceable arguments within program code.

Tip

This icon designates a note relating to the surrounding text.

Warning

This icon designates a warning related to the surrounding text.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “XSLT Cookbook, Second Edition by Sal Mangano. Copyright 2006 O’Reilly Media, Inc., 0-596-00974-7.”

If you feel that your use of code examples falls outside fair use or the permission given here, feel free to contact us at permissions@oreilly.com.

Safari Enabled

When you see a Safari® Enabled icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to:

O’Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

(800) 998-9938 (in the United States or Canada)

(707) 829-0515 (international or local)

(707) 829-0104 (fax)

There is a web page for this book, which lists errata, examples, or any additional information. You can access this page at:

http://www.oreilly.com/catalog/xsltckbk

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about books, conferences, Resource Centers, and the O’Reilly Network, see the O’Reilly web site at:

http://www.oreilly.com/

Acknowledgments

Second Edition Acknowledgments

Writing a new book from scratch is certainly a formidable task, especially for a first-time author. However, I found developing the second edition almost as challenging, although for different reasons. It would have been an impossible task if it were not for the support of colleagues, friends, and family. Further, the opportunity and motivation to write the second edition would certainly not have been there had it not been for the enthusiastic support, kind words, and well-intentioned criticisms I received from the readers of the first edition.

I must again thank my editor, Simon St.Laurent, for his seemingly infinite patience in the face of my many missed deadlines, as well as for his sage advice in numerous matters big and small.

The second edition of XSLT Cookbook would not have been possible without the Herculean efforts of Michael Kay in not only serving as editor of both the XPath 2.0 and XSLT 2.0 working drafts, but more importantly in providing a high-quality free implementation of both in Saxon 8.

I also must thank Evan Lenz and Mike Fitzgerald for their excellent and complementary efforts at technical editing. Should the reader find an explanation that is technically unclear, imprecise, or flat out wrong, it is most likely the fault of yours truly for stubbornly ignoring or misunderstanding their suggestions.

Much of my experience in XPath 2.0 and XSLT 2.0 came from my work on the SD Times web site (www.sdtimes.com). I would like to thank Ted Bahr and Alan Zeichick of BZ Media for giving me the opportunity to reengineer their site and thank them and Rebecca Pappas for their patience when competing with this book for my time. I also must thank my clients and friends at SIAC, especially Carol Spiewak, Frank Carrera, Bert Spielman, Amy Hui, and Diana Verkavits, for providing the many challenging assignments that have gone a long way to shaping my skills as a software developer.

Finally, I want to thank my wife, Wanda, and sons, Leonardo and Salvatore, for suffering through another book with me. I know it is no fun watching Daddy sit in front of his computer, especially when we could be having all kinds of fun outside and getting into all kinds of trouble at Mommy’s expense! However, I hope you will eventually learn that hard work and dedication have their own special rewards as long as you take little breaks to goof off (and tell jokes about things that are broken and have cracks in them!).

First Edition Acknowledgments

Writing a book has always been a dream of mine, and I am very pleased that O’Reilly was the publisher that helped me realize this dream. However, this was far from a solo effort. Many people helped me achieve this goal, and I would like to take some time to acknowledge their contributions.

First, I want to thank Simon St.Laurent, my editor at O’Reilly. Simon was with me every step of the way, from the initial hastily written email proposal through the final stages of production. Simon was always there to reassure me and share in the joy and frustration that is inevitable in any creative endeavor.

Second, I want to thank Jeni Tennison, my primary technical editor. Jeni’s technical expertise and attention to detail are unparalleled. Not only did Jeni correct both my boneheaded and less-obvious mistakes, but she graciously contributed code and ideas to this book as she so generously does each day in the many XML-related mail groups she belongs to. (Any mistakes that remain are most definitely the fault of my own latent boneheadedness.) Jeni is truly unique, and I am sure the XML community will join me in thanking her for all her contributions and unselfish help.

Third, I would like to thank all my colleagues at Morgan Stanley for providing encouragement and praise for this work—especially my boss, Farid Khalili, for being understanding when I had to rush or stay home to make a deadline, and his boss, John Reynolds, for promoting my book to the entire Fixed Income Development department that he heads. I would also like to thank my former client SIAC and especially Karen Halbert for allowing me to spearhead a project that first honed my XSLT skills.

Fourth, I would like to thank those who graciously contributed material to this book, including Steve Ball, John Breen, Jason Diamond, Nikita Ogievetsky, and Jeni Tennison. I also want to thank the later technical editors, Micah Dubinko and Jirka Kosek, whose comments and suggestions were extremely helpful, as well as the O’Reilly production staff who helped bring this work to fruition.

Finally, I want to thank my parents, family, and friends. As always, you have sustained and nourished me and helped me keep a balanced life. Most of all, I want to thank my wife, Wanda, and son, Leonardo, without whose moral support and numerous sacrifices this book would not have been possible. Thank you, Wanda, for all the things you did that should have rightly been mine to do as I slaved in the dungeon! Thank you, Leonardo, for saying, “Daddy, you work” when I know you really wanted to say, “Daddy, we play!” Both of you and our child to be will always be my greatest success story.

XSLT Cookbook, 2nd Edition by Sal Mangano