Preface

When I first wrote What Is EPUB 3? in the summer of 2011, it was envisioned both as a brief standalone piece that would orient people to the new EPUB 3.0 revision the International Digital Publishing Forum (IDPF) was about to release and as an introduction to what we hoped would evolve into a larger best practices guide: the one you’re reading now.

You’ll find that book distilled down to its bare essentials in this book’s introduction, but if you are new to EPUB, that original guide contains much that is helpful to know before tackling this one. If I can recommend some advance reading, it would be to grab a copy of that ebook and give it a skim. If you’re not familiar with EPUBs generally, or with what’s changed from 2 to 3, it’ll give you a view of the big picture before we launch into the details covered here. It’s only about the length of a short chapter, too (and free!), so it won’t take you long to get through, and it will give you a condensed perspective on what an EPUB is.

This guide instead delves right into the EPUB container and walks you through best practices as they relate to production of your publications; you’ll find a bit of a mixture of practices and guidance on how to use EPUB technologies. You don’t necessarily have to know the technology of publishing EPUBs inside and out to find value here, nor do you have to be a programmer or tech geek, but this book is for the ebook practitioner.

In planning out this guide, one of the challenges was trying to keep straight where the boundaries are between EPUB 3 and the technologies it combines under its format umbrella. Can a single book about EPUB 3 best practices try to detail every nuance of HTML5, CSS3, JavaScript, MathML and SVG, just to pick out some of the prime content document technologies? The answer should be obvious, considering the volume of material that’s already been written on those subjects.

What we’ve tried to do in this guide is find the key areas of overlap between those technologies as they relate to publishing. You’re going to find a lot of discussion about all of the features just listed, and more, but if you’re just getting started with the technologies used in EPUBs, this book will be more of a starting point on your journey. You will learn about potential issues when scripting in the reading system environment, for example, but you won’t find a tutorial on the JavaScript language.

Each of the chapters in this book deals with a unique aspect of the creation and distribution process. There is no assumption that you’re familiar with the entire format, because the production of EPUBs often involves expertise from a number of different functional areas. The people responsible for ensuring the technology of your ebooks probably aren’t going to be the same people who are responsible for the metadata. The authors and editors creating the content are likewise not going to be the people bundling and distributing the ebook. So although the book will move over EPUB 3 in a linear fashion, and can be read from cover to cover to learn about production as a whole, each chapter is also intended to be readable in isolation, with pointers forward and back as necessary.

And although we hope you’ll implement all the best practices you can, the book is not designed to be a checklist to content conformity, and is not written as such. Everyone produces using different methods, and everyone has to work within the constraints of their production workflows, so we’ve tried hard not to target specific processes or reading systems but stick to the ultimate outcome. If you can’t implement every accessibility practice, for example, the hope is that at least you’ll understand where, and how, you can improve later on down the road.

This guide also isn’t intended to be the final word on EPUB, as EPUB is always evolving. It’s about preparing you for producing EPUB 3 content using all the features it makes available, helping you avoid known pitfalls, and giving you a heads up on the issues you’ll face. If successful, it will also help you understand why the specification is defined the way that it is. A specification is just an artifact of agreement on how to implement a technology, after all. It tells you what the creators decided you must and should and may do (and not do), but specifications don’t spend time retelling you the story of why.

It doesn’t mean you’ll agree with all the decisions that were made, but specifications by nature portray a myth of homogeneity. It’s the discussions and debate that continue around EPUB that keep it at the forefront of ebook technologies.

If we’ve done our job writing this book, you should not only have new ideas for your own production, but also be well equipped to join in the discussions on the future.

The Future

By the time this book comes out, the EPUB 3 specification will be more than a year old. It’s hard to believe how fast time flies, but it’s not surprising that technology is only just catching up to the standard. That was a goal of the revision after all: to position the specification so that features and best practices could be defined ahead of the pack instead of trying to constantly play the catch-up game.

The modular nature of the specification has also proven its worth. Since the specification was published in October 2011, IDPF subgroups have published two new documents: fixed layouts and advanced adaptive layouts. Work on grammars for marking up indexes and dictionaries has been ongoing since the beginning of 2012, and a new group dealing with hybrid layouts is also in the process of being chartered. The IDPF is continuing to work with its members to evolve the standard to meet their needs; it’s not resting on its laurels or creating a format by fiat.

Another major revision of the standard is not on the horizon at this point, but minor revisions are anticipated to add new CSS functionality, fix bugs, and see if consensus can be found on open issues like codecs and metadata. A new minor revision is expected to begin as this book gets readied for print, which will affect the information in this guide, but the effect is anticipated to be only positive.

You may have RDFa and microdata for content documents by the time you read this, for example, or at least a firm promise of them. Fixed layout support could be stronger if the information document it’s currently defined in gets rolled into the main specification. The HTML5 landscape should be clearer, too, as the W3C pushes to finalize the standard by 2014. It is also hoped that EPUB 3 itself will become an ISO Technical Specification during the process.

But don’t worry that this means you’re going to be fed lots of point-in-time ideas. The areas of instability are not that numerous, and the practices that exist solely to deal with them are clearly marked. The point of this book is to look at the core of the standard, so the information should stand for as long as EPUB 3s are being produced.

And even as we began wrapping up this book, a new project to create a conformance test suite for reading systems was announced, which will help standardize rendering across reading systems, more and more of which are appearing that support EPUB 3 content. In natural step, publishers are also announcing their plans to start releasing content (the Hachette Book Group, for example).

EPUB 3 is here, now, in other words.

But we’re not here for long-winded introductions. Let’s get on with the show!

How to Use This Book

Although you can read this book cover to cover, each chapter covers a unique aspect of the EPUB 3 format, so each can also be read in isolation. To simplify jumping through the content, here’s a quick summary of the information in each:

Introduction
The introduction provides a brief, high-level overview of the EPUB format and specifications. If you’re coming to this book with no background in EPUB production, this chapter will get you grounded before you head into the details.
Chapter 1: Package Document and Metadata
The first chapter introduces the package document at the heart of every EPUB and walks you through the process of adding publication metadata. The structure of the package document is reviewed, as is the required publication metadata. The new, flexible model for adding metadata to publications via meta elements is also introduced.
Chapter 2: Navigation
This chapter details the new EPUB navigation document, including how to construct the required table of contents and optional landmarks and page list navigation aids. It also shows how the document can now double as content in your publication, removing the need to have two documents for the same basic function.
Chapter 3: Content Documents
This chapter is more wide-ranging in scope, as it provides a general overview of content documents. It reviews the new features and requirements of XHTML5, from the new additions to the core HTML grammar to the inclusion of MathML and SVG. It also reviews the new epub:type attribute for semantic inflection. EPUB style sheets, alt style tags and other styling issues are also covered. The chapter concludes by looking at the various fallback mechanisms at your disposal when using nonstandard content types.
Chapter 4: Font Embedding and Licensing
The ability to embed fonts allows rich typography in EPUBs. This chapter looks at the technical details involved in embedding WOFF and OTF fonts, and it also reviews the licensing issues to be aware of when you do.
Chapter 5: Multimedia
This chapter looks at the new audio and video elements in HTML5 for embedding multimedia content in your publications. It covers how to include resources, poster images, and timed tracks, as well as the issues surrounding the lack of a universal codec for video. The chapter concludes by looking at epub:trigger elements for building scriptless user interfaces.
Chapter 6: Media Overlays
Media overlays is the new technology that enables synchronized text and audio playback in reading systems, and this chapter reviews the process of creating these documents. The issues involved in creating overlays for different levels of playback granularity are explored, as is the impact on production.
Chapter 7: Interactivity
The addition of scripting in EPUB 3 opens up a whole new dimension in ebooks. This chapter explores the scripting capabilities supported by the format, the new epubReadingSystem JavaScript property for querying reading system capabilities, and also reviews the issues you’ll need to consider when choosing to make your content dynamic. It also covers the new HTML5 canvas element.
Chapter 8: Global Language Support
To become a truly global standard for ebooks, EPUB 3 was augmented to enable more than just left-to-right page progressions and horizontal writing styles. This chapter looks at the mechanics and mechanisms for handling both right-to-left page progressions and vertical writing styles. It also reviews the new CSS additions that give greater control over such features as line and word breaking, as well as the use of ruby annotations.
Chapter 9: Accessibility
Although this book tries to keep a focus on accessibility throughout each chapter, this one delves into unique accessibility requirements for markup, styling, fixed layouts, and scripting. WAI-ARIA roles, states and properties are introduced for dynamic content, as are numerous best practices for markup, many drawn from WCAG 2.0.
Chapter 10: Text-to-Speech (TTS)
One of the shortcomings of ebooks for aural readers has been the inability to control the quality of text-to-speech playback. EPUB 3 introduces three new technologies to fill this void: PLS lexicon files enable producers to create reusable phonetic pronunciation libraries, SSML markup allows specific pronunciation overrides to be embedded in the markup of a document, and the CSS3 Speech properties provide a variety of playback controls. This chapter reviews how to include all these technologies to improve the rendering on compliant reading systems.
Chapter 11: Validation
Before distributing your finished EPUB files, you want to make sure that they conform to the specifications; otherwise, you run the risk of them not being usable by readers. The final chapter looks at the epubcheck validation program, including how to run it and how to understand the errors it emits.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This icon signifies a tip, suggestion, or general note.

Caution

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, if this book includes code examples, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “EPUB 3 Best Practices by Matt Garrish and Markus Gylling (O’Reilly). Copyright 2013 Matt Garrish and Markus Gylling, 9781449329143.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Credits

Matt Garrish has been working in both mainstream and accessible publishing for more than 15 years. He was the chief editor of the EPUB 3 suite of specifications and has authored a number of works on EPUB 3 and accessibility, including the O’Reilly books What Is EPUB 3? and Accessible EPUB 3. He currently resides in Toronto, where he continues to work on EPUB and accessibility initiatives for the DAISY Consortium and others.

Markus Gylling has worked in the field of information accessibility since the late 90s. As CTO of the DAISY Consortium, he has been engaged in the development of specifications, tools, and educational efforts for inclusive publishing on a global scale. Markus is the chair of the EPUB 3 Working Group, and during 2011 he led the development of the EPUB 3 specification. Since October 2011, he has served as CTO of the IDPF alongside his job with the DAISY Consortium. Markus lives and works in Stockholm, Sweden.

Liza Daly is the Vice President of Engineering at Safari Books Online and an experienced developer of digital publishing and web technologies. She served on the Board of Directors of the IDPF and has published a number of articles and seminars on EPUB 2, EPUB 3, and best practices in digital publishing. Liza developed several web-based reading systems including the first HTML5 EPUB reader, and was an active participant in the OPDS ebook distribution standard. As a consultant, Liza has worked with technical, trade, academic, and educational publishers, including O’Reilly Media, Wiley, Penguin, Oxford University Press, A Book Apart, and Harvard Business School Publishing. Liza founded Threepress Consulting in 2008, which was later acquired by Safari Books Online.

Bill Kasdorf, General Editor of The Columbia Guide to Digital Publishing, is Vice President and principal consultant of Apex Content Solutions, a leading supplier of data conversion, editorial, production, and content enhancement services to publishers and other organizations worldwide. Active in many standards initiatives, Bill serves on the IDPF Working Group developing the EPUB 3 standard (he was coordinator of its Metadata Subgroup and is now active in the Indexing Working Group); the IDEAlliance working group developing the nextPub PSV source format for magazines and other design- and feature-rich publications (chairing its Packaging PSV as EPUB Committee); he is Chair of the BISG Content Structure Committee; and he is a member of the Publishing Business STM/Scholarly Advisory Board and the NISO eBook SIG. Past President of the Society for Scholarly Publishing (SSP) and recipient of SSP’s Distinguished Service Award, Bill has led seminars, written articles, and spoken widely for publishing industry organizations such as SSP, O’Reilly TOC, NISO, BISG, IDPF, DBW, AAP, AAUP, ALPSP, STM, Seybold Seminars, and the Library of Congress. In his consulting practice, Bill has served clients globally, including large international publishers such as Pearson, Cengage, Wolters Kluwer, and Sage; scholarly presses and societies such as Harvard, MIT, Toronto, ASME, and IEEE; aggregators such as CourseSmart and netLibrary; and global publishing organizations such as the World Bank, the British Library, and the European Union.

Murata Makoto (Murata is his family name) has been involved in XML for 15 years, since he joined the W3C XML WG, which created XML 1.0. As the lead of the Enhanced Global Language Support subgroup of the EPUB 3 working group, he contributed to internationalization of EPUB 3. He is a co-chair of the Advanced/Hybrid Layouts WG of IDPF and of a committee (ISO/IEC JTC1/SC34/AHG4) for the planning of EPUB standardization at ISO/IEC JTC1. He has contributed to other XML activities such as RELAX NG (a schema language used for EPUB) and OOXML. He graduated from Kyoto University, and holds a Doctor of Engineering from Tsukuba University. He is the CTO of the Japan Electronic Publishing Association. Makoto lives in Fujisawa-shi, Japan.

Adam Witwer has worked in publishing for twelve years, the last eight at O’Reilly Media. At O’Reilly, he created and ran the Publishing Services division, managing print, ebook/digital development, video production, and manufacturing. Along the way, Adam led O’Reilly through process and technical transitions to position the company for a digital-first world. In his current role as Director of Publishing Technology, he creates products that explore new ways to write, develop, manage, distribute, and present digital and print books. His team is currently beta testing a next-generation authoring platform.

Acknowledgments

Matt Garrish would like to thank the following people for their invaluable input while writing the accessibility chapters: Markus Gylling, George Kerscher, Daniel Weck, Romain Deltour and Marisa DeMeglio from the DAISY Consortium, Graham Bell from EDItEUR, Dave Gunn from RNIB, Ping Mei Law, Richard Wilson, Joan McGouran and Sean Brooks from CNIB, and Dave Cramer from Hachette Book Group. He’d also like to give a wide-ranging thank you to Bill McCoy and all the members of the EPUB 3 working group he’s had the opportunity to work with, and from whom he learned much of the information in this book, especially the other coauthors. He’d also like to thank John Quinlan, who foolishly acceded to his endless entreaties to join his electronic publishing department those many years ago, and dedicate his chapters to the memory of Paul Seaton, who passed away far too young during the writing. And a very special thanks goes out to the DAISY Consortium for their work fostering digital equality, and without whose sponsorship he never would have been able to undertake this project.

Markus Gylling would especially like to thank Matt Garrish for his flair for making technical concepts readable by mortals, and George Kerscher for his never-ending perseverance. Special thanks also go to Mike Smith (W3C) and Fantasai (now with Mozilla) for invaluable help and advice during the EPUB 3 specification development.

Bill Kasdorf would especially like to acknowledge the expert leadership Markus Gylling and Bill McCoy provided and provide to the EPUB 3 working group and the IDPF, as well as the invaluable guidance they have given both to himself personally and to the many other industry groups they have graciously let him pull them into. The same goes for the technical and editorial consultation Matt Garrish has so generously contributed to some of those same groups as well as to this book and, most importantly, to the EPUB 3 spec. Finally, he is particularly grateful to the excellent team who comprised the EPUB 3 Metadata Subgroup, with particular thanks to the dedicated work and invaluable contributions of Daniel Hughes and Graham Bell.

Makoto Murata is grateful to the members of the Enhanced Global Language Support subgroup of the EPUB 3 WG as well as the editors of W3C CSS Writing Modes and CSS Text. Internationalization of EPUB 3 would not have been achieved without their significant contributions. He would like to thank the members of W3C Japanese Layout Taskforce for creating Requirements for Japanese Text Layout (W3C Group Note) and allowing the use of figures from it.

Liza Daly acknowledges the work of The Open University for continuing to push the boundaries of accessible, interactive publications, all created using an open-source toolchain. She continues to be inspired by the interactive fiction community, who have been collectively demonstrating the narrative power of nonlinear storytelling since long before the EPUB format was conceived.

Adam Witwer would like to thank Ron Bilodeau at O’Reilly for consulting and running tests on font obfuscation and subsetting. Ron knows more about those topics than the entire Internet. Thanks, also, to Deirdre Silver from Wiley for speaking openly from the perspective of a large publisher. And thanks to Alin Jardin and Vladimir Levantovsky from Monotype Imaging for providing information (and great conversation) around all things font related, but especially licensing.

And a final thank you from all the authors goes to Brian Sawyer and all the people at O’Reilly for their work putting this book together!

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/epub3-best-practices.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia
