I originally presented this talk on ebook markup to an audience of ebook developers and publishers. As someone who cares deeply about accessibility and discovery, it’s a subject that tends to get me agitated, but I tried to be extra-polite because my audience was Canadian.
My hope is that as web-based book resources like Safari continue to proliferate, publishers will take advantage of the opportunities afforded by many years of research by the web community into what makes content semantically rich, accessible, and competitive with the wealth of free material available on the open web.
Digital book design is a hybrid discipline of web design, classical typography, QA, XML development, and abuse of regular expressions and/or alcohol. A lot of it is, sadly, treated as a cost center and heavily commoditized and outsourced.
Naturally, most developers don’t like to think of themselves as a cost, and take pride in their creativity and in knowledge born of experience. Ebook developers are no different, with the additional pride that comes from being the latest evolution in a publishing tradition extending back thousands of years.
I want to convince you that craft in ebook design is not just consistent with the ethics of publishing, but also makes good business sense.
Since the emergence of the Kindle, publishers have largely concerned themselves with how books look in ereaders. Looks matter, of course, but digital publications offer much more than pure visual presentation. And while many people have contributed to standards that improve the reading experience for the print-disabled, the reality has been that accessibility is an afterthought at best, or a purely theoretical concept at worst.
The outcome? An ecosystem of ebooks with purely presentational markup.
Here’s why that’s a problem.
Less style, more substance
Safari’s books are often technical, which means that they contain some challenging markup: source code, tabular data, funny Unicode characters ☃, and complex nested hierarchies of sections and subsections. Unfortunately, the actual innards of the ebooks don’t always reflect the complexity of the subject matter. We get a lot of this:
Nobody likes this kind of markup.
Not us at Safari, because it makes it difficult for us to predict how ebook styles will behave in our product.
Not our customers, because it makes content hard for assistive technologies to interpret and impedes basic functionality like cutting and pasting code samples.
Not search engines, because they can’t interpret the meaning of the content or tell whether a keyword sits in a header, and is therefore important, or is just part of the text.
Not publishers, because these titles aren’t as discoverable and won’t perform as well.
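To make the search-engine point concrete, here is a minimal sketch of how a naive indexer might look for headings. Everything in it is an assumption for illustration: the two markup samples are hypothetical (not taken from any real title), and `HeadingCollector` and `extract_headings` are made-up names built on Python's standard `html.parser` module. Real search engines are far more sophisticated, but the basic blindness is the same.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects the text of h1-h6 elements, the way a simple indexer might."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._open = None   # heading tag currently open, if any
        self._buffer = []   # text seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._open = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.headings.append("".join(self._buffer).strip())
            self._open = None


def extract_headings(markup):
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return collector.headings


# Hypothetical samples: same words, different markup.
presentational = """
<p class="header">Working with Python lists</p>
<p class="body">Lists are <span class="EmphItalic">mutable</span> sequences.</p>
"""

semantic = """
<h2>Working with Python lists</h2>
<p>Lists are <em>mutable</em> sequences.</p>
"""

print(extract_headings(presentational))  # []
print(extract_headings(semantic))        # ['Working with Python lists']
```

Both samples would probably look identical in an ereader with the right stylesheet. Only the second one tells a machine that "Working with Python lists" is a heading.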
Inside of a book, it’s too dark to read
Think of search engines as the world’s largest community of accessible content consumers. Content with accessible, semantic markup is more discoverable, and so performs better, than content with non-semantic markup.
Search engines only know how important the words in a document are by analyzing the markup. Headers, lists, and tables all convey importance. Contrary to popular belief, it doesn’t matter much for accessibility or discovery if you use <strong>, but for the love of all things holy do not use <span class="EmphItalic">: by doing so you’ve completely buried that emphasis in your CSS.
And the surest way to hide valuable text from search engines and disabled users is to capture it in an image.
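A rough sketch of what a markup-only consumer (a crawler, or a very simple screen reader) can recover from each approach. The snippets, the image filename, and the `TextAndEmphasis` and `scan` names are all hypothetical, and the parser is Python's standard `html.parser`; treat it as an illustration of the idea rather than a model of any real tool.

```python
from html.parser import HTMLParser

# Tags that carry emphasis in the markup itself, with no CSS needed.
EMPHASIS_TAGS = {"em", "strong", "i", "b"}


class TextAndEmphasis(HTMLParser):
    """Records visible text, plus any text inside an emphasis element."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.emphasized = []
        self._depth = 0  # nesting depth inside emphasis elements

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        chunk = data.strip()
        if chunk:
            self.text.append(chunk)
            if self._depth > 0:
                self.emphasized.append(chunk)


def scan(markup):
    """Return (visible text chunks, emphasized chunks) for a markup string."""
    parser = TextAndEmphasis()
    parser.feed(markup)
    parser.close()
    return parser.text, parser.emphasized


# Hypothetical samples.
span_version = '<p>Never <span class="EmphItalic">mutate</span> a list while iterating.</p>'
em_version = '<p>Never <em>mutate</em> a list while iterating.</p>'
image_version = '<p><img src="listing-3-1.png"></p>'  # a code listing saved as a picture

print(scan(span_version)[1])   # []
print(scan(em_version)[1])     # ['mutate']
print(scan(image_version))     # ([], [])
```

The span version still exposes its words, but the emphasis lives only in the stylesheet. The image version exposes nothing at all.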
SEO: Subscription Engine Optimization
Safari and most other web-based book sites employ usage-based payment models. Though the specifics vary, the more readers find books and read them, the more publishers get paid. (Our model is uniquely suited to this, because we pay publishers each time any unit of a book is read, even a single paragraph.)
Most of our users search for broad topics, like “Python” (a programming language). A book that has lots of chapters and section titles about Python is probably more about Python than one that mentions it just in the full text. Google’s algorithms favor words in titles and headers over those in paragraph text, yet many of our books have no header elements at all, relying instead on presentational markup like <p class="header">. Those books are not going to rank as highly as those with higher-quality markup, nor will they outperform web-native material. This is sad, as books are likely to contain the most reliable, trusted, vetted material on the topic at hand.
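As a toy illustration of why this matters for ranking, the sketch below scores a page for a search term and counts a hit inside a header more heavily than a hit in body text. This is emphatically not Google's algorithm or Safari's: the weights are arbitrary numbers picked for the example, the markup samples are hypothetical, and `TermScorer` and `score` are invented names on top of Python's standard `html.parser`.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
HEADING_WEIGHT = 5  # arbitrary, for illustration only
BODY_WEIGHT = 1     # arbitrary, for illustration only


class TermScorer(HTMLParser):
    """Counts occurrences of a term, weighting hits inside headings higher."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.score = 0
        self._heading_depth = 0  # > 0 while inside an h1-h6 element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_depth > 0:
            self._heading_depth -= 1

    def handle_data(self, data):
        # Whole-word, case-insensitive match on whitespace-separated tokens.
        hits = data.lower().split().count(self.term)
        weight = HEADING_WEIGHT if self._heading_depth else BODY_WEIGHT
        self.score += hits * weight


def score(markup, term):
    scorer = TermScorer(term)
    scorer.feed(markup)
    scorer.close()
    return scorer.score


# Hypothetical samples: identical words, different markup.
presentational = '<p class="header">Python basics</p><p>Python is a language.</p>'
semantic = '<h1>Python basics</h1><p>Python is a language.</p>'

print(score(presentational, "python"))  # 2
print(score(semantic, "python"))        # 6
```

The two books say exactly the same thing about Python. Under any scheme that rewards headers, only the one with real header elements gets credit for it.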
To make matters worse, technical users like Safari subscribers search for code samples or other specialized terms. We (and our publishers) would rather those people find books in our service before they find blog posts or forum answers. But too many of our publishers’ books have source code or tables captured as images, and those are inaccessible to search engines. They also generate a constant stream of accessibility complaints to our customer service team.
What’s at stake
The current impasse between Amazon and Hachette highlights an urgency for publishers to develop a stronger presence on the web. This inevitably means exposing more of the interiors of books themselves to search engines — the place where users go to find information and entertainment. Modern businesses rise and fall (some undeservedly so) based on their visibility to Google and others. By treating book markup as purely presentational — evaluating the quality of an ebook conversion on its visual fidelity alone — many publications are at a decided disadvantage.
It’s time to close that gap.