Filed under ebooks, epub.

Obviously I’m a fan of the ePub format. It’s flexible enough to support advanced publications, but a simple text ebook can be put together with minimal effort.

But I don’t think it’s minimal enough. If I could go back in time and be involved with ePub and its predecessors, here are the choices I’d make:

Make the NCX optional

Many books or book-like publications have no chapters. In this case ebook authors are forced to create useless one-item NCX files and invent fake chapter titles, like “Pages.” Reading systems should be able to rely on the opf:spine alone to order OPS documents without defined, named chapters.
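To illustrate how little a reading system needs in order to establish reading order, here is a minimal Python sketch that derives it from the manifest and spine alone. The function name `reading_order` and the `SAMPLE_OPF` document are hypothetical, made up for this example; the sample also leaves out the spine's `toc` attribute, which the current spec requires, to show the NCX-free case argued for above.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by OPF 2.0 package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> List[str]:
    """Return the manifest hrefs in the order given by the spine."""
    root = ET.fromstring(opf_xml)
    # Map each manifest item's id to its href.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists itemrefs in linear reading order.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# Hypothetical chapterless publication: two pages, no NCX reference.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="p2" href="page2.xhtml" media-type="application/xhtml+xml"/>
    <item id="p1" href="page1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="p1"/>
    <itemref idref="p2"/>
  </spine>
</package>"""

print(reading_order(SAMPLE_OPF))
```

The order comes from the spine, not from the order of items in the manifest, so no chapter titles are needed to page through the book.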

Simplify the NCX

Good progress is being made in the EPUB Working Group towards clarifying and simplifying the NCX requirements. (Making playOrder optional is an especially useful step.)

But because the NCX is derived from the DAISY specification, there’s still some useless overlap, like the duplicated title. NCX is great for complex hierarchies, but I wish its features were simply a part of the OPF file, leaving only one file with publication-specific metadata.

Drop container.xml and replace with a required name and location for the OPF

I don’t personally understand the choice that was made here: there’s a file with a required location and name (META-INF/container.xml) whose sole purpose is to point to another file which may be named arbitrarily. Perhaps this is a historical artifact, but why not simply require there to be a content.opf file at the top level of the publication and be done with it?

(When combined with my first change, this would result in ePub requiring half as many files, which to me is a good thing.)
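For reference, this is roughly the indirection a reading system performs today just to locate the OPF. It is a minimal Python sketch, not any particular reader's implementation: the function name `rootfile_paths` and the `SAMPLE_CONTAINER` document (including the `OEBPS/content.opf` path) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def rootfile_paths(container_xml: str) -> List[str]:
    """Return the full-path of every rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    return [
        rootfile.get("full-path")
        for rootfile in root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
    ]


# Hypothetical container.xml pointing at an arbitrarily named OPF file.
SAMPLE_CONTAINER = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

print(rootfile_paths(SAMPLE_CONTAINER))
```

In a real ePub, the input would come from the zip archive, for example `zipfile.ZipFile(path).read("META-INF/container.xml")`. With a fixed top-level content.opf, this whole lookup would collapse to reading a single known file name.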

Support any valid form of XHTML

XHTML 1.1 was obviously a mistake, as it’s used (as far as I know) nowhere else, and is a dead-end as far as web technologies go. Few automated HTML tools generate it, and the changes from XHTML 1.0 are simply annoying rather than useful.

I’d prefer that ePub support XHTML 1.0, which is simply HTML 4.01 reformulated as XML. HTML 4 is the dominant form of HTML on the web (and will remain so for some time), and common automated tools like Tidy can clean up “street” HTML 4 into XHTML 1.0 quite well. Tidy won’t, however, produce XHTML 1.1.

I also don’t want to put an upper bound on the XHTML supported: XHTML 5 should also be okay, and the rule for a reading system that doesn’t support later tags should be the same “ignore and move on” that has worked well on the web.

Minimize or eliminate any ePub-specific styles and markup

I dislike the existence of special style properties like the oeb-* styles where equivalent CSS3 properties exist. I know CSS3 is a mess, but I’d rather use the same vocabulary as will eventually be found on the web. (This was discussed but isn’t happening.)

Support MathML as a first-class document type

This will happen eventually, but who knows the timeline. At least the fallback system means that MathML documents are currently allowable; it’s just more of a hassle and many people don’t realize it’s possible.

How would you change the spec if you could magically make it so?

8 Responses to “What I’d change about ePub”

  1. Dave Cramer

    I wholeheartedly agree with all of this. Just making playOrder optional would make my life so much easier… but if I had magical powers, I’d use them to make reading systems fully support the ePub specification. I’m looking at you, Kindle!

  2. Peter Sorotokin

    Well, looking at it from the implementer’s perspective, I’d use my magic powers somewhat differently. Of course, EPUB is what it is and none of these changes is realistic because of the existing content.

    1. CSS layout and good typography just don’t go together. Don’t get me wrong, it is a huge improvement over plain HTML, but it is nowhere near the minimal acceptable level for printed books. The most frustrating part from the implementation standpoint is that you have to implement all sorts of functionality and complexity in the engine, but CSS exposes it in a handicapped fashion that makes it impossible to make good use of it. For instance, you have automatic numbering for lists, but it cannot be used for chapters and chapter references. Line height calculation is a total mess, which means that the line gap in CSS is unpredictable in any complex case. CSS3 only makes things worse, as far as I can tell – just look at paged media – it is absolutely pathetic!

    2. The CSS cascade is expensive and useless. If you have a p element with class foo, you can address it as p, .foo or p.foo, and in a multitude of even more obscure ways. Unfortunately people do that and drain their device batteries repeating the same cascades over and over again. The styling system should have been simple and efficient.

    3. Remove all presentational HTML attributes. XHTML 1.1 does it to some extent, but all attributes like align or frame just duplicate CSS and should not have been used.

    Now, if I put on my authoring hat for a moment, I would:

    1. Add XSL-FO and XSLT support to process arbitrary XML markup. CSS has very little traction in pBooks for a reason. Perhaps invent a technology similar to XBL which could be used to define custom tags instead of defining a completely custom grammar.

    2. Add styling (or add CSS extensions) so that we can mimic the style inheritance of word processing programs. (The CSS cascade idea is just a bad hack from the authoring perspective as well and IMHO should be avoided. I cannot imagine an intuitive WYSIWYG program that would expose the CSS cascade to the user.)

    3. Have a way to do conditional and dynamic styling. CSS3 does conditional styling to some extent (media queries) and dynamic styling (using the calc “function” in values and units), but these are not powerful enough and introduce two new (and different) expression languages. I’d rather do XPath.

  3. Bob DuCharme

    I understand that it’s a drag that apps don’t create XHTML 1.1, but remember that the point of 1.1 was to make life easier for specs like EPUB, because they can more easily specify that they only support a specific subset of HTML by just naming the modules. If they had to name each individual HTML feature that has no place in EPUB, that would be very unwieldy, especially at upgrade time, and would be virtually impossible to validate.

    That’s why any software or specs get modularized: to more easily identify blocks of features that should or shouldn’t be used in different contexts.

    Unless you think that EPUB should support frames, client-side image maps, server-side image maps and all those other XHTML 1.0 features…

  4. Greg Schofield

    Except as references to printed versions, pages have no logical place in EPUB.

    We should have paragraph numbers divided into chapters and other divisions for finer navigation (grouped into 10s, i.e. 1-10, 11-20, etc.). At least that could be tied into a full electronic reference system, such as edition ID:1.10.

  5. Magnus Rudolfsen

    As I understand it, the NCX is required, but the validator tells me my book is valid even if the NCX is missing. Do you know anything about this issue?

  6. Liza Daly

    Yes, earlier versions of the validator had a bug that didn’t flag the absence of an NCX: http://code.google.com/p/epubcheck/issues/detail?id=29&can=1

    Unfortunately there hasn’t been a newer “official” build of the validator since 1.0.3, so that is what is in use on the validation site still. You could get the correct behavior by running your own version from svn locally.

  7. Ben

    The container.xml file actually has a purpose. You can put multiple renditions of the same book, for example the normal OPS xhtml stuff AND a PDF rendition, in the same epub file. I don’t know if any epub has ever used this, or if any reader actually supports it, but it’s there for that reason.

    I’d personally get rid of the damned NCX altogether and add a simple multi-level TOC section to the OPF file after the spine. Something like:

    Also, there needs to be a way to create Asian-style vertical, right-to-left text.

    Finally, the epub spec should require that all reading systems be able to display the full Unicode character set by default. I hate having to embed my own Japanese fonts in an epub to use it on my Sony Reader. It just makes the epubs bloated (like 14 MB each, 13 MB of which is the font).