Edited 3:15pm: Though the current epubcheck considers the sample below to be valid, the approach described in this post is likely not strictly valid according to EPUB 2.0.1. The XHTML TOC is not necessarily meant to be part of the EPUB publication as it is for Kindle consumption only, but it is included in the manifest. Proceed with caution.

Many publishers prefer to create a single EPUB file that can be distributed to EPUB reading system channels like iBooks and B&N and, after conversion by kindlegen, Calibre, or Amazon’s internal systems, in the Kindle ecosystem as well.

There are a few problems with this. One is that Mobipocket’s primitive HTML support conflicts with the requirement that commercial EPUBs be valid XHTML 1.1. There are some design elements that are difficult to do “right” in Mobipocket while still maintaining compliance with EPUB. Our best recommendation in this case is to optimize the design for EPUB reading systems that are based on browser engines; that is clearly where the next generation of EPUB readers is going, and ebooks should be designed as much for future readers as for the state of the art today. Designers who need fine control over the look of Kindle books should first buy Kindle Formatting by Joshua Tallent, and then be prepared to produce two separate EPUB files: a Mobi-specific one and a valid one for distribution in the other channels.

If exact formatting isn’t critical, then the single-source approach can work, but there’s one other layout issue: Kindle requires both the EPUB NCX Table of Contents and a table of contents marked up in XHTML. This requirement is silly (an XHTML TOC should only be included if the creator really wants a custom layout), but there it is.

This leads to content creators including an XHTML TOC in their EPUB, which is unnecessary in EPUB readers, takes up valuable front-matter space, and impedes the flow of reading.

Here’s a quick trick for bundling an XHTML TOC and “hiding” it from an EPUB reader: omit the XHTML TOC from the spine.

Sample OPF:

<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="bookid" version="2.0">
  <metadata>
    <dc:title>Test demonstrating mobi TOC that does not appear in an EPUB</dc:title>
    <dc:creator>Threepress Consulting Inc.</dc:creator>
    <dc:description>A demonstration of including a Mobi-friendly XHTML TOC in an EPUB</dc:description>
    <dc:date>2010</dc:date>
    <dc:rights>Public domain</dc:rights>
    <dc:identifier id="bookid">urn:uuid:AB7456FF-DDC7-4DB1-AEFB-153DDDBA9F9B</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="untitled-5" href="Untitled-5.xhtml" media-type="application/xhtml+xml"/>
    <item id="toc" href="toc.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="template.css" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="untitled-5"/>
  </spine>
  <guide>
    <reference href="toc.xhtml" type="toc" title="Table of Contents"/>
  </guide>
</package>

In this case, the guide entry tells Kindle where to find the XHTML TOC, but because the TOC is omitted from the spine, it is never added to the “linear reading order” of the book. An EPUB reader that pages through the spine simply never displays it.
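
For illustration, the toc.xhtml that the guide entry points to can be very small. The following is a minimal sketch rather than the file from the attached sample: the anchor ids ch1 and ch2 and the chapter titles are hypothetical, while Untitled-5.xhtml and template.css are the file names used in the OPF above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Table of Contents</title>
    <link rel="stylesheet" type="text/css" href="template.css"/>
  </head>
  <body>
    <h1>Table of Contents</h1>
    <ul>
      <!-- Hypothetical anchors: each id must exist in Untitled-5.xhtml -->
      <li><a href="Untitled-5.xhtml#ch1">Chapter 1</a></li>
      <li><a href="Untitled-5.xhtml#ch2">Chapter 2</a></li>
    </ul>
  </body>
</html>
```

Each link targets an anchor inside a document that is listed in the spine, so the entries should mirror the navPoints in toc.ncx.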

Thus you have a Kindle-friendly EPUB file that does not burden its EPUB reader cousins with unnecessary cruft.
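
A variation raised in the discussion below is to keep the XHTML TOC in the spine, but list it last and mark it linear="no". The fragment is a sketch of that idea using the ids from the sample OPF above, not a tested recipe. As the comments note, reading systems are not required to honor linear="no" and many render the item in sequence anyway, which is why it is placed last here.

```xml
<spine toc="ncx">
  <itemref idref="untitled-5"/>
  <!-- XHTML TOC kept in the spine, but last and flagged as non-linear -->
  <itemref idref="toc" linear="no"/>
</spine>
```

The manifest and guide entries stay exactly as in the sample OPF; only the spine changes.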

A complete EPUB sample is attached.

26 Responses to “Better single-source Mobi/EPUB files”

  1. Nate

    I prefer to leave it visible. I’ve seen a number of e-readers with poor Epub implementations. They adequately display the text, but they don’t support some of the particulars like the external TOC.

  2. Moriah Jovan

    I also prefer to leave it there. Yes, it takes up “valuable” front matter space (but then, it could be put in the back), but it is available for the reader’s convenience. My biggest concern is what happens when the reader program doesn’t have a ToC function? You can’t assume that all EPUB readers are going to have a facility that will compile the ToC as part of its process, and leave the reader hanging with no ToC at all JUST IN CASE s/he is using one that doesn’t.

    There’s no harm in putting possibly necessary stuff in. The harm is in taking possibly necessary stuff out.

  3. Liza Daly

    My biggest concern is what happens when the reader program doesn’t have a ToC function

    I’m not aware of any significant EPUB reader that does not display the NCX TOC.

  4. Rebecca Springer

    The need for a TOC in the front matter depends on the book. For many if not most nonfiction titles, an inline TOC is a valuable orienting mechanism for the reader. Conversion houses or other technologists should not be the ones deciding which parts of the content are worthwhile — it is an editorial decision.

  5. Liza Daly

    Absolutely agree that it’s an editorial decision. Many publishers do want a single EPUB that can go to Kindle but don’t want the XHTML TOC. Publishers who do want the XHTML TOC in both versions can safely ignore this approach.

  6. bowerbird

    i too think the t.o.c. at the front is a valuable orientation,
    rather than “unnecessary cruft”. i don’t see why .epub
    needs that extra file generated. that, to me, is the cruft.
    the table of contents should be contained within the flow.

    but you’re right when you say that mobi, which requires
    that _two_ such unnecessary files be generated, is silly.

    i also question your recommendation to create 2 separate
    .epub files — one tailored for .mobi and one for otherwise —
    but that is another argument for another day, since that
    is not the main topic of this article (even though it _was_
    the headline that you gave to this article, i’m not sure why).

    -bowerbird

  7. Liza Daly

    i also question your recommendation to create 2 separate
    .epub files — one tailored for .mobi and one for otherwise

    As I said, this is only a requirement if you need to produce hand-crafted XHTML for Kindle. It uses a non-standard HTML parser that does not behave the way web browsers/EPUB readers do.

    Under most circumstances our recommendation is to produce just one EPUB.

  8. IcompetenceIsNotAnExcuse

    From the OPF specification ( http://www.idpf.org/doc_library/epub/OPF_2.0.1_draft.htm#Section2.4 )

    “All OPS Content Documents that are part of the publication (i.e. are listed in the manifest) which are potentially reachable by any reference mechanism allowed in this specification must be included in the spine. Such reference mechanisms include, as a partial list, hypertext links within OPS Content Documents, and references by the NCX, Tours and Guide.”

  9. Moriah Jovan

    I’m not aware of any significant EPUB reader that does not display the NCX TOC.

    I’m not thinking about the “significant” ones. I’m thinking about the outliers. Really, does it take up so much room that it has to be taken out just…because? Seriously, I don’t get it.

    On the other hand, since I do hand-craft them separately every day, I will say that tweaking the EPUB from the MOBI is far, far easier than the other way around, so for my workflow, taking the in-text ToC OUT is an unnecessary step.

  10. fantata

    I’m with Liza, it seems to me like it’s only the kindle that doesn’t use the toc.ncx. I think it’s reasonable for a book file structure to have a distinct mechanism for the TOC rather than rendering XHTML for it. I don’t think it’s a massive issue, but I stumbled on the method Liza points out when reacting to kindle users’ complaints (rightly so) and adding the XHTML toc. This keeps me happy as I really don’t want it inline where the TOC ncx is utilised, even if it’s not exactly valid. I feel it *is* cruft.

    So far we’ve been going for the one ePub fits all approach, but this week we’ve started to make the move over to a very simple stylesheet for .mobi as the kindle renders so much so poorly – and whilst we want to provide the kindle user-base (which, let’s face it, is the biggest so far) with a nice, clean experience, we also really want to start making things look as good as possible for webkit etc. based systems, which surely must be the future.

  11. bowerbird

    fantata said:
    > I think it’s reasonable for a book file structure
    > to have a distinct mechanism for the TOC
    > rather than rendering XHTML for it.

    i think requiring 2 (or more) files for an e-book
    when _one_ is more than sufficient is not just
    doubling the number of files hanging around,
    it’s an invitation for trouble to come visiting…

    surely we don’t need a separate table of contents
    to be able to figure out the headers in the book…

    and if a table of contents is already _in_ a book,
    there’s no reason to duplicate it as a separate file.

    > we also really want to start making things
    > look as good as possible for webkit etc. based
    > systems, which surely must be the future.

    lots of people seem to be saying that, but i’d say
    that’s a huge mistake. let’s write an engine to do
    e-book rendering as a completely separate entity,
    rather than waiting for browser-programmers to
    adapt their bloated monstrosities to our desires…

    -bowerbird

  12. Liza Daly

    Joseph,

    Keith also suggested linear="no" and theoretically that’s what it’s for, but EPUB readers aren’t required to do anything with that attribute and most just render the OPF content as if it were linear="yes".

  13. Joseph

    True, and a number of older reading systems don’t observe it. But it seems to me we should fix the problem at its source — which is either encouraging reading systems to observe it, or mandating it in the spec. Book designers (and especially developers of EPUB generators) will likely resist creating invalid EPUBs. And one of the most vocal complaints I hear from readers is having to wade through all the front-matter guff just to get to the text. To my mind, this is one of those bad authoring practices that holds good reading systems back. Your solution is very welcome, but the validity problem is going to nix it, I think.

  14. Joseph

    Oh, putting it last as well as non-linear is a good idea. I think I’ll update Peregrin to do that.

    Now, what about those godawful cover pages? Same?

  15. fantata

    Hmm, maybe last and non-linear is the way to go then. I tried non-linear on its own, but like you say it’s not well supported.

    Joseph – by cover pages do you mean the copyright, legal nonsense etc.?

  16. Joseph

    No, I mean the component, typically a single page, typically containing a single image, which is identical to the cover image, often embedded in SVG (for extra WTF) at the very start of many EPUBs. Modern reading systems have better ways of displaying cover images than within the margins of a page — this practice is an unfortunate relic (as usual, I blame Digital Editions).

  17. Eping Wang

    It seems Kindle adopts a grammar like HTML 3.2,
    and ePUB adopts XHTML 1.1,
    but the currently most-used HTML 4.0 was left out,
    gap or vacuum?

  18. bowerbird

    joseph said:
    > And one of the most vocal complaints I hear from readers is
    > having to wade through all the front-matter guff just to get to the text.

    and that’s why i like the table of contents right behind the title/cover,
    a convenient place giving maximal navigation possibilities from there.

    so i’d inform those readers of the t.o.c., and advise ’em to use it, especially
    when they want to skip over something instead of “wading through” it…

    -bowerbird

  19. Tom Semple

    Amazon’s Kindle Publishing guidelines state ‘please ensure the HTML TOC is located towards the start of the book, and not the end of the book’. That’s probably because this ensures TOC is contained within book samples so that customers can make more informed purchase decisions.

    The NCX is used only for chapter navigation on Kindle now, but I think a best practice would be to assume it can be used as it is in ePub, and indeed, as it can be used in the Kindle Previewer application. That is to say, each navpoint needs meaningful chapter labels, and I would not flatten the hierarchy (even though 2nd level navpoints are not apparent in the UI, and it is not straightforward to navigate to them). If Amazon ever decides to improve navigation features, one obvious thing to do would be to add NCX navigation as in Kindle Previewer. Besides, if you are preparing content for both mobi and epub, it is less work if you can use the same NCX for both, as many publishers evidently do.

    XHTML and NCX TOCs fulfill different functions IMO and may therefore diverge in terms of the items presented in each. The editorial role is to make sure NCX facilitates ease of navigation, and XHTML facilitates disclosure of content (along with some less efficient navigation). I would not consider either one to be ‘kruft’.

  20. bowerbird

    tom said:
    > XHTML and NCX TOCs fulfill different functions IMO and
    > may therefore diverge in terms of the items presented in each.
    > The editorial role is to make sure NCX facilitates ease of navigation,
    > and XHTML facilitates disclosure of content (along with some
    > less efficient navigation). I would not consider either one to be ‘kruft’.

    well, good, tom, because, in my opinion, they are _both_ cruft.
    so that gives us the basis for a little discussion here, eh?

    i can, from the markup, determine the exact hierarchical structure
    of the headers within the document. i can then use this knowledge to
    bring about any “role” i might need, whether it is “ease of navigation”
    or something that “facilitates disclosure of content”, whatever that is.

    so for what, exactly, do i need those two files?

    -bowerbird

  21. bowerbird

    does adobe digital editions still only
    support files that are under 300k?

    and is there a simple f.a.q. page
    on the .epub file-format where
    questions like this can be asked
    and answered?

    -bowerbird