Filed under accessibility, epub, epub3.

The first version of EPUB used the NCX format to describe an accessible, machine-readable Table of Contents. NCX had come to EPUB from DAISY’s DTBook standard and was a crucial navigational aid. Unfortunately, NCX was rarely understood and was never very human-readable. As part of the alignment with wider web standards, EPUB 3 has dropped the NCX format and encodes the same information in a specialization of XHTML, the EPUB Navigation Document.

Moving away from specialized ebook-only solutions was a big part of EPUB 3, so I am quite interested to see what these new EPUB Navigation Documents look like in the real world. It seemed like the easiest way to create a lot of them was to transform NCX files into the new format, so I’ve written an open-source (BSD) stylesheet to do just that:

ncx2end-0.1.xsl

Note: This ncx2end-0.1.xsl is alpha-quality software in the worst possible way—it probably won’t work correctly on your documents and is hard to use. It does produce apparently-valid output for the 100+ NCX files I had around, but I would not put much faith in that today. Enjoy!

If you find this tool useful enough to discover an error, please submit a bug report and make sure to attach your NCX file to improve the test suite.


EPUB 3 does strongly encourage Reading Systems to support EPUB 2 (including NCX), but it would be nice to start seeing more experiments with EPUB 3 files to help encourage meaningful adoption.


Here’s an example of the input and output, just for show and tell. The EPUB Navigation Document looks normal and straightforward: that’s the point.

NCX in

<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en-US">
  <head>
    <meta name="dtb:uid" content="p9780316000000"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>Moby-Dick</text>
  </docTitle>
  <docAuthor>
    <text>Herman Melville</text>
  </docAuthor>
  <navMap>
    <navPoint id="cover" playOrder="1">
      <navLabel>
        <text>Cover</text>
      </navLabel>
      <content src="cover.html"/>
    </navPoint>
    <navPoint id="titlepage" playOrder="2">
      <navLabel>
        <text>Title Page</text>
      </navLabel>
      <content src="titlepage.html"/>
    </navPoint>
    <navPoint playOrder="3" id="preface_001">
      <navLabel>
        <text>Original Transcriber’s Notes:</text>
      </navLabel>
      <content src="preface_001.html"/>
    </navPoint>
    <navPoint playOrder="4" id="introduction_001">
      <navLabel>
        <text>ETYMOLOGY.</text>
      </navLabel>
      <content src="introduction_001.html"/>
    </navPoint>
    <navPoint playOrder="5" id="epigraph_001">
      <navLabel>
        <text>EXTRACTS (Supplied by a Sub-Sub-Librarian).</text>
      </navLabel>
      <content src="epigraph_001.html"/>
    </navPoint>
    <navPoint playOrder="6" id="chapter_001">
      <navLabel>
        <text>Chapter 1. Loomings.</text>
      </navLabel>
      <content src="chapter_001.html"/>
    </navPoint>
    <navPoint playOrder="7" id="chapter_002">
      <navLabel>
        <text>Chapter 2. The Carpet-Bag.</text>
      </navLabel>
      <content src="chapter_002.html"/>
    </navPoint>
    <navPoint playOrder="8" id="chapter_003">
      <navLabel>
        <text>Chapter 3. The Spouter-Inn.</text>
      </navLabel>
      <content src="chapter_003.html"/>
    </navPoint>
    <navPoint playOrder="9" id="chapter_004">
      <navLabel>
        <text>Chapter 4. The Counterpane.</text>
      </navLabel>
      <content src="chapter_004.html"/>
    </navPoint>
    <navPoint playOrder="10" id="chapter_005">
      <navLabel>
        <text>Chapter 5. Breakfast.</text>
      </navLabel>
      <content src="chapter_005.html"/>
    </navPoint>
    <navPoint playOrder="11" id="chapter_006">
      <navLabel>
        <text>Chapter 6. The Street.</text>
      </navLabel>
      <content src="chapter_006.html"/>
    </navPoint>
    <navPoint playOrder="12" id="chapter_007">
      <navLabel>
        <text>Chapter 7. The Chapel.</text>
      </navLabel>
      <content src="chapter_007.html"/>
    </navPoint>
    <navPoint playOrder="13" id="chapter_008">
      <navLabel>
        <text>Chapter 8. The Pulpit.</text>
      </navLabel>
      <content src="chapter_008.html"/>
    </navPoint>
    <navPoint id="copyright" playOrder="14">
      <navLabel>
        <text>Copyright Page</text>
      </navLabel>
      <content src="copyright.html"/>
    </navPoint>
  </navMap>
</ncx>

EPUB Navigation Document out

<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:epub="http://www.idpf.org/2007/ops" 
      profile="http://www.idpf.org/epub/30/profile/content/">
  <head>
    <title>Moby-Dick</title>
  </head>
  <body>
    <nav id="toc" epub:type="toc">
      <h1>Contents</h1>
      <ol>
        <li id="cover">
          <a href="cover.html">Cover</a>
        </li>
        <li id="titlepage">
          <a href="titlepage.html">Title Page</a>
        </li>
        <li id="preface_001">
          <a href="preface_001.html">Original Transcriber&#x2019;s Notes:</a>
        </li>
        <li id="introduction_001">
          <a href="introduction_001.html">ETYMOLOGY.</a>
        </li>
        <li id="epigraph_001">
          <a href="epigraph_001.html">EXTRACTS (Supplied by a Sub-Sub-Librarian).</a>
        </li>
        <li id="chapter_001">
          <a href="chapter_001.html">Chapter 1. Loomings.</a>
        </li>
        <li id="chapter_002">
          <a href="chapter_002.html">Chapter 2. The Carpet-Bag.</a>
        </li>
        <li id="chapter_003">
          <a href="chapter_003.html">Chapter 3. The Spouter-Inn.</a>
        </li>
        <li id="chapter_004">
          <a href="chapter_004.html">Chapter 4. The Counterpane.</a>
        </li>
        <li id="chapter_005">
          <a href="chapter_005.html">Chapter 5. Breakfast.</a>
        </li>
        <li id="chapter_006">
          <a href="chapter_006.html">Chapter 6. The Street.</a>
        </li>
        <li id="chapter_007">
          <a href="chapter_007.html">Chapter 7. The Chapel.</a>
        </li>
        <li id="chapter_008">
          <a href="chapter_008.html">Chapter 8. The Pulpit.</a>
        </li>
        <li id="copyright">
          <a href="copyright.html">Copyright Page</a>
        </li>
      </ol>
    </nav>
  </body>
</html>
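
The mapping between the two listings above is mechanical: each navPoint becomes an li, its navLabel text becomes the link text, and its content src becomes the href. As a rough illustration only (this is not the ncx2end stylesheet, and the function names `ncx_to_nav` and `nav_points_to_ol` are made up for this sketch), the same idea in Python with the standard library might look like the following. It assumes a well-formed NCX and ignores pageList, navList, and the profile attribute.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
NS = {"ncx": NCX_NS}


def xhtml(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return "{%s}%s" % (XHTML_NS, tag)


def nav_points_to_ol(parent):
    """Build an <ol> from the navPoint children of `parent`, or None if there are none."""
    points = parent.findall("ncx:navPoint", NS)
    if not points:
        return None
    ol = ET.Element(xhtml("ol"))
    for point in points:
        li = ET.SubElement(ol, xhtml("li"))
        if point.get("id"):
            li.set("id", point.get("id"))
        a = ET.SubElement(li, xhtml("a"))
        # Assumption: every navPoint has a <content src="..."/> child.
        a.set("href", point.find("ncx:content", NS).get("src"))
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NS)
        a.text = label.strip()
        # Nested navPoints become a nested <ol> inside the same <li>.
        nested = nav_points_to_ol(point)
        if nested is not None:
            li.append(nested)
    return ol


def ncx_to_nav(ncx_source):
    """Convert NCX markup (str or bytes) into a serialized navigation document."""
    # Serialize XHTML as the default namespace and use the epub: prefix.
    ET.register_namespace("", XHTML_NS)
    ET.register_namespace("epub", EPUB_NS)
    ncx = ET.fromstring(ncx_source)

    html = ET.Element(xhtml("html"))
    head = ET.SubElement(html, xhtml("head"))
    title = ET.SubElement(head, xhtml("title"))
    title.text = ncx.findtext("ncx:docTitle/ncx:text", default="", namespaces=NS)

    body = ET.SubElement(html, xhtml("body"))
    nav = ET.SubElement(body, xhtml("nav"))
    nav.set("id", "toc")
    nav.set("{%s}type" % EPUB_NS, "toc")
    heading = ET.SubElement(nav, xhtml("h1"))
    heading.text = "Contents"

    nav_map = ncx.find("ncx:navMap", NS)
    ol = nav_points_to_ol(nav_map) if nav_map is not None else None
    if ol is not None:
        nav.append(ol)
    return ET.tostring(html, encoding="unicode")
```

Something like `ncx_to_nav(open("toc.ncx", "rb").read())` (with `toc.ncx` standing in for your own file) returns the document as a string. The recursion in `nav_points_to_ol` is what produces the nested lists for deeper books.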

A more complicated book might use a lot more nesting, but it’s essentially turtles the whole way down:

<li id="id2532183">
  <a href="pt03.html">III. CSS Page Layout</a>
  <ol>
    <li id="id2532206">
      <a href="ch11.html">11. Introducing CSS Layout</a>
      <ol>
        <li id="id2532230">
          <a href="ch11.html#types_of_web_page_layouts">Types of Web Page Layouts</a>
        </li>
        <li id="id2532537">
          <a href="ch11s02.html">How CSS Layout Works</a>
          <ol>
            <li id="id2532706">
              <a href="ch11s02.html#the_mighty_div_tag">The Mighty &lt;div&gt; Tag</a>
            </li>
            <li id="id2532968">
              <a href="ch11s02.html#techniques_for_css_layout">Techniques for CSS Layout</a>
            </li>
          </ol>
        </li>
        <li id="id2533080">
          <a href="ch11s03.html">Layout Strategies</a>
          <ol>
            <li id="id2533169">
              <a href="ch11s03.html#start_with_your_content">Start with Your Content</a>
            </li>
            <li id="id2533229">
              <a href="ch11s03.html#mock_up_your_design">Mock Up Your Design</a>
            </li>
            <li id="id2533293">
              <a href="ch11s03.html#identify_the_boxes">Identify the Boxes</a>
            </li>
            <li id="id2533412">
              <a href="ch11s03.html#go_with_the_flow">Go with the Flow</a>
            </li>
            <li id="id2533461">
              <a href="ch11s03.html#remember_background_images">Remember Background Images</a>
            </li>
            <li id="id2533602">
              <a href="ch11s03.html#pieces_of_a_puzzle">Pieces of a Puzzle</a>
            </li>
            <li id="id2533688">
              <a href="ch11s03.html#layering_elements">Layering Elements</a>
            </li>
            <li id="id2533747">
              <a href="ch11s03.html#dont_forget_margins_and_padding">Don't Forget Margins and Padding</a>
            </li>
          </ol>
        </li>
      </ol>
    </li>
    <li id="id2533808">
      <a href="ch12.html">12. Building Float-Based Layouts</a>
      <ol>
        <li id="id2534262">
          <a href="ch12.html#applying_floats_to_your_layouts">Applying Floats to Your Layouts</a>
          ....

Thanks to Dave Cramer for producing one of the first EPUB 3 test files.

14 Responses to “Transforming NCX into EPUB 3 Navigation Documents”

  1. Gerald

    Hi Keith, thanks for the stylesheet. I’m delighted to see the NCX file going, but also wondering what this will mean for cross-device compatibility.

    For example, are ePub 3.0 readers expected to reject NCX files referenced via an entry in the manifest?

    -Gerald

    PS I might be alone in this, but personally I think it’s a shame that we still have a nested structure. The HTML document will allow H5 within H2, so why does the TOC file insist on correct nesting? Of course it’s no problem normalising the structure at render time – my packager does just that – but it’s the kind of thing that makes the format just a little more fragile.

  2. Keith Fahlgren

    @Gerald

    The correct behavior for EPUB 3 Reading Systems and NCX is outlined explicitly in the NCX Superseded section of EPUB Publications 3.0, but it boils down to “they can co-exist without conflict”, and new Reading Systems MUST ignore the NCX if an EPUB Navigation Document is present.
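
    A minimal sketch of that precedence rule, assuming the package document marks the Navigation Document with properties="nav" on a manifest item and names the NCX through the spine's toc attribute. The function name `find_toc` and the returned tuple shape are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_toc(opf_source):
    """Return (kind, href) for the navigation file to use, preferring EPUB 3 nav."""
    package = ET.fromstring(opf_source)
    items = package.findall("opf:manifest/opf:item", OPF_NS)

    # EPUB 3: the Navigation Document is the manifest item whose
    # space-separated properties attribute includes "nav".
    for item in items:
        if "nav" in item.get("properties", "").split():
            return ("nav", item.get("href"))

    # Fallback: <spine toc="..."> holds the manifest id of the NCX.
    spine = package.find("opf:spine", OPF_NS)
    ncx_id = spine.get("toc") if spine is not None else None
    if ncx_id is not None:
        for item in items:
            if item.get("id") == ncx_id:
                return ("ncx", item.get("href"))

    return (None, None)
```

    With both files listed in the manifest, the NCX is never consulted, which is the "co-exist without conflict" behavior described above.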

    As far as nesting, I think you’re completely insane to suggest that meaningful grouping without structure is “no problem”, but that may be because I come from a background of technical books.

  3. Gerd

    Hi,

    thanks for providing the sample docs.

    Unfortunately I haven’t found the time to dig deeper into the EPUB 3 specs (yet), but the sample indicates that EPUB 3 decided to go with epub-namespace attributes, e.g. epub:type="toc", instead of using the extension mechanism that HTML5 provides via data- attributes, e.g. data-epub-type="toc" (or some other more or less established extension mechanism like microdata or RDFa).

    Can you give a short summary of the rationale behind that decision (or point me to the respective discussion)?

    Thanks

    Gerd

  4. Gerald

    @Keith
    Thanks for the link. I’m sure you’re right and it’s insane to push against deep nesting. I’m glad to be able to report that our XML is very deeply nested and our eBooks don’t permit incorrect nesting either.

    -Gerald

  5. Peter Sefton

    I’m with @Gerd in wondering whether the means of labelling the TOC could be made more microdata-like and aligned with schema.org or RDFa. Schema.org seems to use the ‘vocab’ attribute rather than ‘profile’.

    There’s a clear use case where this would be really useful, in compiling EPUBs from web resources. I wrote about an attempt to do this
    using RDFa and the OAI-ORE spec. That approach is horribly complicated – it would be much simpler to use the nice simple EPUB 3 convention.

  6. Peter Sefton

    @Keith –
    The W3C EPUB distiller doesn’t find any RDF in your example, just the namespace declarations. I think it is because the ‘epub:type’ attribute is not part of HTML.

    I would have expected to see something like typeof="epub:toc" or, more usefully for web programmers using tools like jQuery, typeof="http://www.idpf.org/2007/ops/toc".

  7. Robert Nagle

    Keith, have you tested this against any DocBook-generated NCX files?

    Pardon my stupidity, but why would a conversion script from NCX to EPUB 3 navigation be alpha? Shouldn’t it just work? (What kind of things, for example, would cause it NOT to work?)

    • Keith Fahlgren

      @Robert:

      I have tested it against DocBook-XSL generated NCX files, but I haven’t reviewed the results extensively. I have no idea what would cause it to not work.

  8. Ian Martin

    Hi
    not sure if this is related but LULU seem to have a hang-up with NCX. Despite having my ebook EPUB thoroughly checked they have rejected it twice on the basis of “NCX not accurate”. Lulu is one of only a few non-USA portals to the iBookstore so this sort of blockage has significant consequences. I assume they’re not using EPUB3. Is this a problem anyone has encountered before??
    Regards
    Ian

  9. ecopeter

    I wanted to test the stylesheet, but unfortunately the link for downloading it doesn’t work.
    Thanks for your valuable work at Threepress
    Pietro

  10. Keith Fahlgren

    @ecopeter: Try telling your browser to Save As… It’s probably trying to interpret the XSLT file rather than downloading it.

  11. dgatwood

    I’m of two minds regarding nested structures. On the one hand, I’m a little surprised that the EPUB format didn’t do *more* nesting than it does on the content side. The entire notion of having H1, H2, etc. just doesn’t sit well with me because the presentation breaks as soon as you move content around. Something like DocBook is a lot less fragile because the tag names for nested sections, titles, etc. are not tied to their position in the hierarchy.

    That said, I’m not entirely convinced that this new TOC is a good solution. The redundancy in this format was absurd before, and this just adds to the problem. For most books, a strict subset of the same basic information is effectively repeated five times:

    1. The manifest.
    2. The spine.
    3. The TOC NCX.
    4. The new-style TOC file.
    5. The HTML version of the TOC that’s part of the book’s frontmatter (if applicable).

    Why not do something more sane like

    Where toclevel indicates hierarchical depth, a missing toclevel means “Don’t show in the TOC”, readingorder represents the spine order, and a missing readingorder means “Don’t include in the normal flow”. This gives the exact same information as all of the TOCs, spines, etc. put together (except the actual, styled frontmatter TOC), but does so in a single place, with no redundancy, no cross-referencing between multiple tables, and generally a lot less mess. It’s more human-readable, and can be trivially converted into any of the other forms.

    Just my $0.02.