Filed under ebooks, epub.

The vast majority of ebooks today have print cousins, despite some recent digital-only publishing news. As a consequence, many people creating ePubs want to know how to tie references to the printed pages back into the ebook. My personal opinion is that this sort of print-centrism is unnecessary for the vast majority of titles [1], but there are times when mapping the ebook to the printed book’s pages does make a lot of sense. Unfortunately, there’s no perfect solution at this time, but there are two options: Adobe’s page-map and the NCX pageList.

page-map

Adobe was motivated to provide a solution to this problem in Digital Editions before other reading systems, and they came up with a mechanism called page-map. The Adobe EPUB Best Practices Guide describes the issue:

There is no inherent linear navigation indicator which could be used for the same purpose that page number is used in the printed document world.

There is no way for an eBook to incorporate page number information for the printed edition of the same book.

Adobe developed an extension called page-map, documented in the same Best Practices Guide, that provides a solution. To implement page-map, the creator includes a special page-map file in the ePub and references that file from the ePub’s OPF file. Each entry in the page-map file pairs a name (the page number) with an href pointing to a specific location within the content.

Here’s an example (note that many of the content files in the preface don’t span more than one page):

<page-map xmlns="http://www.idpf.org/2007/opf">
  <page name=""  href="strandedwithaspy_cov.html"/>
  <page name="" href="strandedwithaspy_intro.html"/>
  <page name="1" href="strandedwithaspy_fm01.html"/>
  <page name="2" href="strandedwithaspy_fm02.html"/>
  <page name="3" href="strandedwithaspy_tp01.html"/>
  <page name="5" href="strandedwithaspy_adc01.html"/>
  <page name="5" href="strandedwithaspy_ata01.html"/>
  <page name="6" href="strandedwithaspy_ded01.html"/>
  <page name="" href="strandedwithaspy_con01.html"/>
  <page name="7" href="strandedwithaspy_fm03.html"/>
  <page name="8" href="strandedwithaspy_fm03.html#page8"/>
  <page name="9" href="strandedwithaspy_fm03.html#page9"/>
  <page name="10" href="strandedwithaspy_ch01.html"/>
  <page name="11" href="strandedwithaspy_ch01.html#page11"/>
  <page name="12" href="strandedwithaspy_ch01.html#page12"/>
  <!-- remaining pages omitted -->
</page-map>
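
If you already have a list of print page labels and the locations where each page starts, a file like the one above can be generated rather than typed by hand. Here is a minimal Python sketch of that idea, not Adobe's tooling: the function name `build_page_map` and the sample list are my own assumptions, and it reuses the namespace and file names from the example. It also assumes that anchors such as `#page11` already exist as ids in the content files.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"  # namespace used in the example above


def build_page_map(pages):
    """Return page-map XML for a list of (page_name, href) pairs."""
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", OPF_NS)
    root = ET.Element(f"{{{OPF_NS}}}page-map")
    for name, href in pages:
        ET.SubElement(root, f"{{{OPF_NS}}}page", {"name": name, "href": href})
    return ET.tostring(root, encoding="unicode")


# Hypothetical input: print page labels and the anchors marking each page start.
pages = [
    ("10", "strandedwithaspy_ch01.html"),
    ("11", "strandedwithaspy_ch01.html#page11"),
    ("12", "strandedwithaspy_ch01.html#page12"),
]
page_map_xml = build_page_map(pages)
print(page_map_xml)
```

The output is a single unindented line of XML, which is fine for a reading system but worth pretty-printing if people will edit it by hand.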

Pages in Action

How does page-map actually look in Digital Editions?

An annotated screenshot of a real-world use of the Adobe page-map extension


The catch? page-map is (intentionally) an extension to ePub, and adding a page-map file to your ePub will make it invalid. On top of that, unless the reading system is based on Adobe software (like Digital Editions and the Sony Reader), page-map will have no effect.

Pages in Digital Editions (without a page-map)

So, why do you always see these pages in Digital Editions, regardless? An un/fortunate feature of Digital Editions is the addition of the page-map-like display of pages, even if the ePub doesn’t include any page-map file. Here’s Adobe’s Best Practices again, describing how it chunks any content into a regular size, then labels each chunk a page:

When page map is not available in the document, Adobe Digital Editions will synthesize a page-map based on the document content. The approach used is the following:

Determine a compressed byte length of each resource which is referenced in the spine, subtracting any known encryption overhead (IV size)

Assume that there is a page for each 1024 bytes…
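
To make the arithmetic concrete, here is a rough Python sketch of the two steps quoted above. It is only an illustration under stated assumptions: the quoted passage is cut off, so rounding up to a whole page is my guess, and the function name, the 16-byte IV, and the sample sizes are all hypothetical rather than taken from Digital Editions.

```python
import math


def synthesized_page_count(compressed_size, iv_size=0, bytes_per_page=1024):
    """Estimate the page count for one spine resource from its compressed size."""
    # Subtract known encryption overhead, never going below zero.
    payload = max(compressed_size - iv_size, 0)
    # Rounding up is an assumption; the quoted passage is truncated before it says.
    return math.ceil(payload / bytes_per_page)


# Hypothetical compressed sizes (in bytes) for three spine resources,
# each carrying a 16-byte IV as encryption overhead.
sizes = [2048 + 16, 5000 + 16, 300 + 16]
counts = [synthesized_page_count(size, iv_size=16) for size in sizes]
print(counts, sum(counts))
```

Because the count depends on compressed size rather than on words or characters, two chapters of similar length can end up with different numbers of synthesized pages.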

Some people want to see more of this, while others wish it could be turned off.

pageList

The NCX spec provides an alternative to the page-map extension: the pageList. This aptly named container for pagination information provides a mechanism for giving a label (the navLabel) to a point in the ePub (the pageTarget).

Here’s an example from an Internet Archive ePub:

  <pageList>
    <navLabel>
      <text>Pages</text>
    </navLabel>
    <pageTarget type="normal" id="pagetarget000006" value="6" playOrder="6">
      <navLabel>
        <text>6</text>
      </navLabel>
      <content src="part0000.html#page-6"/>
    </pageTarget>
    <pageTarget type="normal" id="pagetarget000007" value="7" playOrder="7">
      <navLabel>
        <text>7</text>
      </navLabel>
      <content src="part0000.html#page-7"/>
    </pageTarget>
    <!-- remaining pageTargets omitted -->
  </pageList>
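
Since reading systems may not do anything with a pageList, it can be handy to check what yours actually contains. The following Python sketch pulls the label and target out of each pageTarget. The NCX string is a trimmed, hypothetical wrapper around the excerpt above (a real NCX file also carries a head, docTitle, and navMap), and `read_page_list` is a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# A trimmed, hypothetical NCX holding only the pageList shown above.
NCX_SAMPLE = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <pageList>
    <navLabel><text>Pages</text></navLabel>
    <pageTarget type="normal" id="pagetarget000006" value="6" playOrder="6">
      <navLabel><text>6</text></navLabel>
      <content src="part0000.html#page-6"/>
    </pageTarget>
    <pageTarget type="normal" id="pagetarget000007" value="7" playOrder="7">
      <navLabel><text>7</text></navLabel>
      <content src="part0000.html#page-7"/>
    </pageTarget>
  </pageList>
</ncx>"""


def read_page_list(ncx_xml):
    """Return (label, src) pairs for every pageTarget in an NCX document."""
    root = ET.fromstring(ncx_xml)
    result = []
    for target in root.findall("ncx:pageList/ncx:pageTarget", NCX_NS):
        label = target.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        src = target.find("ncx:content", NCX_NS).get("src")
        result.append((label, src))
    return result


page_targets = read_page_list(NCX_SAMPLE)
print(page_targets)
```

The same pairs could be fed into a page-map generator if you decide to ship both mechanisms from a single source of page data.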

The catch? While the OPF spec (part of ePub) says reading systems must support NCX (and mentions pageList offhandedly), it’s not at all clear how much of NCX is supposed to be supported. This ambiguity has meant that no reading systems (to my knowledge) have implemented support for pageList, so its appeal is primarily aesthetic. Unlike page-map, adding a pageList won’t make your ePub invalid. The EPUB Standards Maintenance Working Group is trying to clarify the NCX issue.


[1] What percentage of people actually have both media in front of them at the same time? Page numbers are unquestionably better when trying to tell someone on the phone about the hilarious double-entendre on page 294, but I wonder how often this happens as well. Why not tell them to search for “would you like to check my figures?” instead? Finally, in STM and educational content, where this sort of thing might come up often, the headings are often numbered and serve as a better guidepost (because they don’t rely on the same trim/pagination across various international editions, etc.).


5 Responses to ““Pages” in ePub: Adobe’s page-map versus NCX pageList”

  1. Lindsey Thomas Martin

    You wrote:
    ‘My personal opinion is that this sort of print-centrism is unnecessary for the vast majority of titles’.

    For the most part, I agree with you on this. There is, however, a precedent in scholarly publishing for this sort of ‘page mapping’, when it was useful to link editions of classical authors to a standard edition so passages could be compared easily. The Stephanus pagination used in editions of Plato is an example. Had the editions been digital and thus easily searched, such an expedient would have been less necessary, though it is still convenient in discussion to have a short-hand way to refer to passages from a corpus.

  2. bowerbird

    it amazes me how far down this particular road
    that you .epub people still need to travel…

    i strongly encourage you to spend some more time
    engaging in thought and discussion on the topic…

    -bowerbird

  3. Micah Bowers

    Yes, in DE the way the page number display works + the fact that you can’t turn it off is very, very unfortunate (bugs). I’m looking forward to seeing that fixed.

    BTW, I respectfully disagree about whether most books need a page map. How else could humans communicate about content locations in a reflowable document? And if publishers don’t include it, the reading system has to generate it programmatically, which is far less optimal. The caveat is the lack of consistent implementation to a standard. So it is a fair question whether to go with the extension that at least works on Adobe SDK devices or with pageList, which may be supported in the future by some reading systems.

  4. Irwin

    The idea of a page is necessarily rooted in reference to the book as physical object. The same way the icon for email is still a paper envelope. Because we’re at a transitional period, most people need the comfort of referring to a “page” rather than a “location” as Kindle puts it. But I would like to point out that Bible readers and scholars have been citing not pages but chapter:verse for a very long time. We just need to get used to applying the same system to all electronic reading material.

    I wish browsers (really the HTML spec) would give you granularity down to the character. I smell a jQuery plugin that needs to be written…

    (BTW, I love your work, Liza. Keep it going!)