
The growing excitement about ebooks in the wider world is starting to have a wonderful consequence: we now have the ideas, people, technology, and commercial incentive to finally start solving some of the tricky and tempting issues facing digital reading. But many of those solutions depend on a fundamental piece of the architecture that is still missing. We need to define a way to link to an ebook and to link inside that ebook.

[Update: I now think the initial // is probably a mistake and should be removed.]

[Update2: I now think this entire proposal is flawed, most critically because URI scheme registration is just too high a burden with too many costs. Instead, please see A simpler EPUB Linking proposal and the work on the epub-revision wiki.]


The EPUB3 F2F just concluded yesterday in San Francisco and the Internet Archive’s Books in Browsers 2010 conference will start in a few hours, which brings the entire spectrum of ebook thinkers closer together than they’ve ever been. In some ways, the IDPF and EPUB Working Group hold a more conservative position but benefit from a huge collective history with ebooks. On the other hand, many of those attending Books in Browsers are making such an impression expressly because they’re outsiders joining the party with groundbreaking, disruptive ideas. Both groups are pining for ebook linking.

The EPUB community has long regretted the absence of a linking specification, but this meeting made it clear that the EPUB3 work on metadata, dictionaries, and annotations is completely dependent on a linking spec. Similarly, innovative projects like Open Bookmarks must be built on top of interoperable links.

My own yearning for this work started in hallway discussions with other publishers and reading system implementors at the O’Reilly TOC 2008 conference. When I realized I’d been talking about it for more than 30 months, I decided to stop talking and start writing…

Caveat: All that follows is just one person’s quick attempt at a synthesis of a variety of viewpoints and conversations. It’s deliberately incomplete and probably fundamentally broken, but should at least get the work kickstarted.


The EPUB Linking specification defines a valid Uniform Resource Identifier (URI) syntax for identifying and referencing EPUB documents and, optionally, content inside those documents. This syntax is based on the scheme epub:.

EPUB documents carry different longevity expectations than documents on the web. This requires a set of techniques that extend the capabilities of existing solutions like Uniform Resource Locators (URLs).

This specification includes a set of techniques for expressing and verifying Strong Links as well as a set for Loose Links.

[TODO: List of things NOT in scope, partially to make it clear this is not trying to boil the ocean along with every other grand linking scheme of the last two decades.]


Unlike other linking techniques, the EPUB Linking specification tries to provide both:

  • explicit, fragile, verifiable links to an exact document or content (Strong Links)
  • fuzzy, degradable, and robust links to any instance of a similar and/or updated document or content (Loose Links)

Expressed as a set of things readers, content creators, or reading system creators might want to do:

  • A book review service creates a link from a review to any copy of One Thousand and One Nights that might be available.
  • A reader creates a bookmark on the last page of a book and sends the link to that page to a friend reading the same book.
  • A person on a train opens a link from an email to a specific time in a section of recorded audio from an ebook (or an exact position in an audio overlay of text content).
  • A researcher cites a specific section in an historical version of a document and adds the data to specifically fingerprint that section and version to ensure an exact match in the future.
  • An author or publisher includes a link from one title in a series to a specific section in a previous title to provide context.

EPUB Links MUST be valid URIs.

EPUB Links MUST be able to be created that reference valid EPUB documents and valid EPUB3 documents.

An EPUB Link

An EPUB Link is made up of a scheme (epub:), a Document Identifier (on the left), and an optional Target Identifier (on the right following the first /, the Separator):

[Figure: a diagram of the parts of an EPUB Link]

This syntax roughly follows a common syntax for representing hierarchical relationships defined in RFC2396:

Three valid EPUB Links are:

a deliberately ambiguous link to the top-level of a document that could be created without the linked document in hand
a link to the top level of a specific EPUB
a link to a particular <p> in a specific EPUB

Document Identifiers

Document Identifiers identify an EPUB document. They are created by percent encoding the value of the EPUB’s <dc:identifier> that is the OPF’s unique-identifier. All slash / characters MUST be percent-encoded as %2F.
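The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `document_identifier` and the second sample identifier are assumptions, not part of the proposal. Passing `safe=""` matters because `urllib.parse.quote` leaves `/` unescaped by default.

```python
from urllib.parse import quote


def document_identifier(unique_id: str) -> str:
    """Percent-encode an OPF unique identifier for use in an EPUB Link.

    safe="" overrides quote()'s default of leaving "/" alone, so every
    slash becomes %2F and every colon becomes %3A.
    """
    return quote(unique_id, safe="")


if __name__ == "__main__":
    print(document_identifier("urn:isbn:9780984442515"))
    # Hypothetical identifier containing slashes, for illustration only:
    print(document_identifier("http://example.com/books/1"))
```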

In the absence of [Strong Link stuff TBD], Document Identifiers may use [techniques TBD] to perform fuzzy matching. The simplest technique is to match as much as possible starting from the leftmost character [We’ll probably have to define a reserved character for deliberately degradable Links with a known number of explicit components in the Document Identifier].
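One possible reading of "match as much as possible starting from the leftmost character" is a longest-common-prefix comparison. The Python sketch below works under that assumption. The function names and the candidate identifiers are hypothetical, and a real reading system would probably need a minimum-length threshold that this proposal does not define.

```python
from typing import List, Optional


def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two identifiers share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def best_loose_match(wanted: str, available: List[str]) -> Optional[str]:
    """Return the available identifier sharing the longest prefix with
    `wanted`, or None when no candidate shares even one character."""
    best, best_length = None, 0
    for candidate in available:
        length = common_prefix_length(wanted, candidate)
        if length > best_length:
            best, best_length = candidate, length
    return best


if __name__ == "__main__":
    # Hypothetical identifiers of EPUBs a reading system has on hand.
    library = ["urn%3Aisbn%3A9780984442500", "urn%3Auuid%3A1234"]
    print(best_loose_match("urn%3Aisbn%3A9780984442515", library))
```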

[Aside: This is one site where interesting innovation will happen. There will be some good ideas about how to create Document Identifiers that are straightforward to fuzzily match (perhaps across multiple revisions of a work or different translations of the same work) and also techniques to provide more details for establishing Strong Links in legal, scientific, or other domains.]

Document Identifiers are required for all EPUB Links.


If an EPUB Link includes an optional Target Identifier, the slash / character is included between the Document Identifier and Target Identifier.
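To make the role of the Separator concrete, here is a rough Python sketch that splits a Link at the first `/` after the scheme. The function name `split_epub_link` is an assumption. Accepting Links both with and without the initial `//` is also a choice made only for this illustration, since the Update above suggests the `//` may be dropped.

```python
from typing import Optional, Tuple
from urllib.parse import unquote

SCHEME = "epub:"


def split_epub_link(link: str) -> Tuple[str, Optional[str]]:
    """Split an EPUB Link into (decoded Document Identifier, Target Identifier).

    The Target Identifier is None when the Link has no Separator.
    """
    if not link.startswith(SCHEME):
        raise ValueError("not an EPUB Link: %r" % link)
    rest = link[len(SCHEME):]
    if rest.startswith("//"):
        rest = rest[2:]
    # Slashes inside the Document Identifier are encoded as %2F, so the
    # first literal "/" is always the Separator.
    encoded_doc_id, _, target = rest.partition("/")
    if not encoded_doc_id:
        raise ValueError("a Document Identifier is required")
    return unquote(encoded_doc_id), (target or None)


if __name__ == "__main__":
    print(split_epub_link("epub://urn%3Aisbn%3A9780984442515"))
    print(split_epub_link(
        "epub://urn%3Aisbn%3A9780984442515/OEBPS/chapter6.xhtml#remy79"))
```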

Target Identifiers

Target Identifiers identify a set of content inside an EPUB document. The most straightforward Target Identifiers follow HTML and are based on a path, filename, and file extension component and a fragment identifier. All Target Identifiers using paths MUST include the absolute path from the root of the EPUB ZIP container with the initial slash omitted.

Target Identifiers are optional.

Filenames MAY reference XML or non-XML (potentially continuous media or other file types) files.

[Aside: This is the other site where I know interesting things will happen. There will be techniques for doing text-based matches, XPath-based matches, etc, etc…]

Strong Links

Strong Links are used when the link author wants to assert that only an exact match to the provided link is acceptable. They include additional information to help consuming systems verify the quality of the link with more precision.

Strong Links include [syntax technique TBD], which asserts that they MUST exactly match the Document and/or Target Identifiers before being used.

Loose Links

Loose Links are the default when [syntax technique TBD for Strong Links] is not present. Consuming systems MAY choose a variety of techniques to inform the user that an exact match was not found but lower quality matches were found.


We want to create a Link to HTML5 For Web Designers. We unzip the EPUB and open the OEBPS/content.opf file and look for the value of the unique-identifier attribute on the root <package> element. It is bookid. We find the corresponding <dc:identifier>:

We take the value, urn:isbn:9780984442515, and percent encode (or “URL encode”) it to create our Document Identifier:

Our Link is finished after we add epub:// to the front:
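The manual steps so far (read `unique-identifier`, find the matching `<dc:identifier>`, percent encode, prepend `epub://`) could be automated along these lines. This is a sketch, not a reference implementation: `SAMPLE_OPF` is a minimal stand-in written for this example rather than the book's real `content.opf`, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

DC_NS = "http://purl.org/dc/elements/1.1/"

# Minimal stand-in for OEBPS/content.opf (not the book's actual file).
SAMPLE_OPF = """\
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:isbn:9780984442515</dc:identifier>
  </metadata>
</package>
"""


def unique_identifier(opf_xml: str) -> str:
    """Return the text of the <dc:identifier> whose id matches the
    unique-identifier attribute on the root <package> element."""
    root = ET.fromstring(opf_xml)
    wanted_id = root.get("unique-identifier")
    for element in root.iter("{%s}identifier" % DC_NS):
        if element.get("id") == wanted_id:
            return (element.text or "").strip()
    raise ValueError("no <dc:identifier> with id=%r" % wanted_id)


def top_level_link(opf_xml: str) -> str:
    """Build a Link to the top level of the EPUB described by the OPF."""
    return "epub://" + quote(unique_identifier(opf_xml), safe="")


if __name__ == "__main__":
    print(top_level_link(SAMPLE_OPF))
```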

That was so wonderful that we decide to blog about the nice things said about Remy Sharp in the 6th chapter. We can use the same Document Identifier, but we want to add a Target Identifier pointing to the specific paragraph. We open up the XHTML for chapter6.xhtml and find the part we were looking for:

We’re lucky! The publisher has included some extra anchors with id attributes for their index, so we can get quite close with our reference. To create our Target Identifier, we need to get both the complete path and filename for the 6th chapter’s XHTML file inside the ZIP itself (OEBPS/chapter6.xhtml in this case) and the value of whatever id attribute we wanted to specifically cite (remy79). We can now tack these on to our original Link after adding the / that separates the Document Identifier and the Target Identifier and a # between the filename and the id (fragment identifier):
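Assembling the pieces named above (the identifier, the path inside the ZIP, and the id) can be sketched as follows. The helper name `epub_link` is hypothetical. The sketch also assumes the path needs no further escaping, which the proposal does not address.

```python
from typing import Optional
from urllib.parse import quote


def epub_link(unique_id: str,
              path: Optional[str] = None,
              fragment: Optional[str] = None) -> str:
    """Join a Document Identifier and an optional Target Identifier."""
    link = "epub://" + quote(unique_id, safe="")
    if path is not None:
        # The path is absolute from the ZIP root, with the initial
        # slash omitted; the "/" added here is the Separator.
        link += "/" + path.lstrip("/")
        if fragment:
            link += "#" + fragment
    return link


if __name__ == "__main__":
    print(epub_link("urn:isbn:9780984442515",
                    "OEBPS/chapter6.xhtml", "remy79"))
```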

If we wanted a Strong Link to that exact bit of text, we might add an md5sum:
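How such a checksum would be written into the Link itself is still TBD in this proposal, so the Python sketch below only covers the fingerprinting half: computing an MD5 digest of the cited text and checking it later. The function names and the UTF-8 encoding choice are assumptions, and the sample sentence is a placeholder rather than the actual passage from the book.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the MD5 hex digest of the cited text, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_exact_match(found_text: str, expected_digest: str) -> bool:
    """Check text found at the link target against the stored digest."""
    return fingerprint(found_text) == expected_digest.lower()


if __name__ == "__main__":
    # Placeholder text standing in for the paragraph being cited.
    cited = "Placeholder for the cited paragraph."
    digest = fingerprint(cited)
    print(digest)
    print(is_exact_match(cited, digest))             # unchanged text
    print(is_exact_match(cited + " (rev.)", digest))  # edited text
```

Any change to the cited text, even to whitespace, changes the digest, which is exactly the fragility a Strong Link is meant to have.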

Do remember that if you’re creating a Document Identifier from a unique-identifier that has slashes (/), they MUST be escaped:

For a cooler example, I want to cite a bit of a poem without actually having any EPUB version of it. I know the ISTC and the text I like (“The vorpal blade went snicker-snack!”), so I note some text before and after and make a very Loose Link, where neither half is even remotely precise:

Which gives us this Link (fake linebreaks added for readability)


Thanks to Bill Kasdorf, Blaine Cook, Daniel Weck, John Rivlin, Karen Broome, Marc Prud’hommeaux, Marisa DeMeglio, Markus Gylling, Will Manis, and others for sharing their ideas with me on this topic over the recent days and years.


10 Responses to “Developing an EPUB Linking specification”

  1. Daniel Weck

    Keith, this is a great summary!

    As you rightly mention, this linking mechanism would offer scope for innovation in terms of referencing “points” or “ranges” of XHTML documents within EPUB publications. With the introduction of native support for embedded media objects in HTML5 (i.e. video and audio), the linking mechanism described by the W3C MediaFragments specification [1] would come in handy.

    Let’s take the following example:


    In this case a Reading System would open the book at the page where the video element resides, it would “scroll” the video into view, and it would play the specified fragment of the video clip.

    This type of deep linking mechanism is obviously very useful for marking annotations. Let’s extend the above example by adding a capture of a specific graphical region within the video frame:


    Now, for the sake of technical accuracy, it is worth mentioning that these examples are using the MediaFragments URI *syntax*, but are actually based on URI *query* (passing of “GET” parameters, with analogy to the HTTP protocol). This is an important distinction as the “fragment” part of a URI must return the same mime-type as the addressed resource.

    Cheers, Dan


  2. Kevin Hawkins

    I don’t see how Target Identifiers allow for Loose Links. How would a system know to degrade a relative URI into something that would work across any copy of One Thousand and One Nights? That is, how would it be able to find chapter six across editions?

  3. Keith Fahlgren

    @Kevin: My last example with the text quote is the way I would solve that sort of very loose link, but you’re right that we need to explore that more. There are others that have thought about this more than I have, so I’ll let them add their solutions in their own words.

  4. Frank Lowney

    I think that some variation on search is the key to loose links. A cascading search would first seek the title locally and remotely and then drill down from there to numerical chapter, text string and other search terms. This could be expressed in the same way that search args are used on the web. There might even be some user configuration via a preferences file. Goodness of fit might be expressed in an ordered list with samples of nearby text, images, etc.

  5. Thomas Rasche

    For index, dictionary and thesaurus linking, might I suggest a simple solution:
    The idea is this, that similar to mailto:… this will activate the ereading software’s search function (to look for the word ‘WORD’). This means there is no further naming, unique identifying required within the document. All the work is in the ereader’s processing power.
    Indexes become a list of suggested search terms, and are easy to create and programme into the xhtml’s.
    Dictionaries and thesauruses can easily cross-reference other words within the ebook.

  6. Greg Schofield

    I have been working on this for a little while.

    One thing we have to do is make a reference system that works electronically but also in bibliographies. Moreover, it should be compatible with traditional references such as those used for Shakespeare or the Bible.

    I have a system that allows the generation of world-wide unique ids based on a time stamp and the static server IP address.

    This is what I would suggest using non-breaking spaces in the HTML5 id attribute:

    This is a …

    j6qencdqvwfcd ⅰ.2 §3 ❡345

    edition id: j6qencdqvwfcd
    Part: ⅰ
    Chapter: 2
    Section: 3
    Paragraph: 345

    ie in Shakespeare’s Othello: Act Ⅳ Scene 3.23
    id= “epub:a6qencdqvwfcd Act Ⅳ, Scene 3.23”
    title= “23”

    The idea being to use natural language as unique and readable ids to mark up natural reference points, that could be used electronically or in print. The non-breaking space acting as a word separator, brackets, dashes, semi-colons, commas and periods acting as logical combiners.

    Hence an introduction to Othello could be id= “epub:a6qencdqvwfcd (Introduction) ❡14” and a quote section as:
    epub:a6qencdqvwfcd (Introduction) ❡14—45

    I wish you all the luck in getting a sensible system up and going. The reason I like the id approach is that a fragment says it all, and it can be equally applied to emails or anything for which the original source is useful to find.

  7. Ben Trafford

    Out of curiosity, why not allow a link identifier that is an XPath, XPointer, or an XQuery? There’s a whole host of XML-based document querying stuff that should inform this work.

  8. Keith Fahlgren

    @Rado: You’re absolutely right about the similarity to DOIs. There are a number of folks inside the EPUB3 Working Group trying to understand how EPUB3 identifiers can be better harmonized with DOIs (and perhaps the IDPF with DOI registration agencies). That said, I’m not convinced that DOIs are a good match for every EPUB3 creator.