Filed under ebooks, epub, idpf, linking.

Following my post Developing an EPUB Linking specification a couple of months ago, a subset of the EPUB3 Working Group formed and has been actively researching and discussing the problem of EPUB to EPUB linking and EPUB identifiers in general. Based on feedback from others inside and outside that group, I’ve decided that many aspects of my initial proposal are flawed, but we’re now much closer to consensus on a proposal (or two) for submission to the larger EPUB3 Working Group.

You can see the specifics of the current proposals from the subgroup as they evolve on the EPUB3 wiki at PURL-based EPUB Identifiers and URI Links Proposal. Please comment there or on the epub-revision mailing list rather than here.

The aggressive publication deadlines for EPUB3 (fast approaching in Q1 2011) mean that this work cannot expand too much in scope, but I suspect that’s good motivation: we should try to understand and solve the most pressing problem(s) for EPUB to EPUB links and then get out of the way and watch the implementors & content creators.

Note: These early drafts on the wiki don’t represent the consensus of the EPUB3 Working Group.


4 Responses to “A simpler EPUB Linking proposal”

  1. Jonathan Rochkind

    I am confused about the URI Links Proposal.

    Does it intend to allow any legal URI scheme as an identifier, or only a certain subset? Or even to allow schemes that aren’t registered?

    It starts out reading like it intends to allow any scheme, explicitly saying it doesn’t need to be a resolvable URL scheme.

    But then it says: “Authority: The authority under which the URI has been created. The authority must be DNS registered to insure that URIs are globally unique.” But not all URI schemes have such a component. For instance, neither “urn:” nor “info:” necessarily has any “authority” component that is a DNS registered name.

    Then it goes on to use an example of an “epub:” scheme, which I don’t think is a registered URI scheme at all. I think it’s a mistake to encourage the use of an unregistered scheme (ie, not actually a legal URI at all), as something called a “URI”.

    If you wish to allow only certain URI schemes — and further, it looks like, to add additional constraints to those schemes (not ANY http URI is allowed, but only http URIs that follow a certain pattern)… then I suggest that you be more explicit about this.

    List exactly which URI schemes are allowable: http, ftp... anything else? Not info: or urn:, apparently.

    If you wish to then further constrain the forms of these URIs, it would be great to supply an actual pseudo-BNF, although I guess you’ve come close to doing that.

    It’s not clear to me what the benefit of constraining the URI patterns is. Why not allow URIs of any form? Why require them to be of the form scheme://authority/path/unique-identifier{/version}{{/file}#id}?

    There is probably a reason known to you guys but not to me. A good spec explains the justification for its decisions, and it would be great if this spec did so.

    If the point is the ability to pull out the /version and /file from the URL… the spec accomplishes that kind of kludgily from my point of view. You have to parse the entire apparent path to see if either of the last two components have the form of a ‘version’ or ‘file’ — if they do not, then the last component is the ‘unique-identifier’ and no optional version or file is present, but if they have the form of a version or file, I guess they should be treated as such? Even without the “file” portion, you still have that problem just with “version” — how do you know for sure that “1.2” isn’t a “unique-identifier”, nothing in the spec seems to say it would be illegal as a unique-identifier, right?

    Can you not tell for sure until you have possession of the epub itself and can look at its Manifest to see if either of the last two components matches a version or file from the Manifest? This seems fragile.

    I wonder how much you really gain from the version/file thing. In general, that kind of introspection into a URI to pull out particular components seems to be discouraged with URIs. What if you instead just used any old URI for the identifier, but provided a method using HTTP header content negotiation of some kind for an HTTP server to provide alternate versions in place of the one requested? But I assume you thought of that, and there are good reasons for how you’ve done it.

    But I think you need to be clear about what URI schemes are allowed and what URI schemes are not allowed, and be more explicit about the fact that only a certain constrained pattern of even allowed schemes are allowed, not any legal URI under that scheme. And I hope there’s a good answer for how one parses such a URI to know for sure if the last couple components in a path are optional file/version components, or simply the unique-id preceded by a longer ‘path’.
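
To make the parsing concern in the comment above concrete, here is a minimal Python sketch that lists every way the tail of a path could be read against the pattern unique-identifier{/version}{{/file}#id}. The regular expressions for what a "version" or a "file" looks like, the function name `candidate_readings`, and the example.com URLs are assumptions made up for illustration; the proposal itself does not define these shapes.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristics, NOT taken from the proposal:
VERSION_RE = re.compile(r"^\d+(\.\d+)*$")       # e.g. "1.2"
FILE_RE = re.compile(r"^[^/]+\.(x?html|xml)$")  # e.g. "chapter1.xhtml"


def candidate_readings(uri):
    """Return every (unique_identifier, version, file) reading of the path."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    readings = []
    if not segments:
        return readings
    last = segments[-1]
    # The last segment is always a possible unique-identifier.
    readings.append((last, None, None))
    # .../unique-identifier/version
    if len(segments) >= 2 and VERSION_RE.match(last):
        readings.append((segments[-2], last, None))
    # .../unique-identifier/file
    if len(segments) >= 2 and FILE_RE.match(last):
        readings.append((segments[-2], None, last))
    # .../unique-identifier/version/file
    if (len(segments) >= 3 and FILE_RE.match(last)
            and VERSION_RE.match(segments[-2])):
        readings.append((segments[-3], segments[-2], last))
    return readings


for reading in candidate_readings("http://example.com/books/1.2"):
    print(reading)
```

Under these assumed heuristics, a path ending in /books/1.2 has two readings: "1.2" as the unique-identifier, or "books" as the unique-identifier with version "1.2". Nothing in the URI alone says which one is meant, which is the fragility the comment describes.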

  2. Jonathan Rochkind

    Here is a short stab at an alternate approach that would be more in keeping with how others use URIs. I put it forward not as a serious proposal, but just to brainstorm alternatives.

    1) The identifier in the EPUB OPF should simply _be_ a URI — resolvable URL or not — rather than a component used to build a URI. All identifiers in EPUB OPF become URIs. Non-resolvable URI schemes like “tag:” are explicitly allowed (note that tag: does not follow the pattern in your original proposal).

    2) If you want to link to a certain part of an EPUB, you must use a URI scheme with a ‘fragment’ component, like HTTP. (Or possibly, you just must use an http: URI, whether it resolves or not). The fragment should be of the format #[file_name]/[id]

    That is, put both the file_name and id in the fragment, because they are both necessary not to uniquely identify the epub, but to find an internal part of the epub once it is identified and accessible. That is the point of a fragment in an http URI.

    Still not sure how to handle versions, but I’m wondering if something in HTTP headers, 303 maybe with an Atom response or something, could handle that rather than trying to require introspection into the URI.
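
A rough Python sketch of how the fragment-based alternative above could be resolved without inspecting the path at all: everything before the first "#" is treated as the opaque identifier, and the fragment is split into file name and id. The function name `split_epub_link`, the choice to split on the last "/" (so that a file name may itself contain slashes), the handling of a fragment with no "/", and the example URIs are all assumptions for illustration, not part of either proposal.

```python
def split_epub_link(uri):
    """Split a link into (identifier_uri, file_name, element_id).

    The identifier is never parsed further; only the fragment is examined.
    """
    # A URI's fragment starts at the first "#".
    identifier, _, fragment = uri.partition("#")
    if not fragment:
        # A bare identifier refers to the whole EPUB.
        return identifier, None, None
    # Assumption: the id is the last "/"-separated piece of the fragment.
    file_name, sep, element_id = fragment.rpartition("/")
    if not sep:
        # Assumption: a fragment without "/" carries only an id.
        return identifier, None, fragment
    return identifier, file_name, element_id


print(split_epub_link("http://example.com/books/moby-dick#chapter1.xhtml/p5"))
print(split_epub_link("tag:example.com,2010:moby-dick"))
```

Because the identifier stays opaque, the same function handles an http: URI and a non-resolvable one such as tag: in the same way, and there is no guessing about which path components are optional.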

  3. Keith Fahlgren

    @Eric: The mailing list is for members of the Working Group. You can find out more about joining the IDPF/Working Group here.