Filed under epub, epubcheck, ibooks, tools.

Apple’s iBookstore requirement that all files in the .epub must be listed in the OPF manifest has caused a lot of headaches for content producers. There’s no such requirement in the actual EPUB 2.0 specification, and therefore no test for this condition has been in epubcheck.

However, the vast majority of cases where this occurs are unwanted accidents: publishers including backup files, unused images, or operating system cruft like “.DS_Store”. It’s surprisingly easy to generate very large .epub files due to multiple backups, so while listing every file in the manifest is not a formal requirement of the spec, it’s a reasonable best practice.

We consulted with other EPUB developers and have contributed a patch to epubcheck to flag files that are missing from the manifest as a warning. There are three exceptions to this rule:

  • Required files that aren’t meant to be in the manifest, like mimetype.
  • Any files listed in META-INF/container.xml, including the OPF file itself.
  • Any files found in the META-INF directory.

All other unmanifested files will generate a warning:

WARNING: ./general/Unmanifested.epub: item (Unmanifested.txt) exists in the zip file, but is not declared in the OPF file
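
For readers who want to see the rule spelled out, here is a minimal Python sketch of the same idea. It is not the epubcheck patch itself (epubcheck is written in Java); the function name `find_unmanifested` is hypothetical, and the sketch assumes manifest hrefs are relative to the OPF file's directory and that the OPF uses the standard `http://www.idpf.org/2007/opf` namespace.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def find_unmanifested(epub_path):
    """Return the sorted zip entries that no OPF manifest declares.

    Hypothetical helper: it mirrors the three exceptions described in the
    post, but it is an illustration, not epubcheck's implementation.
    """
    with zipfile.ZipFile(epub_path) as zf:
        # Ignore directory entries; only real files are checked.
        names = {n for n in zf.namelist() if not n.endswith("/")}

        # Exception 2: files listed in META-INF/container.xml (the OPF itself).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfiles = {}
        for rf in container.iter("{%s}rootfile" % CONTAINER_NS):
            rootfiles[rf.get("full-path")] = rf.get("media-type")

        # Collect every manifest href, resolved against the OPF's directory.
        declared = set()
        for opf_path, media_type in rootfiles.items():
            if media_type != OPF_MEDIA_TYPE or opf_path not in names:
                continue
            opf_dir = posixpath.dirname(opf_path)
            opf = ET.fromstring(zf.read(opf_path))
            for item in opf.iter("{%s}item" % OPF_NS):
                href = unquote(item.get("href", "").split("#", 1)[0])
                declared.add(posixpath.normpath(posixpath.join(opf_dir, href)))

        unmanifested = []
        for name in names:
            if name == "mimetype":            # Exception 1: required file
                continue
            if name in rootfiles:             # Exception 2: named in container.xml
                continue
            if name.startswith("META-INF/"):  # Exception 3: anything in META-INF
                continue
            if name not in declared:
                unmanifested.append(name)
        return sorted(unmanifested)
```

Calling `find_unmanifested("book.epub")` on a file of your own returns a list of paths inside the zip; an empty list means every file outside the three exceptions is declared in the manifest.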

More details are available in the original issue.

We’d like the community to review the patch to ensure that we’ve caught all the necessary cases. If you find a problem, please report it as a formal bug rather than only as a comment here (doing both is fine). Include a sample EPUB file as an attachment in the issue so we can generate a test case to verify the fix.

You can get the latest code (revision 135 or later) from Google Code as source, or simply download epubcheck-1.0.6-dev.jar.

Because this is a developer preview, our online EPUB validator does not use this version. If you are not familiar with how to run epubcheck from the command-line, please continue to use the online version.

Do not use this build for production systems.

If no problems are found, we propose that the next formal release be numbered epubcheck 1.1 to reflect the added functionality. We also plan to continue work on other outstanding epubcheck issues.

4 Responses to “Epubcheck developer build with check for unmanifested files: available for testing”

  1. elmimmo

    But what would you do, then, about the fallback that the spec requires the manifest to specify for files that are not OPS Core Media Types?

    Some files that we were not including in the manifest are files loaded by an embedded flash movie (i.e. not the SWF itself, which we did include in the manifest, but resource files that the SWF file loads when started). It is pointless IMHO to specify a fallback for those files, since they are not used by anything but the SWF (which does have a fallback specified both in the manifest and inside the object element).

  2. Liza Daly

    That’s a good point, but the OPF specification does say:

    The required manifest must provide a list of all the files that are part of the publication (e.g. Content Documents, style sheets, image files, any embedded font files, any included schemas).

    So while I don’t think the spirit of your approach is wrong, it’s still true that if those files are considered part of the publication, they are meant to be listed in the manifest and thus subject to the fallback requirements.

    However, we’ve implemented this check as a warning, not an error.

  3. elmimmo

    My issue is not so much with including them in the manifest, but if specifying a fallback is mandatory for every file that is not an OPS Core Media Type (or did I get that wrong?), what fallback should I specify when there can be none?

    For example, say a SWF loads an XML file that contains a list of additional files to be loaded. The ePub must have a way to show alternative content to the SWF, that I do understand. But what fallback can be specified for that XML file? Or should we just go and add humbug for the sake of validating?

  4. Liza Daly

    Your understanding is correct; it’s really just a flaw in the specification. If you’re trying to stay valid, you’re better off with your current approach of not listing those files in the manifest, because if you do list them you’ll be flagged as invalid for not having fallbacks.

    Alternatively, you can put those ancillary files in META-INF and our validation routine will ignore them, but as they’re part of the publication that’s also not ideal.