
Yes, but it may require a Mac.

The IDPF board met on the last day of the Digital Book 2011 conference at Book Expo America. One of our topics for discussion was what the IDPF as an organization should do to further the adoption of EPUB. I brought up an issue that’s been concerning me for some time: the lack of digital-native authoring tools aimed at authors, not publishers.

If publishers are struggling to produce high-quality EPUB files, whether via InDesign, XML workflows, or strategic outsourcing, authors are in an even worse place. This is especially true for authors with an ambition to self-publish, or to start a micro-publishing outfit, and yet still retain some creative control over the look of their digital product. InDesign (especially CS5.5) is a great solution for small- to medium-sized publishers who produce both print and digital books, but its feature set is inappropriate for digital-native publishing, and its price and complexity make it unsuitable for self-publishers.

I’m aware of two document creation tools right now that have native EPUB support and are available for my platform, Mac OS X: Pages and Scrivener. (The only product I know of on the Windows side is Atlantis. Linux users have to make do with plugins for OpenOffice; judging from the comments in the issue tracker, EPUB export is not a priority, to say the least.)

This post will cover Apple Pages. A subsequent post covers outputting EPUB with Scrivener.

Pages

Apple’s Pages was the first major commercial word processor to include EPUB export. I reviewed the initial EPUB support in August 2010, but it’s been through some updates since then, and I wanted to dive into the semantics of the output code more closely.

Screenshot of a sample EPUB file in Apple Pages

The sample document

I started with the Apple-provided EPUB template (more on that later) and added a number of new elements and semantic tests. In particular, I added:

  • Chapters and headings
  • Emphasis and strong text (rendered in Pages as italics and bold)
  • Numbered and unnumbered lists
  • Hyperlinks both internal to the document and external to the web
  • Inline images (by dragging and dropping)
  • A cover page with an image
  • All available metadata in the export pane
  • Tables

In all cases, I used only styles available in the style drawer; I did not change any font sizes or font weight via the toolbar buttons.

The EPUB output

As in my previous test, this produced a valid EPUB 2.0.1 document according to EPUBCheck 1.1. Hooray!

Headers and subheaders

The semantics are much improved from my first test. Paragraphs are now wrapped in <p> elements, for example, and headers are headers:

  <body>
    <div class="body" style="white-space:pre-wrap">
      <h3>Chapter Two: The Chaptering</h3>
      <p class="s2">This chapter has an introduction. Hello!</p>
      <h4>I’m a subchapter or section under that. </h4>
      <p class="s2">Don’t hold it against me. I just have a lot of things to say.</p>
    </div>
  </body>

However, the white-space:pre-wrap style is curious: the property is meant to specify that whitespace inside the XHTML is significant, meaning that the ereader/web browser should retain it. That is emphatically not a best practice in general text; on the other hand, there was no whitespace in the output at all, so I’m unsure of its purpose. If I were post-processing this EPUB file, I would remove that style.
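As a rough illustration of that post-processing step, here is a minimal Python sketch that strips the attribute from XHTML source text. It assumes the attribute appears exactly as in the sample above, with white-space:pre-wrap as its only declaration; the function name strip_pre_wrap is my own, and a style attribute carrying other declarations is deliberately left untouched.

```python
import re

# Matches a style attribute whose only declaration is white-space:pre-wrap,
# including the whitespace that precedes the attribute.
PRE_WRAP_STYLE = re.compile(r'\s+style="\s*white-space\s*:\s*pre-wrap\s*;?\s*"')


def strip_pre_wrap(xhtml: str) -> str:
    """Remove style="white-space:pre-wrap" attributes from XHTML source text."""
    return PRE_WRAP_STYLE.sub("", xhtml)


sample = '<div class="body" style="white-space:pre-wrap"><p class="s2">Hello!</p></div>'
cleaned = strip_pre_wrap(sample)
print(cleaned)  # <div class="body"><p class="s2">Hello!</p></div>
```

A regular expression is enough here because the attribute is machine-generated and uniform; for hand-edited markup, an XML parser would be the safer choice.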

I used the “Chapter” style to generate the chapter heading. This header should be an h1 rather than an h3, but at least the subheading is also a header and one step down.

Go boldly

I completely failed to find a way to output strong and em rather than b or i.
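Until the exporter does this itself, the swap can be made after export. The following Python sketch rewrites b and i tags as strong and em in the XHTML source. It is an assumption-laden illustration rather than a description of what Pages emits: I have not reproduced the exact markup here, and the helper name to_semantic is hypothetical.

```python
import re

TAG_MAP = {"b": "strong", "i": "em"}

# Matches opening or closing <b>/<i> tags, with optional attributes on the
# opening tag. Tags such as <body>, <br/> and <img> do not match.
PRESENTATIONAL_TAG = re.compile(r"<(/?)(b|i)(\s[^<>]*)?>")


def to_semantic(xhtml: str) -> str:
    """Rewrite <b> and <i> tags as <strong> and <em>, keeping any attributes."""
    def swap(match):
        slash, name, attrs = match.group(1), match.group(2), match.group(3) or ""
        return f"<{slash}{TAG_MAP[name]}{attrs}>"

    return PRESENTATIONAL_TAG.sub(swap, xhtml)


converted = to_semantic("<p>Go <b>boldly</b> and <i>quietly</i>.</p>")
print(converted)  # <p>Go <strong>boldly</strong> and <em>quietly</em>.</p>
```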

Listless

I used the list styles provided in the template, but these are not the lists you’re looking for:

      <h3>Chapter Three: Lists</h3>
      <p class="s2">Reasons why people love lists, in order.</p>
      <p class="s2 s3"><span class="c2">1.</span>Lists are neat.</p>
      <p class="s2 s3"><span class="c2">2.</span>It’s cool to let the computer fill in numbering.</p>
      <p class="s2 s3"><span class="c2">3.</span>Yessir.</p>
      <p class="s2">Other reasons that people like lists, in no particular order:</p>
      <p class="s2 s4"><span class="c3">•</span>Sometimes they have bullets</p>
      <p class="s2 s4"><span class="c3">•</span>Not real bullets.</p>
      <p class="s2 s4"><span class="c3">•</span>Those are scary.</p>

This must be fixed.
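For anyone post-processing the export in the meantime, the pseudo-lists above can be rebuilt into real ones. This Python sketch groups consecutive paragraphs that begin with a marker span into ol or ul elements. It makes several assumptions: the input is a namespace-free fragment (a real XHTML file would need namespace-qualified tag names), the marker is always the first child of the paragraph, and the function names are my own. It detects lists by the marker text rather than by the generated class names, since those may differ between exports.

```python
import re
import xml.etree.ElementTree as ET

NUMBER_MARKER = re.compile(r"^\d+\.$")
BULLET_MARKER = "•"


def list_kind(elem):
    """Return 'ol', 'ul', or None based on the paragraph's leading marker span."""
    if elem.tag != "p" or len(elem) == 0 or (elem.text or "").strip():
        return None
    first = elem[0]
    if first.tag != "span":
        return None
    marker = (first.text or "").strip()
    if NUMBER_MARKER.match(marker):
        return "ol"
    if marker == BULLET_MARKER:
        return "ul"
    return None


def rebuild_lists(parent):
    """Replace runs of marker paragraphs under `parent` with <ol>/<ul> lists."""
    new_children = []
    current, current_kind = None, None
    for child in list(parent):
        kind = list_kind(child)
        if kind is None:
            current, current_kind = None, None
            new_children.append(child)
            continue
        if kind != current_kind:
            current, current_kind = ET.Element(kind), kind
            new_children.append(current)
        item = ET.SubElement(current, "li")
        item.text = child[0].tail or ""  # the text that followed the marker
        for inline in list(child)[1:]:   # keep inline markup after the marker
            item.append(inline)
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)


fragment = (
    "<div>"
    '<p class="s2">Reasons why people love lists, in order.</p>'
    '<p class="s2 s3"><span class="c2">1.</span>Lists are neat.</p>'
    '<p class="s2 s3"><span class="c2">2.</span>Yessir.</p>'
    '<p class="s2 s4"><span class="c3">•</span>Sometimes they have bullets</p>'
    "</div>"
)
root = ET.fromstring(fragment)
rebuild_lists(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The numbering is dropped on purpose: once the items live in an ol, the reading system supplies the numbers.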

Tables

A little verbose markup-wise, but basically fine:

      <table class="s5" style="margin-left:0.0px;width:99.8%;border-collapse:collapse">
        <col style="width:33.3%"/>
        <col style="width:33.3%"/>
        <col style="width:33.3%"/>
        <tr style="height:25.0%">
          <td class="s8 s6 s7">
            <h2 class="s9">Reasons why tables are nice</h2>
          </td>
          <td class="s8 s6 s7">
            <h2 class="s9">Who feels this way</h2>
          </td>
          <td class="s8 s6 s7">
            <h2 class="s9">I can’t think of a third thing.</h2>
          </td>
        </tr>
        ....

Images, covers, and links

Creating an image is as easy as dragging it in. I’m not sure if it’s possible to add alt text to the image — I believe document creation tools should prompt users to add descriptive text by default.

Page of sample ebook in iBooks showing an image of a dog

      <p class="s2">
        <img src="images/droppedImage.png" alt="droppedImage.png" style=""/>
      </p>

Only images styled as “inline” will be exported; Pages will warn you that an image was discarded if it had a floating or fixed style. I tried to select a page with an inline image as the cover page, and Pages warned me that the image was being discarded, yet it showed up in iBooks anyway.

It would be nice if the original filename were preserved (it was not “droppedImage.png”), and the empty style attribute should be discarded on output.
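Both image problems are easy to catch after export. The Python sketch below drops empty style attributes and reports images whose alt text is missing or merely repeats the filename. It assumes a namespace-free fragment like the one shown above; tidy_images is a hypothetical helper name, not part of any tool.

```python
import posixpath
import xml.etree.ElementTree as ET


def tidy_images(root):
    """Drop empty style attributes; return src values of images lacking real alt text."""
    needs_alt = []
    for img in root.iter("img"):
        if img.get("style") == "":
            del img.attrib["style"]
        src = img.get("src", "")
        alt = (img.get("alt") or "").strip()
        # An alt value equal to the filename describes nothing to the reader.
        if not alt or alt == posixpath.basename(src):
            needs_alt.append(src)
    return needs_alt


root = ET.fromstring(
    '<p class="s2"><img src="images/droppedImage.png" alt="droppedImage.png" style=""/></p>'
)
missing = tidy_images(root)
print(missing)  # ['images/droppedImage.png']
```

The script can only flag the problem; writing a meaningful description still needs a person.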

      <h1><span id="chapter-5-sh1"/>Chapter Five: Hyperlinks</h1>
      <p class="s2">This is an internal link to <a href="chapter-1.xhtml#b1"><span class="c1">chapter one</span></a>. This is an external link to <a href="http://placekitten.com/"><span class="c1">photos of kittens</span></a>.</p>

The empty span here is for the purpose of creating a back-link. A similar one was auto-added to Chapter One. Adding an internal hyperlink requires an initial step of creating a Pages “bookmark”, and then linking to that bookmark, which was a little confusing; I should be able to target any point in the document using the hyperlink feature.

I didn’t test HTML5 video output, but I’ve been told that video can be successfully embedded and output such that the video will work in iBooks (it will use HTML5 tagging).

Metadata

Both the OPF and the NCX were perfectly well-formed. The EPUB export dialog should optionally request richer metadata than the current list of author/title/subject, though.
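To see how thin the exported metadata is, you can list the Dublin Core elements in the OPF file. The Python sketch below does that for an OPF 2.0 package document. The sample OPF is invented for illustration and is not the actual Pages output, and the list of optional fields to check is just my own pick of commonly wanted ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"


def dublin_core_fields(opf_source: str) -> dict:
    """Map each Dublin Core element name in the OPF metadata to its values."""
    root = ET.fromstring(opf_source)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    fields = {}
    if metadata is None:
        return fields
    for elem in metadata:
        if elem.tag.startswith(f"{{{DC_NS}}}"):
            name = elem.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


# Hypothetical OPF, written by hand for this example.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:creator>A. Author</dc:creator>
    <dc:subject>Testing</dc:subject>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest/>
  <spine/>
</package>"""

fields = dublin_core_fields(SAMPLE_OPF)
absent = [name for name in ("publisher", "date", "rights", "description") if name not in fields]
print(sorted(fields))  # ['creator', 'identifier', 'language', 'subject', 'title']
print(absent)          # ['publisher', 'date', 'rights', 'description']
```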

The dreaded sample document

The EPUB export function is next to useless on large documents unless you start with the sample template, or import its styles later and tediously update yours to match. The EPUB styles are completely opaque: I have no idea why they have magical properties, or what I would do to my own styles to emulate them. I originally assumed the Pages native file format was binary, which would leave nothing to inspect when reverse-engineering the styling. In fact the Pages file format is zipped XML, so it may be possible to inspect it directly. Thanks, Steve!
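If the file really is a zip archive, a first look inside takes only a few lines of Python. This sketch lists the members of an archive using the standard zipfile module. I have not verified the internal layout of a .pages file, so the example builds a small stand-in archive in memory, and the member names in it are placeholders rather than real Pages filenames.

```python
import io
import zipfile


def list_members(archive) -> list:
    """Return the sorted member names of a zip archive (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(info.filename for info in zf.infolist())


# Stand-in archive with placeholder names; a real document would be opened
# with list_members("MyBook.pages") instead (hypothetical filename).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example.xml", "<doc/>")
    zf.writestr("preview.jpg", b"")
buffer.seek(0)

members = list_members(buffer)
print(members)  # ['example.xml', 'preview.jpg']
```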

The native header/paragraph/list styles in the blank template should output useful semantics in the XHTML. It is unacceptable to force users to import an external document to produce a half-decent EPUB file. At the very least, an EPUB-friendly template should be one of the default choices available when creating a new document.

Improvements

  • The list styles should generate lists. They should be ordered or unordered as appropriate to the style.
  • EPUBs should be importable as well as exportable. It’s understandable that they won’t magically re-constitute into the original Pages document, but a conversion pipeline is entirely reasonable.
  • It should be possible to export chunked EPUBs (with multiple XHTML chapters) without having to use the sample template.
  • It should be possible for a power user to understand how to create styles that will have specific behaviors.
  • It should be possible to customize the XHTML serialization (“I want the style named ‘strong’ to output strong elements with the classname ‘foo’”).
  • There should be much more metadata allowable in the OPF file.
  • Images should require or at least prompt for alt attribute values.
  • Bold and italic buttons should output strong and em with the appropriate CSS styling in all cases. I would say this is actually true of any EPUB output tools — it’s unreasonable to ask users to create named styles (as in InDesign) when those tempting bold and italic buttons are available.

I don’t expect Windows/Linux versions of Pages to ever exist, which means that Pages will remain a marginal tool in the publishing ecosystem, but it’s perfectly adequate for an individual Mac-only user.

Download the sample EPUB file and sample Pages document (released under a Creative Commons Attribution license).

26 Responses to “Can an author create an EPUB file using normal tools? Part 1: Pages”

  1. Dave Parker

    A quick google found me Sigil http://code.google.com/p/sigil/

    “Sigil is a multi-platform WYSIWYG ebook editor. It is designed to edit books in ePub format”

    I have no idea as to its quality, I’m afraid I don’t know enough about epub.

  2. Liza Daly

    Sigil can definitely be useful for editing existing EPUB files, but it’s not an authoring tool in the sense of a word processor that a novelist or other author would use.

  3. Rob in Denver

    A Windows version of Scrivener is currently in beta.

  4. JiminyPan

    Sigil is a must if you want to be serious and provide “high-quality” ePubs but can’t afford something expensive.

    Let’s be honest, print and digital “layouts” are completely different and you must (at any cost) avoid word processors since they were not meant to be ePub generators. In other words, your text has to be completely edited with the e-book in mind, or else it will be poorly thought out and won’t offer a satisfactory experience to readers.

    I know that using Sigil means quite a lot of extra work but Pages’ export is… well… mediocre. Besides, it sometimes generates ePubs that other readers like the Sony Pocket PRS or Bookeen can’t read! The reader crashes because the file is corrupted, even though it is ePubCheck-compliant.

    Don’t forget a lot of readers think independents and self-published authors create poor e-books. Authors have to do their utmost to prove them wrong and if it means extra work, then it’s worth it. As far as I’m concerned, Pages’ export is too restrictive to achieve something good. I’ve teamed up with a small independent publisher and we tried a huge number of different applications. The conclusion is that we’re now writing HTML by hand since it’s the most efficient way to produce high-quality e-books. At the end of the day, we actually save a lot of time and we are sure the file is all right by doing so.

    Unfortunately, even Adobe tools can generate utter crap. Take a look at the manifest of your file once exported with InDesign and you’ll see there’s a lot of room for improvement.

    Sigil is not perfect and you’ll have to dodge/avoid minor bugs but once you know them, it’s OK.

    I’m looking forward to your post on Scrivener, this one was pretty interesting ;)

  5. Steve Dunham

    Actually, Pages uses a custom XML file format. In older versions of the software the “.pages” file was actually a directory containing the XML file, attachments, and preview images. In current versions it is a zip file of the same. (If you unzip it into a directory with a “.pages” extension, Pages will still open it.)

    I’ve played around with generating pages files in the past, but didn’t finish the project. (It was easier to target other formats.)

  6. Ben

    I think writers, especially the self published variety, should write their own HTML+CSS if they actually want to create ePubs the right way. It’s really not difficult at all to learn enough basics to make a nice looking ePub that isn’t screwed up on half the readers. The only ‘hard’ part of ePub is making the manifest and other structural files.

  7. Robert Nagle

    A stupid question: why hasn’t Microsoft made a plugin for epub export? Even if the code is messy and loses some formatting, it doesn’t sound like that huge a task to do.

  8. ficklepixel

    Great analysis, Liza! Just a couple of points from my experience:

    EPUB can be exported from any Pages document – not just the EPUB template.

    Also, chapters can be added to an existing Pages document by either:
    1. Use a named style that would appear in the Pages’ TOC, eg. Heading 1. The TOC is located in the Inspector under Documents.
    2. Use section breaks where you want to break the content into chapters.

    Hope that helps!

  9. Rob Oakes

    One potential solution for authors looking to target ePub might be to use LyX. While we don’t have full ePub support yet (it’s currently planned for the 2.1 release), we do have xhtml and CSS support. (As of the LyX 2.0 release about a month ago.)

    As one of the LyX developers, I would be interested in your take as to what features are highest priority for users and the degree of fine control that should be available to authors (as compared to designers).

  10. bowerbird

    liza said:
    > I brought up an issue that’s been concerning me for some time:

    glad to hear it. it’s been an issue concerning me for about 8 years.

    at any rate, it’s ironic that you posted this today, because i will be
    releasing my converter-program later in the day. from a simple
    plain-text file using a form of (very!) light markup, it will create
    e-books output in .pdf, .html, .epub, and .mobi. free. cross-plat.

    i call the program “jaguar”…

    > http://jaguarps.com

    -bowerbird

  11. Liza Daly

    bowerbird: Good to hear it. I use asciidoc for that purpose and find that approach very handy.

    ficklepixel: Thank you! It was not obvious how to make that work from the EPUB documentation provided by Apple. The critical issue is being able to produce multiple XHTML chapters.

  12. William

    Hi Liz,

    Thanks for the update/review on Pages. I am a personal user who would like to format some text into ePub for my own use. I tried doing it on my friend’s MacBook with InDesign (I forget which version) and it drove me nuts after over 20 hours of trying to produce something useful without it dropping images or formatting (not to mention the initial time just to learn how to use InDesign).

    Your concern is very valid and I agree it’s quite a big problem.

    However, I just started to use Pages after noticing it has an improved ePub export, and here are some of the little quirks I found after about 10 hours of trying to do what I want.

    1. Chapter Name and Chapter Name alone is the only thing that will create a break in the xhtml. And you are right, it is magical.
    – I’ve tried renaming the “Chapter Name” style to “Chapter Name1” and it still works
    – I’ve tried duplicating the style by using “create new style from selected”; no matter what I name it, it won’t create a new xhtml. I’ve also tried renaming the original “Chapter Name” to another name and using the old name for my new style. It doesn’t work

    2. Each xhtml created by “Chapter Name” will only hold 7 images. I noticed this after my exported ePub would just drop images. After many hours of testing/exporting, I finally figured out it’s the magical number 7 that causes the images to drop. No matter the resolution, size, or position of the image, each “Chapter” will only hold 7 images. Anything beyond that will be dropped and no warning is given.

    3. When using the exported file in iTunes, the Cover Image/First Page will not show up on the left side of the chapter index. This image appears when you have an ePub open, go to the chapter index and hold the iPad horizontally. I left a note on one of your previous posts last year. It’s basically because iBooks uses a non-standard tag to look for the cover for that specific location. The only way to have this appear after export is to check “Use first page as cover” during export. HOWEVER, if you have this checked, it will DELETE your first page after the export process, leaving your ePub with no cover when you first open iBooks. The only workaround I have found so far is to have 2 cover pages, where the first cover page will be the sacrificial lamb for the export process, leaving the second cover page available in the ePub.

    Hopefully this information will be helpful for other people using Pages to create ePub.

    Although Pages is definitely still buggy for ePub publishing, I still think it’s more suitable for non-professionals creating ePubs than InDesign, which requires way too much investment in time and money to get started.

    P.S. I also use Sigil (which is a bit too simple) and custom edit ePub package (which is tedious).

  13. William

    @ficklepixel

    Questions.

    When you say you can create a chapter, do you mean a chapter in Pages, or a separate xhtml file in the exported epub file?

    I think I’ve tried adding a new style to the TOC and using a section break, and neither of them creates a break/new xhtml file.

    @Apple
    There are many reasons why people want to have control on where new xhtml file appears, not just because it’s the beginning of a chapter… fix it fix it fix it fix it. :)

  14. Liza Daly

    William, thank you for the very useful findings. I’m primarily concerned with creating new XHTML files rather than inserting page breaks. I’ve been told that inserting a “Section Break” will generate a new XHTML file.

  15. William

    Liza,

    thanks for the reply. Since everyone seems to confirm section break creates a new XHTML file, I am going to try it again tonight when I have time. ..

  16. Frank Lowney

    Although you can open the Pages.app and look through the Contents, you won’t find an easy explanation of how the ePub export works because that info is in the Objective C binary called “Pages” in Pages > Contents > MacOS >

    The other observation I’d like to make is that the iWork ’09 suite of which Pages is a part is long overdue for an upgrade. I would not be surprised to see that happen shortly before or after the official release of the EPUB 3 standard.

  17. Ryan Collins

    Has anyone looked at writing in a regular text editor (like Writeroom on the Mac or Darkroom) using Markdown?

    I’ve played around with it, and coupled with Calibre it allows me to automate the entire production of an eBook in any format that Calibre supports. Inserting images is a little tricky, but other than that it works pretty well. It’s simple and very straightforward.

  18. Moriah Jovan

    Well, being a self-pubbed author who also creates good EPUB files, I will say this: There is absolutely no reason an author couldn’t write directly in Sigil. It has a WYSIWYG interface and, since most authors don’t seem to know their word processor is anything more than a glorified typewriter, they wouldn’t be losing any functionality.

    However, EPUB is far from their first concern. Kindle is their first, if not only, concern. Print is their second. Most of them don’t know what EPUB is for or why they should want it.

  19. bowerbird

    ryan said:
    > Has anyone looked at writing in a regular text editor
    > (like Writeroom on the Mac or Darkroom) using Markdown?

    that would work just fine. and light-markup rocks hard.
    plus then “pandoc” would do the heavy-lifting after that.
    (plus liza and keith have reported that they use asciidoc.)

    or…

    i’ve just released the beta of my e-book converter-tool:
    > http://jaguarps.com

    so you might wanna take a look at that, for the future.

    the converter creates .pdf, .html, .epub, and .mobi,
    all from a single plain-text file that’s easy to author.

    it uses a light-markup format i invented myself, which is
    far more “light” than markdown, or asciidoc, or pandoc.
    you might think there’s no markup in it at all. it’s “zen”.

    the converter is cross-platform. and doesn’t cost a cent.

    and yes, one of the next things on my plate is to release
    a barebones text editor that feeds the converter routines.

    -bowerbird

  20. David

    Bowerbird, this is a solid contribution to the digital publishing space. Have you considered releasing the actual source code for the Jaguar app? I’d be particularly interested in seeing the OSX code, and potentially contributing to the development.

  21. bowerbird

    david-

    sorry, i don’t monitor this blog much…

    the code for jaguar is not open-source.
    while i am not saying it never will be,
    for the time being, no, it’s not available.

    but i’m curious as to what you’d contribute.

    i’m in the process of polishing code that
    incorporates a text-editor, to tighten the
    feedback loop between edits and output.

    (all this code, and much more, has been
    proof-of-concept for many years already.)

    more to the point, parsing z.m.l. is easy.
    (it was built with that objective in mind.)

    so if you want to code for z.m.l., just do it.

    -bowerbird

    p.s. you will also find hot recent action in
    the _markdown_ space, with the release
    of a mac program called “markedapp”,
    which does on-the-fly .html conversion
    using the file output by any text-editor.
    from the .html, .epub is just a skip away.

  22. David

    @bowerbird, thanks for getting back to me on this.

    * Re: Jaguar code & contributions: there’s quite a few motives behind asking if it’s open-source, but the short version:

    1. It needs (and deserves) a truly elegant user experience atop what’s obviously a serious engine/codebase doing complex operations behind the scenes. I’d like to be part of that.

    2. All of the conversion/processing/output operations are critically important to creating a true, end-to-end, single-source publishing workflow & tool. I’m a good enough of a developer to know what sort of code you’ve had to write for Jaguar to work as well as it does. I’m also smart enough to realize it’s far faster to leverage and build-upon your effort, rather than naively re-write it all.

    3. It’s my way of opening up a conversation with someone (you) who grasps the same fundamental set of concepts around publishing that I rarely, rarely can find. You have a section in your book/guide around the idea of using one master/single content-source, and why it matters. You also share my obsession with light-markup, almost to a principled, rebellious extent, considering you use it everywhere (eg: blog comments), knowing it won’t be parsed out.

    * Re: Marked: much appreciated, thanks for the Marked app tip. I hadn’t seen it yet – just got it. It’s funny… “iA Writer” is my most recent Markdown-friendly editor, but there’s a glaring omission of what Marked *only* does (live preview). These guys really need to join forces.

    * Re: ZML: I’d like to talk over this with you, and get a sense of its improvements/extensions to Markdown, etc. I’ve made my way through the entire content/book provided inside Jaguar, which gave me the basics. There’s some strong innovations in there, as well as some syntax/solutions I’d argue could be made even more elegant.

    More importantly than all of that: I’ve stalked out as much of your insights/comments from around the web, since we seem to share a lot of ideas. You come across as someone who needs to unleash the ideas & tools you’ve been actively shaping for a very long time. Things like Jaguar are too progressive and important to remain unknown (if that were to happen). I may have some ways to help you, in a way that jives with an overall publishing venture I’m building. I’m sure that’s more vague than helpful, but we need to talk in more detail.

    Communicating here feels only slightly more efficient than chalk-striking a public mailbox to let each other know there’s a document taped underneath.

    Here’s my “email I use when I want to do something as public as post it on a blog comment”: dvd215[at]gmail[dot]com — if you can just ping me with yours, I’ll send you my Skype + phone number, etc and we can talk. Or sometime I’ll just jump on a flight west and buy you a drink after an open mic slam night ;)

    D