Context: Help File for a Cross-Platform Application
One of the hobby software projects I’ve worked on over the years is an open source end-user database application called PortaBase. I originally wrote it for the Sharp Zaurus line of Linux-based PDAs but have since ported it to Linux/UNIX, Windows, Mac OS X, and Nokia’s now-abandoned Maemo platform for cell phones and internet tablets (I still use the N900 as my cell phone). PortaBase is a pretty useful little program that I use daily for all sorts of information management tasks, but what I want to talk about this time is the documentation…and specifically, managing translations of it into multiple languages.
The Zaurus had a pretty simple system for application help files: create an HTML file named after the application, put it in the right place during installation, and the user could click a little question mark in the title bar to open that help file in a basic built-in HTML viewer. You could have multiple files linked from the main one, but that was more work to manage and PortaBase was originally simple enough that one long-ish page was good enough. And there was another reason to limit the documentation to a single file: the Zaurus was primarily sold in Japan, and fairly early in development one of the PortaBase users contributed a translation of the help file into Japanese. I posted instructions on how to contribute new translations (of both the UI and the help file), and now there are at least partial translations of PortaBase into ten different languages. At first, having just one HTML file for the help document made it easier for the translators to deal with and for me to keep track of everything.
But there were problems with this solution. As features were added to PortaBase, the help file kept getting longer and it became easy to get lost in it. Some of the translators didn’t really understand file encodings, and sometimes sent me files that had been corrupted over the course of multiple accidental encoding conversions. Some of the translators weren’t very good with HTML, and found the markup a significant barrier to working on the file. And whenever the content of the file changed, it wasn’t easy to keep track of the differences (I sent the translators diffs from the previous version, but then they had to cross-reference those with what they’d already written, and the diff format was itself foreign to some of them). The net result was that a lot more people translated the user interface text than the help file, because the UI text was in a file format with dedicated tools better suited for managing and updating translations (also, that one massive HTML file looked too intimidating to get started on). About two years ago, I decided to completely redesign the help system in order to solve some of these problems.
The core of the redesigned PortaBase help system is Sphinx, a tool written in Python for generating documentation in various output formats from input files written using reStructuredText (reST), a simple but powerful wiki-style syntax. I took the monolithic HTML file and split it up into a separate text file for each section (you can find them here). There’s still some markup syntax that you have to memorize, but it’s pretty intuitive and much easier to read at a glance than HTML.
One of the nice features of Sphinx is that you can generate output in multiple formats: HTML, PDF, EPUB, LaTeX, plain text, etc. For PortaBase I really only needed the HTML output (here’s the English version), but the PDF output also turned out pretty well, and being able to generate an EPUB for loading onto an ebook reader is nice too.
Probably the biggest reason for me switching to Sphinx, though, was that it can automatically generate translation message files from the input files, and then automatically incorporate them when generating the output—in all of the supported formats. It uses the gettext .po format, which is supported by a lot of translation tools and used in much open source and free software. This was a key point; normally splitting one big file into a bunch of little ones would have made it harder to keep track of everything, but now I could use an online system like Transifex to do much of the work for me.
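To make that a little more concrete, here is a minimal sketch of the kind of settings a Sphinx conf.py uses for gettext-based translation. The values are illustrative assumptions, not PortaBase's actual configuration; `locale_dirs`, `gettext_compact`, and `language` are standard Sphinx options. With settings like these, running `sphinx-build -b gettext` extracts the translatable text into .pot templates, from which the per-language .po files are derived.

```python
# conf.py (excerpt): a hypothetical sketch, not the real PortaBase file.

# Directories (relative to conf.py) where Sphinx looks for translation
# catalogs, laid out as <locale_dir>/<language>/LC_MESSAGES/.
locale_dirs = ["locale/"]

# Write one message catalog per source document instead of merging
# documents that share a top-level directory, so each help section
# can be translated and tracked separately.
gettext_compact = False

# Default output language; it can be overridden for a single build with
#   sphinx-build -D language=ja ...
language = "en"
```

Keeping one catalog per source file is a design choice rather than a requirement; it simply matches the idea of small, separately translatable sections.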
Transifex is an open source Django project for managing translations online, with development funded by charging for hosting of commercial projects (open source projects can get free hosting). It supports a variety of file formats, including both the .po files used by Sphinx and the Qt Linguist files used for the PortaBase user interface. Translations can be done directly in a web browser, eliminating file encoding problems and the need to have translators install custom translation software (for the UI translations). The project page gives a good overview of how complete the different translations are, and you can drill down to get more information.
Additionally, there’s a command line client which makes it easy to grab the latest versions of all the files (or specific ones) and check them into a source control system. This is perhaps the biggest time-saver in the new system for managing the help files. I no longer need to send out a burst of emails with translation and diff files for various languages just before a release, hoping that the translators have time to work on them relatively soon; they can just check the site occasionally and update any files that have been updated since the last time they looked. Also, because the help file was broken down into individual phrases and grouped into separate files, it’s now much less intimidating to get started on and easier to see exactly what changed since the translation was last updated. And even if they don’t finish a translation before a release, I can easily include whatever they’ve managed to get done so far.
You can see the resulting documentation for PortaBase translated into Czech, French, Japanese, and traditional Chinese. I maintain the Japanese translation myself, so I can definitely appreciate the simplified workflow for translators that Transifex provides.
This combination is working pretty well for me, but it does have some problems and limitations of its own:
- While translators don’t need to install software on their computers anymore, developers and Linux distribution maintainers who want to compile and package a full working version of PortaBase have a few more hoops to go through. They need Python, Sphinx, and gettext installed.
- Sphinx makes it pretty easy to generate output in a single language, but doesn’t really help you generate the output in all the supported languages at once. I ended up writing a few scripts to automate this process on various platforms.
- Some locales are identified differently across different platforms (for example, zh_CN and zh_TW versus zh-Hans and zh-Hant). I had to account for that in my scripts also (although this wouldn’t necessarily be a problem if you just wanted to post content on the web, rather than package software for distribution).
- Sphinx conveniently provides translation files for the phrases it automatically generates in the output (stuff like “Search”, “Table of Contents”, etc.), but some of the translations aren’t up to date and some of the phrases are a little…less than obvious. Without looking at the source code and understanding Python, translators get a little baffled when you ask them to translate things like “%s %s documentation” or “ (in ” with no additional context.
- A couple of the PortaBase translators don’t like signing up for accounts on random web services (like Transifex), but it’s still an improvement over the old process for them to be able to download the files directly from an intuitive UI, and then send me the updated files to upload back into Transifex for management.
I do intend to submit code to the Sphinx project to address some of these if somebody else doesn’t beat me to it (which is entirely possible given the number of other things my time gets filled up with).
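For the multi-language build and locale naming issues in the list above, a wrapper script can be fairly small. The following is only a sketch of the general idea, not the scripts PortaBase actually uses: the language list, the directory names, the "mac" platform key, and the helper function names are all hypothetical. The `sphinx-build` options shown (`-b html` and `-D language=...`) are standard ones.

```python
import subprocess
from pathlib import Path

# Hypothetical values for illustration only.
LANGUAGES = ["en", "cs", "fr", "ja", "zh_TW"]

# Assumed per-platform renames: some platforms identify Chinese locales
# by script (zh-Hans, zh-Hant) rather than by region (zh_CN, zh_TW).
PLATFORM_ALIASES = {
    "mac": {"zh_CN": "zh-Hans", "zh_TW": "zh-Hant"},
}


def output_name(language, platform):
    """Return the directory name a given platform expects for a language."""
    return PLATFORM_ALIASES.get(platform, {}).get(language, language)


def build_command(language, platform, source="source", build_root="build/html"):
    """Assemble the sphinx-build command for one language's HTML output."""
    out_dir = Path(build_root) / output_name(language, platform)
    return [
        "sphinx-build",
        "-b", "html",                  # HTML builder
        "-D", f"language={language}",  # override the conf.py language
        source,
        out_dir.as_posix(),
    ]


def build_all(platform, run=subprocess.run):
    """Run one build per language, stopping at the first failure."""
    for language in LANGUAGES:
        run(build_command(language, platform), check=True)
```

Calling `build_all("mac")` would then run one build per language, with the Chinese output landing in a `zh-Hant` directory. Passing the runner in as a parameter makes it possible to check the generated commands without Sphinx installed.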
Sphinx Does a Lot More