Filed under book design, css, design, digitization, ebooks, html5, publishing, python, Tech.

Main cover


This is the second year of Darius Kazemi’s NaNoGenMo project: write code that generates a “novel.” NaNoGenMo is, obviously, a playful turn on National Novel Writing Month (NaNoWriMo) — as is Safari’s blog-post-a-day-in-November.

The “novel” is defined however you want. It could be 50,000 repetitions of the word “meow”. It could literally grab a random novel from Project Gutenberg. It doesn’t matter, as long as it’s 50k+ words.

Submissions are posted on the GenMo GitHub repo as Issues. Completed works get a fancy green label. It’s only November 8th, but so far, they are awesome:


This is a port of a 1978 BASIC program typed out of a magazine.



50,000 Meows

Someone had to do it: “ replaces all words with a meow of the same length, keeping punctuation.” The first line of Moby Dick:

Meow me Meeeeow. Meow meeow mew–meoow meow mew meow meeeeooow–meeeow
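A minimal sketch of the length-preserving replacement, for the curious. This is a guess at the approach, not the submission's actual code: `meow_word` and `meowify` are names invented here, and the rule for stretching a meow (pad with extra "e"s) is an assumption. The real output above clearly varies where the extra letters go.

```python
import re


def meow_word(word):
    """Return a meow with the same number of letters as `word`."""
    n = len(word)
    if n == 1:
        meow = "m"
    elif n == 2:
        meow = "me"
    elif n == 3:
        meow = "mew"
    else:
        # Assumed stretching rule: pad the middle with extra "e"s.
        meow = "m" + "e" * (n - 3) + "ow"
    # Keep the capitalisation of the first letter.
    return meow.capitalize() if word[0].isupper() else meow


def meowify(text):
    """Replace every run of letters with a meow; punctuation is left alone."""
    return re.sub(r"[A-Za-z]+", lambda m: meow_word(m.group(0)), text)


print(meowify("Call me Ishmael."))  # Meow me Meeeeow.
```

Because each word is swapped for one of equal length, the output always has exactly as many characters as the input.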


For fans of recursion: “It starts as a simple 8-word sentence, but the program randomly chooses words to define for the reader, and keeps defining words until the book is at least 50,000 words long.”

The transorbital (If you are unfamiliar with the word ‘transorbital’, its definition is “Crossing through (If you are unfamiliar with the word ‘through’, its definition is “In one side and out (If you are unfamiliar with the word ‘out’, its definition is “In a direction away (If you are unfamiliar with the word ‘away’, its definition is “From a particular thing (If you are unfamiliar with the word ‘thing’, its definition is…
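The recursive-definition trick can be sketched roughly like this. Everything here is hypothetical: the tiny `DEFINITIONS` dictionary, the function names, and the exact wording of the aside are stand-ins, and the real project presumably draws on a full dictionary so it never runs out of words to define.

```python
import random

# Hypothetical mini-dictionary; a real run would need a far larger one.
DEFINITIONS = {
    "transorbital": "crossing through the orbit",
    "through": "in one side and out the other",
    "out": "in a direction away from the inside",
    "away": "from a particular thing or place",
    "thing": "an object that need not be named",
}


def bare(token):
    """Strip surrounding punctuation so quoted words still match the dictionary."""
    return token.strip("\"'(),.").lower()


def expand(sentence, target_words, seed=0):
    """Keep inserting definitions after random words until the text is long enough."""
    rng = random.Random(seed)
    # Each entry is [token, may_still_be_defined].
    tokens = [[t, bare(t) in DEFINITIONS] for t in sentence.split()]
    while len(tokens) < target_words:
        candidates = [i for i, (_, ok) in enumerate(tokens) if ok]
        if not candidates:
            break  # the toy dictionary has run dry
        i = rng.choice(candidates)
        tokens[i][1] = False  # define each occurrence only once
        word = bare(tokens[i][0])
        intro = f"(If you are unfamiliar with the word '{word}', its definition is".split()
        body = DEFINITIONS[word].split()
        body[0] = '"' + body[0]
        body[-1] = body[-1] + '")'
        # Words inside the definition may themselves be defined later.
        aside = [[t, False] for t in intro] + [[t, bare(t) in DEFINITIONS] for t in body]
        tokens[i + 1:i + 1] = aside
    return " ".join(t for t, _ in tokens)


print(expand("The transorbital thing went away", 30))
```

With a real dictionary, setting `target_words` to 50,000 would turn the same loop into a novel-length run.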


Here’s mine.

Seraph screenshot


I based it on the Voynich Manuscript, an untranslated (and probably untranslatable) codex whose vellum has been carbon-dated to the early 15th century. Researching this was the best part; almost every Google search for Voynich turns up something about extraterrestrials by page two.


There are some standardized transliterations for the manuscript, so I was able to find one, throw out all the careful notations, and end up with a simple list of source words.
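As a rough illustration of "throwing out the notations", here is a sketch that reduces a transliteration file to a flat word list. The file format is an assumption loosely modeled on interlinear transliteration files: locus tags in angle brackets, comments in braces, words separated by dots, `!` as filler, and `*` for unreadable glyphs. The sample lines are invented, not real manuscript data, and `extract_words` is a hypothetical name.

```python
import re

# Invented sample in an assumed format; not real manuscript data.
SAMPLE = """\
# a comment line
<f1r.1> daiin.chedy.qok!eedy{plant}.shedy-
<f1r.2> ol.ch*l.chol.daiin=
"""


def extract_words(text):
    """Return the plain words, discarding tags, comments, and uncertain readings."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        line = re.sub(r"<[^>]*>", " ", line)    # locus tags
        line = re.sub(r"\{[^}]*\}", " ", line)  # inline comments
        for chunk in re.split(r"[.,\s=-]+", line):
            chunk = chunk.replace("!", "")      # filler marks
            if chunk.isalpha():                 # drops words with unreadable glyphs
                words.append(chunk)
    return words


print(extract_words(SAMPLE))
```

Duplicates are kept on purpose, so common words stay common when the list is later shuffled.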

The main program slurps up the words, randomizes them, and lays them out in a series of canned templates using Jinja2. I set them in a public domain “Voynich” font. The manuscript uses drop-caps, so I did the same, but I couldn’t yet use the CSS3 initial-letter property, so I ended up writing some CSS that deserves a place on Dave Cramer’s hall of drop-caps shame.
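The shuffle-and-template step looks roughly like the sketch below. To keep it dependency-free it uses `string.Template` from the standard library as a stand-in for Jinja2, so it is not the code from the repo. The template names, markup, and class names are all invented for illustration.

```python
import random
from string import Template

# Hypothetical page templates; markup and class names are made up for this sketch.
PAGE_TEMPLATES = {
    "text_only": Template(
        '<section class="page"><p><span class="dropcap">$initial</span>$rest</p></section>'
    ),
    "image_left": Template(
        '<section class="page"><img class="left" src="$image">'
        '<p><span class="dropcap">$initial</span>$rest</p></section>'
    ),
}


def render_page(template_name, words, image=""):
    """Fill one template, splitting off the first letter for the drop-cap."""
    text = " ".join(words)
    return PAGE_TEMPLATES[template_name].substitute(
        initial=text[0], rest=text[1:], image=image
    )


def build_pages(source_words, words_per_page, seed=None):
    """Shuffle the source words and pour them into pages of a fixed word count."""
    words = list(source_words)
    random.Random(seed).shuffle(words)
    pages = []
    for start in range(0, len(words), words_per_page):
        chunk = words[start:start + words_per_page]
        pages.append(render_page("text_only", chunk))
    return pages


for page in build_pages(["daiin", "chedy", "ol", "chol", "shedy"], 2, seed=1):
    print(page)
```

Wrapping the first letter in its own `span` is one common workaround when `initial-letter` isn't available; the styling itself lives in the CSS.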


The original manuscript is heavily illustrated with fantastic sketches of fictional plants, nonexistent cosmological bodies, and a healthy number of naked ladies. The illustrations are grouped thematically. I selected keywords like “botany” and “alchemy” that, insofar as the original makes any sense, correspond to those themes.

I used the Flickr API to access the Internet Archive’s collection of 14 million images. Each image is tagged with its original century, which meant that I could select “period” illustrations with any given keyword search. Trial and error landed on the 18th century as the “best” from a purely aesthetic point of view.
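A keyword-plus-century search can be expressed as a single call to Flickr's `flickr.photos.search` REST method. The sketch below only builds the request URL and makes no network call. The `bookcentury1700` tag format, the `search_url` helper, and the placeholder key and user ID are assumptions for illustration; check the collection's actual tags before relying on them.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def search_url(api_key, user_id, keyword, century):
    """Build a flickr.photos.search URL for one keyword within one century."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,               # the account that hosts the collection
        "text": keyword,                  # free-text keyword, e.g. "botany"
        "tags": f"bookcentury{century}",  # assumed tag format for the century
        "extras": "url_o,o_dims",         # ask for original URL and dimensions
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


# Placeholder credentials; substitute a real API key and account ID.
print(search_url("YOUR_API_KEY", "ACCOUNT_ID", "botany", 1700))
```

Asking for the original dimensions up front is handy, since they are needed later when deciding how to lay out each page.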

Example of flowed text in original Voynich manuscript

As with Voynich, each page has only one image on it; the dimensions and size influence which template is chosen. In the hand-written original, the text flows tightly around the illustrations. It’s possible to do this kind of layout in CSS, using the controversial CSS Regions specification, but that wasn’t available in my chosen output pipeline, so I went with standard floats.
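Picking a template from the image dimensions can be as simple as a few threshold checks. The template names and cut-off values below are invented for illustration; the real ones would have to be tuned by eye.

```python
def choose_template(width, height):
    """Pick a page template name from an image's pixel dimensions."""
    if width < 800 and height < 800:
        return "image_inset"     # small image: tuck it into the text
    ratio = width / height
    if ratio >= 1.4:
        return "image_top"       # wide image: span the top of the page
    if ratio <= 0.75:
        return "image_left"      # tall image: float it beside the text
    return "image_centered"      # roughly square: centre it


print(choose_template(3000, 1500))  # image_top
print(choose_template(1200, 2400))  # image_left
```

Floated templates like `image_left` give an approximation of the wrap-around text in the original without needing CSS Regions.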


I was determined to produce a printable PDF because I don’t get to play with print layout in my day job. While I’ve worked with XSL-FO and LaTeX in my life, no one was paying me to suffer, so I used HTML5/CSS3 for layout. There are really no viable open-source implementations of CSS3 paged media, so I had a choice between Antenna House and Prince XML. Antenna House is unarguably more powerful (it’s what our friends at O’Reilly use to produce their print books), but it has no free license of any kind, so I chose Prince. Nellie McKesson’s article on A List Apart is still the most accessible reference for producing print-ready HTML files. Thanks, Nellie!

The pages looked best with big lettering, so the final book ends up being 400 pages (a “normal” novel of 50,000 words would be half that). I also wanted really high resolution images for print output. As a result, the generated PDFs are more than a gigabyte. We all suffer for our art. The last step — getting it printed — is still TODO, but I have 22 more days to figure out trim size, PDF/X versus PDF/A, and paper stock.


Each subsection generates a full-bleed cover with a random Voynich word overlaid as the title. These end up being some of my favorite pages.

Four sample covers


Paged media CSS lets me alternate margin sizes between the left and right pages of a spread. I picked a typical “art book” page ratio.

Two sample spreads

Sample pages

Nine sample images, one labelled “Spiral”

Source code

The Python code driving all the above is available on my GitHub repo. There’s still plenty of time to join in; lots of great ideas at the main site.




Tags: concrete poetry, ebooks, nanogenmo, pdf, procedural writing, public domain, wtf

2 Responses to “NaNoGenMo 2014: A procedurally generated mysterious codex”

  1. Brett Kromkamp


    I really enjoyed reading about this project. I’m working on something similar in the sense that I am generating PDF-content from HTML (content for educational purposes) and you pointing me in the direction of Prince XML has proven to be a godsend. Thanks for that. Also, thumbs up for your creativity that you exhibit in this project. Good work!