One of the great things about sitting on 30,000+ ePub books (as uploaded to Bookworm) is the ability to look at what’s happening in real-world ebook production. Today I’m examining file size, which is useful if you happen to be doing resource planning for a cloud-based ebook reading system.
Smallest file: 1.6 kilobytes
Largest file: 233 megabytes
Total number of books: 35,854
Total size: 20 gigabytes
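If you want to pull the same headline numbers from your own collection, a short Python sketch like the one below would do it. It assumes the books sit as .epub files somewhere under a single directory. The helper names (epub_sizes, summarize, human) and the script name are hypothetical, not part of Bookworm. It also assumes decimal units (1 KB = 1,000 bytes), because the figures above don't say which convention they use.

```python
import sys
from pathlib import Path


def epub_sizes(root):
    """Return the size in bytes of every .epub file found recursively under root."""
    return [p.stat().st_size for p in Path(root).rglob("*.epub") if p.is_file()]


def summarize(sizes):
    """Compute the headline figures: smallest, largest, file count, and total bytes."""
    if not sizes:
        raise ValueError("no files to summarize")
    return {
        "smallest": min(sizes),
        "largest": max(sizes),
        "count": len(sizes),
        "total": sum(sizes),
    }


def human(n):
    """Format a byte count in decimal units (assumption: 1 KB = 1,000 bytes)."""
    value = float(n)
    for unit in ("bytes", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} TB"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python epub_stats.py /path/to/epub/library  (hypothetical script name)
    stats = summarize(epub_sizes(sys.argv[1]))
    for key, value in stats.items():
        print(key, value if key == "count" else human(value))
```

Keeping the directory walk separate from the arithmetic means summarize can also be fed sizes pulled from a database or a storage bucket listing instead of a local disk.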
I did a frequency analysis of all the individual sizes across the entire corpus:
I then zoomed in on that huge spike in the middle of the range, between 1M and 5M:
So there’s a peak at 3M, but anywhere between 1M and 4M is typical.
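A frequency analysis like this comes down to bucketing sizes into fixed-width bins and counting how many files land in each. Here is a minimal Python sketch under my own assumptions: 1 MB (decimal) bins for the full view, and a narrower bin width restricted to the 1M to 5M window for the zoomed view. The function names and bin widths are illustrative and not necessarily how the analysis above was done.

```python
from collections import Counter

MB = 1_000_000  # assumption: decimal megabytes


def size_histogram(sizes, bin_width=MB, lo=0, hi=None):
    """Count files per fixed-width bin, keyed by each bin's lower edge in bytes.

    Only sizes in the half-open window [lo, hi) are counted; hi=None means no upper limit.
    """
    counts = Counter()
    for size in sizes:
        if size < lo or (hi is not None and size >= hi):
            continue
        counts[(size // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def peak_bin(hist):
    """Lower edge of the most populated bin; ties go to the smallest edge."""
    return max(hist, key=hist.get)
```

peak_bin(size_histogram(sizes)) gives the lower edge of the tallest bar, and size_histogram(sizes, bin_width=250_000, lo=MB, hi=5 * MB) gives a zoomed-in view. The peak is the mode of the binned data, so exactly where it lands can shift a little with the bin width you pick.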
I ran this analysis earlier in the year, when Bookworm held only a paltry 7,800 books. The 3M median has held since then, but the ebooks uploaded to Bookworm have gotten larger on average. I attribute that to an increase in commercially produced books that include images.
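A stable median alongside a growing average describes two different statistics: a batch of large, image-heavy books pulls the mean up without necessarily moving the middle of the distribution. Here is a small sketch for comparing two snapshots with Python's standard statistics module. The function name is hypothetical, and before and after stand for lists of file sizes in bytes taken at two points in time.

```python
from statistics import mean, median


def compare_snapshots(before, after):
    """Report how the median and mean file size moved between two corpus snapshots."""
    return {
        "median_before": median(before),
        "median_after": median(after),
        "mean_before": mean(before),
        "mean_after": mean(after),
    }
```

If median_after stays close to median_before while mean_after rises, the growth is concentrated in the upper tail, which fits more image-heavy books arriving rather than every book getting a little bigger.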