The following information is derived from the dc:language field in each epub's OPF file.
Missing from the chart, of course, is English. It's so overrepresented that including it would skew the chart to the point of being unreadable.
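If you want to pull the same field out of your own files, here is a minimal sketch using Python's standard ElementTree. It assumes the standard OPF 2.0 package namespace and the Dublin Core elements namespace, and `read_languages` is a hypothetical helper name, not part of any Bookworm code.

```python
import xml.etree.ElementTree as ET

# Standard OPF 2.0 and Dublin Core namespace URIs (assumed; a broken file may differ).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_languages(opf_xml: str) -> list[str]:
    """Return every non-empty dc:language value in an OPF document, trimmed.

    An empty list means the book has no usable language value.
    """
    root = ET.fromstring(opf_xml)
    values = []
    for elem in root.iterfind(".//dc:language", NS):
        text = (elem.text or "").strip()
        if text:
            values.append(text)
    return values

sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <dc:language>en-GB</dc:language>
  </metadata>
</package>"""

print(read_languages(sample))  # ['en-GB']
```

Returning a list rather than a single string matters because an OPF file may legitimately contain more than one dc:language element.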
Of the 62,000 epubs on Bookworm right now:
- 29,642 have no language value
- A little over 20,000 are English (combining various values like “en”, “en-GB”, or — embarrassingly — “American”)
- The remainder, 5,874, are distributed among all other languages
- Almost half of the distinct values appear only once (likely bad data)
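Here is roughly how a tally like the one above could be computed. This is a sketch, not the method behind these numbers: it assumes each book's values have already been read into a list, buckets a book by the primary subtag of its first value (so "en" and "en-GB" both count as English), and uses a small hypothetical alias table to fold in non-standard values such as "American".

```python
from collections import Counter

# Hypothetical hand-made aliases for non-standard values seen in the wild.
ALIASES = {"american": "en", "english": "en"}

def bucket(value: str) -> str:
    """Map a raw dc:language value to a lowercase primary-language bucket."""
    key = value.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Language tags put the language first: "en-GB" -> "en".
    return key.split("-")[0]

def summarize(books: list[list[str]]) -> dict:
    """Tally dc:language values, given one list of values per book."""
    missing = 0
    raw = Counter()      # exact values as written in the OPF files
    buckets = Counter()  # one count per book, keyed by its first value's bucket
    for values in books:
        if not values:
            missing += 1
            continue
        raw.update(values)
        buckets[bucket(values[0])] += 1
    singletons = sum(1 for n in raw.values() if n == 1)
    return {
        "missing": missing,
        "buckets": buckets,
        "distinct_values": len(raw),
        "singleton_values": singletons,
    }

books = [[], ["en"], ["en-GB"], ["American"], ["cs"], ["cs"], ["Robert"]]
summary = summarize(books)
print(summary["missing"])                                             # 1
print(summary["buckets"]["en"])                                       # 3
print(summary["singleton_values"], "of", summary["distinct_values"])  # 4 of 5
```

The singleton count is taken over the raw values, which is what makes one-off junk like "Robert" stand out.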
I found it very interesting that the most represented non-English language code is cs — Czech — by a huge margin. Any ideas why?
Wondering which values are correct? The OPF 2.0 spec is unambiguous:
The content of this element [dc:language] must comply with RFC 3066
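A rough way to test values against RFC 3066 is sketched below. It checks the tag grammar (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits) plus the rule that primary subtags are 2-letter ISO 639-1 codes, 3-letter ISO 639-2 codes, "i" or "x". It is a shape check only: `looks_like_rfc3066` is a hypothetical helper and does not consult the ISO 639 or IANA registries, so a well-formed but unassigned code would still pass.

```python
import re

# RFC 3066 ABNF: Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA ; Subtag = 1*8(ALPHA / DIGIT)
TAG_SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def looks_like_rfc3066(value: str) -> bool:
    """Shape check only: does not look codes up in any registry."""
    if not TAG_SYNTAX.fullmatch(value):
        return False
    primary = value.split("-")[0].lower()
    # 2 letters = ISO 639-1, 3 letters = ISO 639-2, "i" = IANA-registered,
    # "x" = private use; other primary subtags are reserved.
    return len(primary) in (2, 3) or primary in ("i", "x")

for v in ["en", "en-GB", "cs", "American", "Robert", "en_US"]:
    print(v, looks_like_rfc3066(v))
# en True, en-GB True, cs True, American False, Robert False, en_US False
```

Note that "American" and "Robert" both fit the bare 1*8ALPHA grammar; it's the reserved-primary-subtag rule that rejects them.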
(Also, does anyone speak “Robert”?)