UnicodeData.txt

The “nerve center” of the Unicode Standard is the UnicodeData.txt file, which contains most of the Unicode Character Database. As the database has grown and supplementary information has been added to it, various pieces have been split out into separate files. Nevertheless, the most important parts of the standard continue to reside in UnicodeData.txt.

The designers of Unicode wanted the database to be as simple and universal as possible, so it's maintained as a plain ASCII text file (we'll gloss over the irony of having the Unicode Character Database stored in an ASCII text file). For ease of parsing, the file is semicolon-delimited. Each record in the database (i.e., the information pertaining ...
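As a rough illustration of the semicolon-delimited layout, the following minimal Python sketch splits one record into its fields. The function name `parse_record` and the dictionary keys are hypothetical, chosen only for this example; the sample line is the entry for U+0041 as it appears in published versions of UnicodeData.txt, and reading the first three fields as code point, character name, and general category follows the file's documented field order rather than anything stated in the text above.

```python
def parse_record(line):
    """Split one semicolon-delimited record into its fields.

    Hypothetical helper for illustration; it assumes the first three
    fields are code point (hex), character name, and general category.
    """
    fields = line.rstrip("\n").split(";")
    return {
        "code_point": int(fields[0], 16),  # hexadecimal, no "U+" prefix
        "name": fields[1],
        "general_category": fields[2],
        "fields": fields,                  # every field, including empty ones
    }


# Sample record for LATIN CAPITAL LETTER A.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_record(sample)
print(hex(record["code_point"]), record["name"], record["general_category"])
```

Empty fields show up as empty strings, so a plain `split(";")` preserves each field's position without any special handling.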
