Unihan.txt

Finally, there's the Unihan.txt file. Chinese characters (also called Han characters or CJK ideographs) make up one of the most important scripts in Unicode. Unicode 3.1 includes more than 70,000 Han characters, and these characters have additional properties beyond those assigned to other Unicode characters.

Chief among these properties are mappings to various source standards. Unicode defines the meanings of the various Han characters by specifying exactly where they came from. This approach also lets you see exactly which characters from which source standards were unified in Unicode. All of these mappings, plus a lot of other useful data, are found in Unihan.txt.
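To make the idea of source-standard mappings concrete, here is a minimal sketch of reading such data. It assumes the record layout used by the published Unihan data, one tab-separated record per line in the form code point, field name, value, with `#` marking comment lines; the text above does not spell this layout out. The function name `parse_unihan` and the sample lines are illustrative, and the sample values should be checked against the real file before you rely on them.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Collect Unihan-style records into {code_point: {field: value}}.

    Assumes each data line looks like "U+4E00<TAB>kBigFive<TAB>A440".
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split into code point, field name, and value.
        code, field, value = line.split("\t", 2)
        # "U+4E00" -> 0x4E00
        table[int(code[2:], 16)][field] = value
    return dict(table)


# Illustrative sample, not copied from the real file.
SAMPLE = [
    "# code point, field name, value",
    "U+4E00\tkBigFive\tA440",
    "U+4E00\tkGB0\t5027",
    "U+4E00\tkJis0\t1676",
]

records = parse_unihan(SAMPLE)
for field, value in sorted(records[0x4E00].items()):
    print(f"U+4E00 {field} = {value}")
```

Grouping the records by code point puts all of one character's source mappings side by side, which is how you would see which entries from different source standards were unified into a single Unicode character.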

For each Han character, the Unihan.txt file gives at ...
