APPENDIX A

What is UTF?

UTF stands for UCS Transformation Format, and UCS stands for Universal Character Set. The Universal Character Set (ISO/IEC 10646) is kept synchronized with the Unicode Standard. There are three commonly used UTF encodings: UTF-8, UTF-16, and UTF-32.

UTF-8 encodes Unicode characters as a sequence of 8-bit values known as code units; that is, in UTF-8 the code unit is 8 bits long. Similarly, UTF-16 and UTF-32 use 16-bit and 32-bit code units, respectively. Depending on the encoding, a single character may need one or more code units.
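To make the idea of code units concrete, the following is a minimal sketch that counts how many code units a character needs in each encoding. The class and method names (UtfCodeUnits, utf8Units, and so on) are made up for this illustration. It relies on the fact that a Java String is stored as UTF-16, so length() is the number of 16-bit code units, and that every code point takes exactly one UTF-32 code unit.

```java
import java.nio.charset.StandardCharsets;

public class UtfCodeUnits {
    // UTF-8 code units are single bytes, so the encoded byte count is the unit count.
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Java strings are sequences of UTF-16 code units, so length() counts them directly.
    static int utf16Units(String s) {
        return s.length();
    }

    // Each code point occupies exactly one 32-bit code unit in UTF-32.
    static int utf32Units(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Sample code points chosen for illustration: U+0041, U+00E9, U+20AC, U+10400.
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x10400 };
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d%n",
                    cp, utf8Units(s), utf16Units(s), utf32Units(s));
        }
    }
}
```

For U+10400, which lies beyond FFFF, this reports four UTF-8 code units, two UTF-16 code units (a surrogate pair), and one UTF-32 code unit.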

The Unicode code space has room for over a million code points, although far fewer are actually assigned to characters in the current version of the Unicode Standard (v5.2.0 is the standard at the time of writing this book). The valid range of code points for Unicode characters is from 0 to 10FFFF (in hex). Out of ...
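The code point range quoted above can be checked from Java, which exposes the limits as constants on java.lang.Character. The sketch below is illustrative only; the class name CodePointRange and its helper methods are hypothetical.

```java
public class CodePointRange {
    // Total number of code points in the range 0 to 10FFFF, inclusive.
    static int codeSpaceSize() {
        return Character.MAX_CODE_POINT - Character.MIN_CODE_POINT + 1;
    }

    // True if cp lies within the valid Unicode code point range.
    static boolean inUnicodeRange(int cp) {
        return Character.isValidCodePoint(cp);
    }

    public static void main(String[] args) {
        System.out.printf("Highest code point: %X%n", Character.MAX_CODE_POINT);
        System.out.println("Code space size: " + codeSpaceSize());
        System.out.println("0x10FFFF valid? " + inUnicodeRange(0x10FFFF));
        System.out.println("0x110000 valid? " + inUnicodeRange(0x110000));
    }
}
```

The code space works out to 1,114,112 code points, which is where the "over a million" figure comes from; any value above 10FFFF is rejected as invalid.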
