Support for UTF-8

Unicode Transformation Format-8 (UTF-8) is a character encoding that represents every Unicode character using one to four 8-bit bytes. It is the byte-oriented encoding form of Unicode. UTF-8 has been the predominant encoding for web pages since 2009. Here are some characteristics of UTF-8:

  • Can encode all 1,112,064 valid Unicode code points (every code point except the surrogates)
  • Uses one to four 8-bit bytes
  • Accounts for nearly 90% of all web pages
  • Is backward compatible with ASCII
  • Is reversible: decoding the bytes recovers the original text
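
The characteristics above can be checked with a few lines of Java. The following is only a minimal sketch: the class name Utf8Demo, its helper methods, and the sample characters are chosen here for illustration and do not come from the book.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {
    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to UTF-8 and decode again; for well-formed text the result
    // equals the input, which is what "reversible" means here.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    // True when the UTF-8 bytes of the text are identical to its ASCII bytes.
    static boolean sameAsAscii(String asciiText) {
        return Arrays.equals(
                asciiText.getBytes(StandardCharsets.UTF_8),
                asciiText.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Illustrative samples: U+0041, U+00E9, U+20AC, U+1F600.
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            String codePoint = Integer.toHexString(s.codePointAt(0)).toUpperCase();
            System.out.println("U+" + codePoint + " needs " + utf8Length(s) + " byte(s)");
        }
        System.out.println("ASCII compatible: " + sameAsAscii("Java 9"));
        System.out.println("Round trip intact: " + "\u20AC".equals(roundTrip("\u20AC")));
    }
}
```

The four samples were picked so that each falls into a different length class, from one byte for an ASCII letter to four bytes for a character outside the Basic Multilingual Plane.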

The pervasive use of UTF-8 underscores the importance of ensuring the Java platform fully supports it. This need led to JDK Enhancement Proposal (JEP) 226, UTF-8 Property Resource Bundles. With Java 9 applications, we have the ability to specify property ...
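
As a rough illustration of what UTF-8 property resource bundles look like in practice, the sketch below builds a properties bundle from UTF-8 bytes held in memory. It assumes Java 9 or later, where PropertyResourceBundle reads an InputStream as UTF-8 by default. The class name Utf8BundleDemo, the loadBundle helper, and the greeting key are hypothetical and used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8BundleDemo {
    // Builds a bundle from properties text that is stored as UTF-8 bytes,
    // standing in for a .properties file saved in UTF-8.
    static ResourceBundle loadBundle(String propertiesText) throws IOException {
        byte[] utf8 = propertiesText.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(utf8)) {
            // Assumption: running on Java 9+, where this constructor
            // decodes the stream as UTF-8 by default.
            return new PropertyResourceBundle(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical entry containing non-ASCII characters (u-umlaut, sharp s).
        ResourceBundle bundle = loadBundle("greeting=Gr\u00FC\u00DF Gott\n");
        System.out.println(bundle.getString("greeting"));
    }
}
```

On earlier Java releases the same stream would be decoded as ISO-8859-1, so the non-ASCII characters would need \uXXXX escapes in the properties text to survive.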
