Sorting Internationalized Strings

One big advantage you get with Strings is that they are built (almost) from the ground up to support internationalization. This means that the Unicode character set is the lingua franca in Java. Unfortunately, because Unicode uses two-byte characters, many string libraries based on one-byte characters that can be ported into Java do not work so well. Most string-search optimizations use tables to assist string searches, but the table size is related to the size of the character set. For example, traditional Boyer-Moore string search takes much memory and a long initialization phase to use with Unicode.

Get Java Performance Tuning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.