Internationalization

To deliver on the promise “write once, run anywhere,” the engineers at Sun designed the famous Java Virtual Machine. True, your program will run anywhere there is a JVM, but what about users in other countries? Will they have to know English to use your application? Java 1.1 answers that question with a resounding “no,” backed up by various classes that are designed to make it easy for you to write a “global” application. In this section, we’ll talk about the concepts of internationalization and the classes that support them.

The java.util.Locale Class

Internationalization programming revolves around the Locale class. The class itself is very simple; it encapsulates a country code, a language code, and a rarely used variant code. Commonly used languages and countries are defined as constants in the Locale class. (It’s ironic that these names are all in English.) You can retrieve the codes or readable names, as follows:

Locale l = Locale.ITALY; 
System.out.println(l.getCountry( ));            // IT 
System.out.println(l.getDisplayCountry( ));     // Italy 
System.out.println(l.getLanguage( ));           // it 
System.out.println(l.getDisplayLanguage( ));    // Italian

The country codes comply with ISO 3166. A complete list of country codes is at http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html. The language codes comply with ISO 639. A complete list of language codes is at http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt. There is no official set of variant codes; they are designated as vendor-specific or platform-specific.
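As a minimal sketch of how the codes fit together, a Locale can also be assembled from its ISO codes instead of being taken from a predefined constant. The class LocaleCodes and its helper methods are hypothetical names used only for illustration, and Locale.Builder assumes Java 7 or later (older code passed the codes to a Locale constructor).

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class LocaleCodes {
  // Build a Locale from an ISO 639 language code and an ISO 3166 country code.
  static Locale of(String language, String country) {
    return new Locale.Builder().setLanguage(language).setRegion(country).build();
  }

  // Join the two codes and append the English display names.
  static String describe(Locale l) {
    return l.getLanguage() + "_" + l.getCountry() + " = "
        + l.getDisplayLanguage(Locale.ENGLISH) + " / "
        + l.getDisplayCountry(Locale.ENGLISH);
  }

  public static void main(String[] args) {
    System.out.println(describe(of("it", "IT")));
    System.out.println(describe(of("fr", "CA")));
  }
}
```

Passing Locale.ENGLISH to the display methods keeps the readable names independent of the machine's default Locale.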

Various classes throughout the Java API use a Locale to decide how to represent themselves. We have already seen how the DateFormat class uses Locales to determine how to format and parse strings.

Resource Bundles

If you’re writing an internationalized program, you want all the text that is displayed by your application to be in the correct language. Given what you have just learned about Locale, you could print out different messages by testing the Locale. This gets cumbersome quickly, however, because the messages for all Locales are embedded in your source code. ResourceBundle and its subclasses offer a cleaner, more flexible solution.

A ResourceBundle is a collection of objects that your application can access by name, much like a Hashtable with String keys. The same ResourceBundle may be defined for many different Locales. To get a particular ResourceBundle, call the factory method ResourceBundle.getBundle( ), which accepts the name of a ResourceBundle and a Locale. The following example gets the ResourceBundle named “Message” for two Locales; from each bundle, it retrieves the message whose key is “HelloMessage” and prints the message.

//file: Hello.java
import java.util.*; 

public class Hello { 
  public static void main(String[] args) { 
    ResourceBundle bun; 
    bun = ResourceBundle.getBundle("Message", Locale.ITALY); 
    System.out.println(bun.getString("HelloMessage")); 
    bun = ResourceBundle.getBundle("Message", Locale.US); 
    System.out.println(bun.getString("HelloMessage")); 
  } 
}

The getBundle( ) method throws the runtime exception MissingResourceException if an appropriate ResourceBundle cannot be located.
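One way to guard against that exception is to wrap the lookup and supply a fallback string. This is only a sketch: the class SafeLookup, the method lookup, and the bundle name NoSuchBundleForSketch are hypothetical, and the bundle is assumed not to exist on the classpath.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical helper class, for illustration only.
class SafeLookup {
  // Return the bundle's string for key, or fallback if the bundle
  // (or the key within it) cannot be found.
  static String lookup(String bundleName, Locale locale, String key, String fallback) {
    try {
      return ResourceBundle.getBundle(bundleName, locale).getString(key);
    } catch (MissingResourceException e) {
      return fallback;
    }
  }

  public static void main(String[] args) {
    // "NoSuchBundleForSketch" is assumed to be absent, so the fallback is printed.
    System.out.println(
        lookup("NoSuchBundleForSketch", Locale.ITALY, "HelloMessage", "Hello!"));
  }
}
```

The same catch block also covers getString( ), which throws MissingResourceException when the key is absent from a bundle that was found.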

ResourceBundles are defined in three ways. They can be standalone classes, in which case they will be either subclasses of ListResourceBundle or direct subclasses of ResourceBundle. They can also be defined by a property file, in which case they will be represented at runtime by a PropertyResourceBundle object. ResourceBundle.getBundle( ) returns either a matching class or an instance of PropertyResourceBundle corresponding to a matching property file. The algorithm used by getBundle( ) is based on appending the language and country codes of the requested Locale to the name of the resource. Specifically, it searches for resources in this order:

name_language_country_variant 
name_language_country 
name_language 
name_default-language_default-country_default-variant 
name_default-language_default-country 
name_default-language 
name

In this example, when we try to get the ResourceBundle named Message, specific to Locale.ITALY, it searches for the following names (no variant codes are in the Locales we are using, and the default Locale is assumed to be Locale.US):

Message_it_IT 
Message_it 
Message_en_US 
Message_en 
Message

Let’s define the Message_it_IT ResourceBundle now, using a subclass of ListResourceBundle:

import java.util.*; 

public class Message_it_IT extends ListResourceBundle { 
  public Object[][] getContents( ) { 
    return contents; 
  } 
   
  static final Object[][] contents = { 
    {"HelloMessage", "Buon giorno, world!"}, 
    {"OtherMessage", "Ciao."}, 
  }; 
}

ListResourceBundle makes it easy to define a ResourceBundle class; all we have to do is override the getContents( ) method. This method simply returns a two-dimensional array containing the names and values of its resources. In this example, contents[1][0] is the second key (OtherMessage), and contents[1][1] is the corresponding message (Ciao.).
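To see the key and value pairs in action without going through the getBundle( ) lookup, a bundle of the same shape can be instantiated directly. This is a sketch for experimentation only; the names ListBundleDemo and ItalianMessages are hypothetical, and a real program would normally let getBundle( ) locate the class.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class, for illustration only.
class ListBundleDemo {
  // Same contents as Message_it_IT, nested here so the sketch fits in one file.
  static class ItalianMessages extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
      return new Object[][] {
        {"HelloMessage", "Buon giorno, world!"},
        {"OtherMessage", "Ciao."},
      };
    }
  }

  public static void main(String[] args) {
    // Instantiate the bundle directly instead of calling getBundle().
    ResourceBundle bun = new ItalianMessages();
    System.out.println(bun.getString("HelloMessage"));
    System.out.println(bun.getString("OtherMessage"));
  }
}
```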

Now let’s define a ResourceBundle for Locale.US. This time, we’ll make a property file. Save the following data in a file called Message_en_US.properties:

HelloMessage=Hello, world! 
OtherMessage=Bye.

So what happens if somebody runs your program in Locale.FRANCE, and no ResourceBundle is defined for that Locale? To avoid a runtime MissingResourceException, it’s a good idea to define a default ResourceBundle. So in our example, you could change the name of the property file to Message.properties. That way, if a language- or country-specific ResourceBundle cannot be found, your application can still run.
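The property-file form can be tried out without creating a file at all. The sketch below feeds the same two entries to a PropertyResourceBundle from an in-memory string; the class name PropertyBundleDemo and its members are hypothetical, and the Reader-based constructor assumes Java 6 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class, for illustration only.
class PropertyBundleDemo {
  // The same two entries as the property file above, held in a string.
  static final String PROPERTIES =
      "HelloMessage=Hello, world!\nOtherMessage=Bye.\n";

  // Parse the property text into a bundle, much as getBundle() would
  // do for a matching .properties file.
  static ResourceBundle load() {
    try {
      return new PropertyResourceBundle(new StringReader(PROPERTIES));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    ResourceBundle bun = load();
    System.out.println(bun.getString("HelloMessage"));
    System.out.println(bun.getString("OtherMessage"));
  }
}
```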

The java.text Package

The java.text package includes, among other things, a set of classes designed for generating and parsing string representations of objects. We have already seen one of these classes, DateFormat. In this section we’ll talk about the other format classes: NumberFormat, ChoiceFormat, and MessageFormat.

The NumberFormat class can be used to format and parse currency, percents, or plain old numbers. Like DateFormat, NumberFormat is an abstract class. However, it has several useful factory methods. For example, to generate currency strings, use getCurrencyInstance( ) :

double salary = 1234.56; 
String here =     // $1,234.56 
    NumberFormat.getCurrencyInstance( ).format(salary);  
String italy =    // L 1.234,56  
    NumberFormat.getCurrencyInstance(Locale.ITALY).format(salary);

The first statement generates an American salary, with a dollar sign, a comma to separate thousands, and a period as a decimal point. The second statement presents the same number in Italian format, with a lira sign, a period to separate thousands, and a comma as a decimal point. Remember that NumberFormat worries about format only; it doesn’t attempt to do currency conversion. (Among other things, that would require access to a dynamically updated table of exchange rates. That is a good opportunity for a Java Bean but too much to ask of a simple formatter.)

Likewise, getPercentInstance( ) returns a formatter you can use for generating and parsing percents. If you do not specify a Locale when calling a getInstance( ) method, the default Locale is used:

double progress = 0.44;
NumberFormat pf = NumberFormat.getPercentInstance( ); 
System.out.println(pf.format(progress));      // "44%" 
try { 
    System.out.println(pf.parse("77.2%"));    // "0.772" 
} 
catch (ParseException e) {}
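A percent formatter expects a fraction (0.44 for 44%) and, by default, shows no decimal places. The following sketch pins the Locale to Locale.US so the output does not depend on the machine, and uses setMaximumFractionDigits( ) to keep a decimal place; the class PercentDemo and the method percent are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class PercentDemo {
  // Format a fraction (0.44 means 44%) with at most the given number
  // of decimal places, using U.S. conventions.
  static String percent(double fraction, int decimals) {
    NumberFormat pf = NumberFormat.getPercentInstance(Locale.US);
    pf.setMaximumFractionDigits(decimals);
    return pf.format(fraction);
  }

  public static void main(String[] args) {
    System.out.println(percent(0.44, 0));   // 44%
    System.out.println(percent(0.772, 0));  // 77%
    System.out.println(percent(0.772, 1));  // 77.2%
  }
}
```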

And if you just want to generate and parse plain old numbers, use a NumberFormat returned by getInstance( ) or its equivalent, getNumberInstance( ) :

NumberFormat guiseppe = NumberFormat.getInstance(Locale.ITALY); 

// uses the default Locale (Locale.US in this example)
NumberFormat joe = NumberFormat.getInstance( ); 

try { 
  double theValue = guiseppe.parse("34.663,252").doubleValue( ); 
  System.out.println(joe.format(theValue));  // "34,663.252"
} 
catch (ParseException e) {}

We use guiseppe to parse a number in Italian format (periods separate thousands, comma is the decimal point). The return type of parse( ) is Number, so we use the doubleValue( ) method to retrieve the value of the Number as a double. Then we use joe to format the number correctly for the default (U.S.) locale.
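Because the default Locale differs from machine to machine, a variation on this example can name both Locales explicitly and report bad input instead of ignoring it. This is a sketch under those assumptions; the class NumberConvert and the method convert are hypothetical names.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class NumberConvert {
  // Parse text using the conventions of one Locale and re-format
  // the value using the conventions of another.
  static String convert(String text, Locale from, Locale to) {
    try {
      double value = NumberFormat.getInstance(from).parse(text).doubleValue();
      return NumberFormat.getInstance(to).format(value);
    } catch (ParseException e) {
      // Surface the problem rather than swallowing it.
      throw new IllegalArgumentException("Not a number in " + from + ": " + text, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(convert("34.663,252", Locale.ITALY, Locale.US));
  }
}
```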

Here’s a list of the factory methods for text formatters in the java.text package:

DateFormat.getDateInstance( ) 
DateFormat.getDateInstance(int style) 
DateFormat.getDateInstance(int style, Locale aLocale) 
DateFormat.getDateTimeInstance( ) 
DateFormat.getDateTimeInstance(int dateStyle, int timeStyle) 
DateFormat.getDateTimeInstance(int dateStyle, int timeStyle, Locale aLocale) 
DateFormat.getInstance( ) 
DateFormat.getTimeInstance( ) 
DateFormat.getTimeInstance(int style) 
DateFormat.getTimeInstance(int style, Locale aLocale) 
NumberFormat.getCurrencyInstance( ) 
NumberFormat.getCurrencyInstance(Locale inLocale) 
NumberFormat.getInstance( ) 
NumberFormat.getInstance(Locale inLocale) 
NumberFormat.getNumberInstance( ) 
NumberFormat.getNumberInstance(Locale inLocale) 
NumberFormat.getPercentInstance( ) 
NumberFormat.getPercentInstance(Locale inLocale)
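As a brief illustration of the style and Locale arguments that several of these factory methods accept, the sketch below formats one fixed date in each of the four DateFormat styles. The class DateStyles and its methods are hypothetical names; the exact strings produced for each style depend on the locale data shipped with the Java runtime.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class DateStyles {
  // A fixed date (April 10, 1999) so the output is repeatable.
  static Date sampleDate() {
    return new GregorianCalendar(1999, Calendar.APRIL, 10).getTime();
  }

  // Format a date with the given style constant and Locale.
  static String format(Date d, int style, Locale locale) {
    return DateFormat.getDateInstance(style, locale).format(d);
  }

  public static void main(String[] args) {
    int[] styles = {DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL};
    for (int style : styles) {
      System.out.println(format(sampleDate(), style, Locale.US)
          + "  |  " + format(sampleDate(), style, Locale.ITALY));
    }
  }
}
```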

Thus far we’ve seen how to format dates and numbers as text. Now we’ll take a look at a class, ChoiceFormat, that maps numerical ranges to text. ChoiceFormat is constructed by specifying the numerical ranges and the strings that correspond to them. One constructor accepts an array of doubles and an array of Strings, where each string corresponds to the range running from the matching number up to (but not including) the next number:

double[] limits = {0, 20, 40}; 
String[] labels = {"young", "less young", "old"}; 
ChoiceFormat cf = new ChoiceFormat(limits, labels); 
System.out.println(cf.format(12)); // young 
System.out.println(cf.format(26)); // less young

You can specify both the limits and the labels using a special string in an alternative ChoiceFormat constructor:

ChoiceFormat cf = new ChoiceFormat("0#young|20#less young|40#old"); 
System.out.println(cf.format(40)); // old 
System.out.println(cf.format(50)); // old

The limit and value pairs are separated by vertical bar (|) characters; the number sign (#) separates each limit from its corresponding value.
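The behavior at the edges of each range is easy to check with a small sketch. Each limit belongs to the range it starts, values below the first limit fall back to the first string, and values above the last limit use the last string. The class AgeLabels and the method label are hypothetical names.

```java
import java.text.ChoiceFormat;

// Hypothetical demo class, for illustration only.
class AgeLabels {
  // Map an age to a label using the same pattern string as above.
  static String label(double age) {
    ChoiceFormat cf = new ChoiceFormat("0#young|20#less young|40#old");
    return cf.format(age);
  }

  public static void main(String[] args) {
    System.out.println(label(-1));    // young (below the first limit)
    System.out.println(label(19.9));  // young
    System.out.println(label(20));    // less young (a limit starts its range)
    System.out.println(label(39.9));  // less young
    System.out.println(label(40));    // old
  }
}
```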

To complete our discussion of the formatting classes, we’ll take a look at another class, MessageFormat, that helps you construct human-readable messages. To construct a MessageFormat, pass it a pattern string. A pattern string is a lot like the string you feed to printf( ) in C, although the syntax is different. Arguments are delimited by curly braces and may include information about how they should be formatted. Each argument consists of a number, an optional type, and an optional style. These are summarized in Table 9-10.

Table 9-10. MessageFormat Arguments

Type      Styles
choice    pattern
date      short, medium, long, full, pattern
number    integer, percent, currency, pattern
time      short, medium, long, full, pattern

Let’s use an example to clarify all of this:

MessageFormat mf = new MessageFormat("You have {0} messages."); 
Object[] arguments = {"no"}; 
System.out.println(mf.format(arguments)); // "You have no messages."

We start by constructing a MessageFormat object; the argument to the constructor is the pattern on which messages will be based. The special incantation {0} means “in this position, substitute element 0 from the array passed as an argument to the format( ) method.” When we generate a message by calling format( ), we pass in values to replace the placeholders ({0}, {1}, . . . ) in the pattern. In this case, we pass the array arguments[] to mf.format( ); this substitutes arguments[0], yielding the result “You have no messages.”

Let’s try this example again, except we’ll show how to format a number and a date instead of a string argument:

MessageFormat mf = new MessageFormat( 
    "You have {0, number, integer} messages on {1, date, long}.");
Object[] arguments = {new Integer(93), new Date( )}; 

// "You have 93 messages on April 10, 1999."
System.out.println(mf.format(arguments));

In this example, we need to fill in two spaces in the template, and therefore we need two elements in the arguments[] array. Element 0 must be a number and is formatted as an integer. Element 1 must be a Date and will be printed in the long format. When we call format( ), the arguments[] array supplies these two values.

This is still sloppy. What if there is only one message? To make this grammatically correct, we can embed a ChoiceFormat-style pattern string in our MessageFormat pattern string:

MessageFormat mf = new MessageFormat( 
  "You have {0, number, integer} message{0, choice, 0#s|1#|2#s}.");
Object[] arguments = {new Integer(1)}; 

// "You have 1 message."
System.out.println(mf.format(arguments));

In this case, we use element 0 of arguments[] twice: once to supply the number of messages, and once to provide input to the ChoiceFormat pattern. The pattern says to add an s if argument 0 has the value zero or is two or more.
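To confirm that the choice pattern behaves as described for several counts, the logic can be wrapped in a small method. This is a sketch with hypothetical names (InboxMessage, summary); it writes an equivalent pattern without spaces inside the braces and passes Locale.US so the number formatting does not depend on the default Locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class InboxMessage {
  // Build the message for a given count; the count is used twice,
  // once as a number and once as input to the choice pattern.
  static String summary(int count) {
    MessageFormat mf = new MessageFormat(
        "You have {0,number,integer} message{0,choice,0#s|1#|2#s}.", Locale.US);
    return mf.format(new Object[] {count});
  }

  public static void main(String[] args) {
    System.out.println(summary(0));  // You have 0 messages.
    System.out.println(summary(1));  // You have 1 message.
    System.out.println(summary(5));  // You have 5 messages.
  }
}
```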

Finally, a few words on how to be clever. If you want to write international programs, you can use resource bundles to supply the strings for your MessageFormat objects. This way, you can automatically format messages that are in the appropriate language with dates and other language-dependent fields handled appropriately.

In this context, it’s helpful to realize that messages don’t need to read elements from the array in order. In English, you would say “Disk C has 123 files”; in some other language, you might say “123 files are on Disk C.” You could implement both messages with the same set of arguments:

MessageFormat m1 = new MessageFormat( 
    "Disk {0} has {1, number, integer} files."); 
MessageFormat m2 = new MessageFormat( 
    "{1, number, integer} files are on disk {0}."); 
Object[] arguments = {"C", new Integer(123)};

In real life, the code could be even more compact; you’d only use a single MessageFormat object, initialized with a string taken from a resource bundle.
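A rough sketch of that arrangement follows. The two nested bundles stand in for language-specific resources that getBundle( ) would normally locate; the class names, the key DiskFiles, and the method report are all hypothetical, and both patterns are in English only to keep the example readable.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical demo class, for illustration only.
class DiskReport {
  // Bundle whose pattern names the disk first.
  static class DiskFirst extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
      return new Object[][] {
        {"DiskFiles", "Disk {0} has {1,number,integer} files."},
      };
    }
  }

  // Bundle whose pattern names the file count first.
  static class CountFirst extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
      return new Object[][] {
        {"DiskFiles", "{1,number,integer} files are on disk {0}."},
      };
    }
  }

  // One code path: the pattern comes from the bundle, the arguments never change.
  static String report(ResourceBundle bundle, Locale locale, String disk, int files) {
    MessageFormat mf = new MessageFormat(bundle.getString("DiskFiles"), locale);
    return mf.format(new Object[] {disk, files});
  }

  public static void main(String[] args) {
    System.out.println(report(new DiskFirst(), Locale.US, "C", 123));
    System.out.println(report(new CountFirst(), Locale.US, "C", 123));
  }
}
```

Only the pattern string differs between the two bundles, so translators can reorder the arguments without any change to the calling code.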
