Internationalization

Goals

Concepts

Library

Dependencies

Lesson

When studying character sets you saw that earlier systems such as ASCII assumed that users would be writing American English. Unfortunately many programs even today seem to assume that all users speak English, that no one has a name with accented characters, or that everyone writes numbers using the same format. But the world is made up of many families of languages and cultures; even if your software does not initially have translations available for other languages, it should be built so that other translations may be added in the future.

Internationalization is designing or enabling a software product so that it can potentially be used following the conventions of various regions of the world. It is frequently abbreviated as i18n (where 18 represents the number of letters between the first and last letters of the word internationalization). The flip side of the internationalization coin is localization (l10n), which is the process of actually adapting a program to work with a certain language and/or region. For example, even if a program has been internationalized, it may only support English until someone localizes it to Spanish by translating the messages that are presented to the user.

Internationalization is a broad and intricate subject. This lesson concentrates on internationalization of program resources, primarily messages (usually stored as formatted strings) along with other media such as images and audio clips.

Locales

An internationalized program uses a locale to identify a specific language and region for which formatting and other conventions are defined. Java represents a locale using the java.util.Locale class, which is generally composed of up to three components:

language
A lowercase two- or three-letter language code such as en for English, ja for Japanese, or kok for Konkani. See Language codes - ISO 639.
region
An uppercase two-letter code indicating the region, such as US for the United States or GB for the United Kingdom. This region designation allows for variations across regions even for the same language, such as spelling differences between the USA and the United Kingdom. See Country Codes - ISO 3166.
variant
One or more additional values indicating some further differentiation of a language within a region, such as polyton for Polytonic Greek. In a large majority of projects the locale variant is not used, although it's good to know such variations can be described if needed.

The design of Java Locale follows what the Internet Engineering Task Force (IETF) defines as a language tag (BCP 47): an identifier string composed of the same language, region, and variant described above, separated by hyphen - characters. The language English (en) as spoken in the United States (US), for example, would be represented by the language tag en-US. If you have a language tag you can create a Locale using Locale.forLanguageTag(String languageTag), or produce a language tag from a Locale instance using Locale.toLanguageTag().

Examples of creating Locale instances for various languages.
final Locale hindi = new Locale("hi");  //(hi) Hindi
final Locale hindiIndia = new Locale("hi", "IN");  //(hi-IN) Hindi as spoken in India
final Locale portuguesePortugal = new Locale("pt", "PT");  //(pt-PT) Portuguese as spoken in Portugal
final Locale brazilianPortuguese = new Locale("pt", "BR");  //(pt-BR) Brazilian Portuguese
final Locale ptBR = Locale.forLanguageTag("pt-BR");  //(pt-BR) Brazilian Portuguese
final Locale newfoundlandEnglish = new Locale("en", "CA", "newfound");  //(en-CA-newfound) Newfoundland English
System.out.println(newfoundlandEnglish.toLanguageTag());  //prints "en-CA-newfound"
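Language tags usually arrive as text (from a configuration file or a command line), so it helps to see how they are taken apart and how malformed input behaves. The sketch below is illustrative; the class name LanguageTags and the helper parseLanguageTag are hypothetical. It relies on two real JDK behaviors: Locale.forLanguageTag(…) is lenient and silently drops ill-formed parts, while Locale.Builder.setLanguageTag(…) rejects an ill-formed tag with an IllformedLocaleException.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

public class LanguageTags {
  /** Hypothetical helper: parses a language tag strictly, returning empty if it is ill-formed. */
  static Optional<Locale> parseLanguageTag(String languageTag) {
    try {
      return Optional.of(new Locale.Builder().setLanguageTag(languageTag).build());
    } catch (IllformedLocaleException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    final Locale ptBR = Locale.forLanguageTag("pt-BR");
    //Each component of the tag is available separately.
    System.out.println(ptBR.getLanguage() + " / " + ptBR.getCountry()); // pt / BR
    System.out.println(Locale.forLanguageTag("en-CA-newfound").getVariant()); // newfound
    //forLanguageTag() is lenient: an ill-formed tag yields a locale with no language at all.
    System.out.println(Locale.forLanguageTag("not a tag").getLanguage().isEmpty()); // true
    //Locale.Builder is strict and rejects the same input.
    System.out.println(parseLanguageTag("not a tag").isPresent()); // false
  }
}
```

The strict variant is worth considering whenever the tag comes from a user, because the lenient one would quietly fall back to an empty locale instead of reporting the mistake.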

You can get a list of all the installed locales on your JVM by calling Locale.getAvailableLocales(), which returns an array of Locale instances. The default locale for the entire JVM, which is set to reflect the system settings when the JVM starts, can be retrieved using Locale.getDefault(). Similarly you can override the default locale for the entire JVM using Locale.setDefault(Locale newLocale).
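A short sketch of these calls follows. The class name InstalledLocales and the helper isInstalled are hypothetical, and which locales are reported depends on the JVM installation, so the program prints no fixed output.

```java
import java.util.Arrays;
import java.util.Locale;

public class InstalledLocales {
  /** Hypothetical helper: reports whether the JVM lists the given locale as installed. */
  static boolean isInstalled(Locale locale) {
    return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
  }

  public static void main(String[] args) {
    //The JVM-wide default, initialized from the operating system settings.
    System.out.println("Default: " + Locale.getDefault().toLanguageTag());
    //All installed Portuguese locales, as sorted language tags.
    Arrays.stream(Locale.getAvailableLocales())
        .filter(locale -> locale.getLanguage().equals("pt"))
        .map(Locale::toLanguageTag)
        .sorted()
        .forEach(System.out::println);
  }
}
```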

The JVM actually keeps track of two locales for different purposes, represented by the Locale.Category enum.

Locale.Category.DISPLAY
The locale used for the general user interface.
Locale.Category.FORMAT
The locale used for formatting dates, numbers, and currencies.

When retrieving a locale, you should use Locale.getDefault(Locale.Category category) to retrieve the locale specifically for your purpose. For example, if you wish to format numbers, you should request the locale using Locale.getDefault(Locale.Category.FORMAT). Java also provides a method Locale.setDefault(Locale.Category category, Locale newLocale) for setting the default locale for a specific category. Many Java formatting routines, if no locale is specified, will retrieve the current JVM default locale for the appropriate category and use it automatically.
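The two categories can be set independently. The sketch below (class and helper names are hypothetical) keeps English for display names while switching number formatting to German conventions; String.format(…) without a locale argument picks up the FORMAT default, and Locale.getDisplayCountry() without an argument uses the DISPLAY default. The previous defaults are restored afterward, since changing them affects the whole JVM.

```java
import java.util.Locale;

public class LocaleCategories {
  /** Formats a number with grouping separators using the current FORMAT default locale. */
  static String formatWithDefault(long value) {
    return String.format("%,d", value);
  }

  public static void main(String[] args) {
    final Locale originalDisplay = Locale.getDefault(Locale.Category.DISPLAY);
    final Locale originalFormat = Locale.getDefault(Locale.Category.FORMAT);
    try {
      //English user interface text, but German number formatting.
      Locale.setDefault(Locale.Category.DISPLAY, Locale.US);
      Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);
      System.out.println(formatWithDefault(1234567)); // 1.234.567
      System.out.println(Locale.GERMANY.getDisplayCountry()); // Germany
    } finally {
      //Restore the previous defaults so the rest of the JVM is unaffected.
      Locale.setDefault(Locale.Category.DISPLAY, originalDisplay);
      Locale.setDefault(Locale.Category.FORMAT, originalFormat);
    }
  }
}
```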

Properties Files

One of the most common formats for storing string resources for a locale is the Java properties file. Because the properties file format is simply a general list of key/value associations, it is used in many contexts across the Java ecosystem, especially as configuration files. Normally a properties file uses the extension .properties. Every non-blank line is either a key/value association or a comment.

Example properties file example.properties.
#Example properties file
title=Example
message=Hello, World!
ok.label=OK
help.label=Help
teacup=Teacup
devanagari.ma=\u092E
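The java.util.Properties class reads this format directly. In the sketch below (class and helper names are hypothetical) the text of example.properties is held in a string for self-containment; the doubled backslash in the Java source keeps the escape sequence literal, so it is Properties.load(…) that decodes it into the Devanagari letter MA (U+092E), while comment lines are skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesExample {
  /** Parses properties-file text into a Properties map. */
  static Properties parse(String text) throws IOException {
    final Properties properties = new Properties();
    properties.load(new StringReader(text));
    return properties;
  }

  public static void main(String[] args) throws IOException {
    //Same content as example.properties above.
    final String text = "#Example properties file\n"
        + "title=Example\n"
        + "message=Hello, World!\n"
        + "ok.label=OK\n"
        + "help.label=Help\n"
        + "teacup=Teacup\n"
        + "devanagari.ma=\\u092E\n";
    final Properties properties = parse(text);
    System.out.println(properties.getProperty("message")); // Hello, World!
    System.out.println(properties.getProperty("devanagari.ma").codePointAt(0) == 0x092E); // true
    System.out.println(properties.getProperty("missing.key")); // null
  }
}
```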

ResourceBundle

An internationalized program will allow a separate set of resources to be maintained for different locales. Java's built-in mechanism for managing and accessing locale-specific resources is java.util.ResourceBundle. The resource bundle API is little more than that of a map: it allows resources to be retrieved via a key, which is a simple string identifying a resource. Although in theory the resource bundle API allows arbitrary resource objects to be retrieved, in practice resource bundles are used primarily to store strings. Many programs later convert these strings to other types such as boolean or int, but the ResourceBundle API itself makes no provision for retrieving resource types other than String, String[], and Object.

Resource Bundle Storage

ResourceBundle is an abstract class; it relies on its subclasses to define how and where resources are stored, similar to the repository pattern you have already been using. One subclass, java.util.ListResourceBundle, provides a way to implement a custom resource bundle that lazily loads a list of resources, perhaps already stored in memory. But by far the most commonly used resource bundle implementation is java.util.PropertyResourceBundle, which loads string resources from a Java properties file.
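As a sketch of the ListResourceBundle approach, a subclass only has to override getContents() to return key/value pairs; the JDK calls it lazily, the first time a resource is looked up. The bundle class ExampleResources and its keys are hypothetical, and the bundle is instantiated directly here rather than located by name.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class ListBundleExample {
  /** Hypothetical bundle whose resources live in code rather than in a properties file. */
  public static class ExampleResources extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
      //Invoked on first lookup; the result is cached by ListResourceBundle.
      return new Object[][] {
          {"title", "Example"},
          {"message", "Hello, World!"},
          {"max.credits", 100} //non-string values are allowed, retrieved with getObject()
      };
    }
  }

  public static void main(String[] args) {
    final ResourceBundle bundle = new ExampleResources();
    System.out.println(bundle.getString("message")); // Hello, World!
    final int maxCredits = (Integer) bundle.getObject("max.credits");
    System.out.println(maxCredits); // 100
  }
}
```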

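The ResourceBundle class provides primarily two things that the Properties class alone does not: locating the properties file most appropriate for a given locale, and falling back to more general bundles for resources that a locale-specific file does not define.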

Loading Resource Bundles

The magic of resource bundle loading takes place in the ResourceBundle.getBundle(String baseName) and related methods. ResourceBundle will look for a properties file in a sequence of filenames, using the given base name plus a suffix based upon the current locale, in order from most specific to least specific, ending with the given base name. Thus a call to ResourceBundle.getBundle("example") on a system running in the pt-BR locale will search for the following files in order:

  1. example_pt_BR.properties
  2. example_pt.properties
  3. example.properties
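The JDK exposes this search order through ResourceBundle.Control, so the list above can be reproduced in code. The sketch below (class and helper names are hypothetical) asks the default properties-only Control for its candidate locales and converts each to a file name. It shows only the candidates for the requested locale; when no locale-specific file exists, getBundle(…) may also try the candidates for the JVM default locale before settling on the base file.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.stream.Collectors;

public class CandidateBundles {
  /** Lists the properties file names the default lookup would try, most specific first. */
  static List<String> candidateFileNames(String baseName, Locale locale) {
    final ResourceBundle.Control control =
        ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
    return control.getCandidateLocales(baseName, locale).stream()
        .map(candidate -> control.toBundleName(baseName, candidate))
        .map(bundleName -> control.toResourceName(bundleName, "properties"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(candidateFileNames("example", Locale.forLanguageTag("pt-BR")));
    // [example_pt_BR.properties, example_pt.properties, example.properties]
  }
}
```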
Moreover the returned resource bundle will be configured to resolve to parent resource bundle(s) for more general properties files with the same base name. As an example, consider an application that provides an example_pt.properties file with a localization of application strings in Portuguese.

example_pt.properties
#Example Portuguese resources
message=Bom dia, Mundo!
help.label=Ajuda
teacup=chávena

The application could also provide an example_pt_BR.properties file with only those words that differ in Brazilian Portuguese.

example_pt_BR.properties
#Example Brazilian Portuguese resources
teacup=xícara

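This configuration would allow an application to load a resource bundle with a single call to ResourceBundle.getBundle("example"). Resource lookup in the pt-BR locale would transparently resolve to the appropriate properties file as follows:

Key            Value             Found In
title          Example           example.properties
message        Bom dia, Mundo!   example_pt.properties
ok.label       OK                example.properties
teacup         xícara            example_pt_BR.properties
devanagari.ma  म                 example.properties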
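The resolution can be observed with the real getBundle(…) method. To stay self-contained, the sketch below writes the three example files to a temporary directory and uses a URLClassLoader over that directory as a stand-in for the application classpath; the class and method names are hypothetical. It assumes Java 11 or later: Files.writeString(…) writes UTF-8, which PropertyResourceBundle reads by default since Java 9.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleChainDemo {
  /** Writes the example files to a temporary directory and loads the bundle for the locale. */
  static ResourceBundle loadExample(Locale locale) throws IOException {
    final Path dir = Files.createTempDirectory("bundles");
    Files.writeString(dir.resolve("example.properties"),
        "title=Example\nmessage=Hello, World!\nok.label=OK\nhelp.label=Help\nteacup=Teacup\n");
    Files.writeString(dir.resolve("example_pt.properties"),
        "message=Bom dia, Mundo!\nhelp.label=Ajuda\nteacup=ch\u00E1vena\n");
    Files.writeString(dir.resolve("example_pt_BR.properties"), "teacup=x\u00EDcara\n");
    //This class loader stands in for the application classpath.
    final ClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()});
    return ResourceBundle.getBundle("example", locale, loader);
  }

  public static void main(String[] args) throws IOException {
    final ResourceBundle bundle = loadExample(Locale.forLanguageTag("pt-BR"));
    System.out.println(bundle.getString("teacup"));   //found in example_pt_BR.properties
    System.out.println(bundle.getString("message"));  //found in example_pt.properties
    System.out.println(bundle.getString("ok.label")); //found in example.properties
  }
}
```

The application code makes one getBundle(…) call; the parent chain from example_pt_BR to example_pt to example is assembled by ResourceBundle itself.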

Accessing Resource Bundles

Once you have access to a resource bundle, you can retrieve resources (usually strings) much as you would from a map.

ResourceBundle.containsKey(String key)
Indicates whether the resource bundle or one of its resolving parents contains a resource value for the given key.
ResourceBundle.getString(String key)
Returns a string resource for the given key. A java.util.MissingResourceException will be thrown if there is no such resource.
ResourceBundle.keySet()
Returns a set of all keys in this resource bundle and its resolving parents.
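The sketch below exercises these three methods on a small PropertyResourceBundle built from an in-memory string. The class name and the getStringOrDefault helper are hypothetical; the helper shows one way to avoid a MissingResourceException when a fallback value is acceptable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.TreeSet;

public class BundleAccess {
  /** Hypothetical helper: returns the resource for the key, or the fallback if there is none. */
  static String getStringOrDefault(ResourceBundle bundle, String key, String fallback) {
    return bundle.containsKey(key) ? bundle.getString(key) : fallback;
  }

  public static void main(String[] args) throws IOException {
    final ResourceBundle bundle =
        new PropertyResourceBundle(new StringReader("title=Example\nok.label=OK\n"));
    System.out.println(new TreeSet<>(bundle.keySet())); // [ok.label, title]
    System.out.println(getStringOrDefault(bundle, "cancel.label", "Cancel")); // Cancel
    try {
      bundle.getString("cancel.label");
    } catch (MissingResourceException e) {
      System.out.println("missing key: " + e.getKey()); // missing key: cancel.label
    }
  }
}
```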

MessageFormat

You've known how to format messages for a while, using String.format(String format, Object... args). Behind the scenes String.format(…) uses java.util.Formatter, which recognizes a pattern syntax consisting of placeholders such as %s to indicate a string and %d to indicate a decimal integer (the , flag in %,d adds grouping separators). The formatting logic replaces these placeholders with the given arguments. For example, String.format("Hello, %s. You have %d credits worth %,d.", name, balance, value) would produce "Hello, Jane. You have 5 credits worth 1,234." depending on the values in the given variable arguments.

The Formatter syntax used by String.format(…) is capable and flexible, but somewhat terse and arcane. Java provides another formatter class java.text.MessageFormat which uses a different pattern syntax altogether. MessageFormat patterns use positional parameters, in which an argument index is placed inside left and right curly bracket { } characters U+007B and U+007D. Formatting may be performed using MessageFormat.format(String pattern, Object... arguments). The above example can thus be reproduced using MessageFormat.format("Hello, {0}. You have {1} credits worth {2}.", name, balance, value).
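The two approaches can be compared side by side. The sketch below (class and method names are hypothetical) pins both to Locale.US so the output does not depend on the JVM default. When no format type is given, MessageFormat formats a number with the locale's default NumberFormat, which includes grouping separators, so it matches the %,d output.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class FormatComparison {
  static String withFormatter(String name, int balance, int value) {
    return String.format(Locale.US, "Hello, %s. You have %d credits worth %,d.", name, balance, value);
  }

  static String withMessageFormat(String name, int balance, int value) {
    return new MessageFormat("Hello, {0}. You have {1} credits worth {2}.", Locale.US)
        .format(new Object[] {name, balance, value});
  }

  public static void main(String[] args) {
    System.out.println(withFormatter("Jane", 5, 1234));     // Hello, Jane. You have 5 credits worth 1,234.
    System.out.println(withMessageFormat("Jane", 5, 1234)); // Hello, Jane. You have 5 credits worth 1,234.
  }
}
```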

Argument index

The integer placed inside braces such as "{0}" indicates which of the following arguments, counting from zero, is used as the replacement. Indicating the argument index in the parameter specifications is especially helpful in localization, because different languages use different word order. Romance languages such as Spanish and Portuguese usually place adjectives after the noun, for example, and Hindi uses postpositions instead of prepositions. Even without changing languages, the argument index allows your resources to be flexible. You may later decide to change "Hello, {0}. You have {1} credits worth {2}." to "You have {2} worth of {1} credits, {0}.", which will still work with the arguments name, balance, and value without any code change, if your resources are stored separately from the code, such as in resource bundle properties files.
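A minimal sketch of that rearrangement: the same argument array is passed to both patterns, and only the pattern text changes. The class name and the formatUS helper are hypothetical; Locale.US is fixed for repeatable output.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ArgumentOrder {
  /** Formats a MessageFormat pattern for the en-US locale. */
  static String formatUS(String pattern, Object... arguments) {
    return new MessageFormat(pattern, Locale.US).format(arguments);
  }

  public static void main(String[] args) {
    final Object[] arguments = {"Jane", 5, 1234}; //name, balance, value
    System.out.println(formatUS("Hello, {0}. You have {1} credits worth {2}.", arguments));
    // Hello, Jane. You have 5 credits worth 1,234.
    System.out.println(formatUS("You have {2} worth of {1} credits, {0}.", arguments));
    // You have 1,234 worth of 5 credits, Jane.
  }
}
```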

Format Type

MessageFormat types and styles.
Format Type  Format Style
number       integer, currency, percent
date         short, medium, long, full
time         short, medium, long, full
choice       a ChoiceFormat pattern

In addition to argument index, you can provide a format type (such as number) separated from the argument index by the comma , character U+002C. You can further specify a format style (such as integer) for a particular format type by separating it with an additional comma.

MessageFormat pattern string using format types and format styles.
"Hello, {0}. You have {1,number,integer} credits worth {2,number,currency}."

One of the major benefits of specifying a format type and style is that MessageFormat can provide the correct form of values based upon the locale. The value 1234, for example, when formatted as a currency for the en-US locale would appear as $1,234.00, while the same value formatted for the pt-BR locale would appear as R$ 1.234,00.
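The sketch below applies the pattern above to two locales (class and method names are hypothetical). Note that the currency style only changes presentation, the symbol and separators; MessageFormat does not convert the amount from one currency to another. The exact spacing after R$ depends on the JDK's locale data, so no exact pt-BR output is claimed.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class TypedFormats {
  static final String PATTERN =
      "Hello, {0}. You have {1,number,integer} credits worth {2,number,currency}.";

  static String greet(Locale locale, String name, int credits, double value) {
    return new MessageFormat(PATTERN, locale).format(new Object[] {name, credits, value});
  }

  public static void main(String[] args) {
    System.out.println(greet(Locale.US, "Jane", 5, 1234));
    // Hello, Jane. You have 5 credits worth $1,234.00.
    //Same amount, Brazilian currency symbol and separators.
    System.out.println(greet(Locale.forLanguageTag("pt-BR"), "Jane", 5, 1234));
  }
}
```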

Choices

You'll note in the pattern string above that the result of formatting "{1,number,integer} credits" makes perfect sense, unless the number of credits is exactly 1. In this case "1 credits" is grammatically incorrect, as the English word credits is plural. To accommodate such variations, MessageFormat provides the choice format type.

If you specify a format type of choice, the format style you provide is a subformat pattern in the syntax defined by the java.text.ChoiceFormat class. This subformat is a series of choices separated by vertical line | characters U+007C. Each choice starts with a numeric limit followed by either the number sign # character U+0023, meaning the choice applies from the limit itself up to the next limit, or the less-than sign < character U+003C, meaning the choice applies to values strictly greater than the limit. The text after that character is the format pattern used for matching values. The example below will produce "no credits", "one credit", or "X credits" based upon whether the number of credits is 0, 1, or greater than 1. Note that the subformat for values greater than one is identical to the original parameter that ignored singular and plural.

MessageFormat pattern string using the choice format type.
"Hello, {0}. You have {1,choice,0#no credits|1#one credit|1<{1,number,integer} credits} worth {2,number,currency}."

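Running the choice pattern for a few values makes the three branches visible. In this sketch (class and method names are hypothetical), the third choice contains a nested {1,number,integer}; MessageFormat formats such nested text again with the same arguments, so the count keeps its locale-specific grouping.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceExample {
  static final String PATTERN = "Hello, {0}. You have "
      + "{1,choice,0#no credits|1#one credit|1<{1,number,integer} credits}"
      + " worth {2,number,currency}.";

  static String greet(String name, int credits, double value) {
    return new MessageFormat(PATTERN, Locale.US).format(new Object[] {name, credits, value});
  }

  public static void main(String[] args) {
    System.out.println(greet("Jane", 0, 0));     // Hello, Jane. You have no credits worth $0.00.
    System.out.println(greet("Jane", 1, 2.5));   // Hello, Jane. You have one credit worth $2.50.
    System.out.println(greet("Jane", 1234, 10)); // Hello, Jane. You have 1,234 credits worth $10.00.
  }
}
```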
Quotes

If literal curly brackets are to appear within a MessageFormat pattern, they must be surrounded by matching apostrophe ' characters U+0027, sometimes referred to as single quotes. Single characters or multiple characters can be quoted, as in "The set '{'3, 7, 11'}' contains prime numbers." and "The set '{3, 7, 11}' contains prime numbers.". This approach to escaping introduces one of the most significant confusions into MessageFormat patterns: if a literal apostrophe is to appear, it too must be encoded, by doubling the single quote!

MessageFormat pattern with quoted apostrophe.
"Gödel''s incompleteness theorems deal with number theory and proof."

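The quoting rules are easiest to trust after seeing them run. The sketch below (class name and formatUS helper hypothetical) formats the patterns above, then shows the classic mistake: a lone apostrophe in "Don't" opens a quoted section that runs to the end of the pattern (MessageFormat treats an unmatched quote as closed at the end), so the apostrophe disappears and {0} is never replaced.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class QuotingExample {
  /** Formats a MessageFormat pattern for the en-US locale. */
  static String formatUS(String pattern, Object... arguments) {
    return new MessageFormat(pattern, Locale.US).format(arguments);
  }

  public static void main(String[] args) {
    System.out.println(formatUS("The set '{'3, 7, 11'}' contains prime numbers."));
    // The set {3, 7, 11} contains prime numbers.
    System.out.println(formatUS("G\u00F6del''s incompleteness theorems deal with number theory and proof."));
    //A single apostrophe silently swallows the rest of the pattern.
    System.out.println(formatUS("Don't forget {0}.", "Jane"));  // Dont forget {0}.
    System.out.println(formatUS("Don''t forget {0}.", "Jane")); // Don't forget Jane.
  }
}
```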
Locale

By default MessageFormat uses the JVM default locale for the Locale.Category.FORMAT category. Because MessageFormat extends java.text.Format, it inherits a general Format.format(Object obj) method. If you wish to format a message for a specific locale, you can create an instance of MessageFormat directly, indicating a pattern and a locale, and then pass an array of arguments to the format(…) method.

The following example illustrates saving a string resource in two properties files. The locale is specified directly, for a user who has requested notifications in Brazilian Portuguese, for example.

Loading and formatting a resource string using a manually selected locale.
#app-resources.properties
credits.message={0} has {1,number,integer} credits.
#app-resources_pt.properties
credits.message={0} tem {1,number,integer} créditos.
/*App.java*/
import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

final String name = …;
final int credits = …;
final Locale locale = new Locale("pt", "BR");
//Load the bundle for the requested locale rather than the JVM default locale.
final ResourceBundle resourceBundle = ResourceBundle.getBundle("app-resources", locale);
final String messagePattern = resourceBundle.getString("credits.message");
final MessageFormat messageFormat = new MessageFormat(messagePattern, locale);
//The pattern refers to {0} and {1}, so both arguments must be supplied, in order.
final String message = messageFormat.format(new Object[]{name, credits});
System.out.println(message);

Rincl

Java resource bundles provide an excellent approach to organizing resources for various locales around a single component, such as the button labels for the main screen or text for the About box. But resource bundles make little allowance for object-oriented practices in which a base component may define some resources expected to be used in a subclass of that component. Moreover there is no native mechanism for compartmentalizing different locales on a single JVM. And once a resource bundle is acquired, the API for actual resource lookup is rather limited.

The Resource I18n Concern Library (Rincl), an open-source project by GlobalMentor, Inc., is meant to provide a standard API for accessing resources, define a standard mechanism for accessing them, and allow custom resource implementations to be plugged in transparently. The main dependency io.rincl:rincl provides the central API for working with locales and accessing resources. A separate dependency will provide the actual resource storage implementation.

Rincl is especially nimble because it allows different resource configurations on the same JVM. While most applications have only a single user and therefore one current locale, other scenarios such as web servers may be serving various users on the same JVM, each with their own preferred or selected locale. Rincl uses the Concern Separation Aspect Registrar (Csar) library to compartmentalize resource access on the JVM. Your application or library code need simply use the Rincl access methods, and the correct locale and resource implementation will be utilized automatically.

Rincl Locale

By default Rincl will use the default locale of the JVM, which has been set by one of the Locale.setDefault(…) methods as explained above. But as Rincl allows multiple i18n configurations to exist simultaneously on the same JVM, it is preferable to use Rincl's own locale access methods in the io.rincl.Rincl class.

The Rincl.setLocale(Locale.Category category, Locale locale) method sets the current locale for a particular category; Rincl.setLocale(Locale locale) is a convenience method that sets the locale for all categories. You can retrieve the current locale using Rincl.getLocale(Locale.Category category). In the default case in which only one Rincl configuration exists on the JVM, these methods access the JVM defaults just as you would through the Locale class. The benefit is that if multiple Rincl configurations are present, these methods automatically and transparently access the locale and resources configured for the context of the calling code.

Rincl Resources

The main interface for accessing resources is io.rincl.Resources. Once you have access to a Resources instance, you can retrieve resources of different types. Resources.getInt(String key) will return an int value, for example, and Resources.getString(String key, Object... arguments) will return a string. If no resource can be located for the requested resource key, a java.util.MissingResourceException will be thrown.

Acquiring a Resources instance is conceptually a two-step procedure:

  1. Call Rincl.getResourceI18nConcern() to return the io.rincl.ResourceI18nConcern configured for the current context of your code.
  2. Invoke ResourceI18nConcern.getResources(Class<?> contextClass) to return Resources for a certain class using the current locale.

The easy way to accomplish this is simply to implement io.rincl.Rincled in every class needing access to resources. The implementing class may then call Rincled.getResources() to retrieve a Resources instance for that class, which will occur behind the scenes using the steps outlined above. The specific Resources acquired and how they are loaded depends on which Rincl implementation has been configured.

Rincl an Application

The io.rincl:rincl-resourcebundle dependency provides an illustration of how Rincl can add i18n support with resources stored in and loaded from resource bundle properties files. The Rincl resource bundle implementation not only knows how to load from resource bundles; it can also automatically resolve resources in properties files for parent classes and implemented interfaces. This allows you to consolidate related resources based upon the inheritance hierarchy.

For example assume that there exists a Restaurant interface for common functionality of a restaurant. The Restaurant interface provides a set of resources that are likely to be used by a restaurant. The interface provides a Portuguese localization as well.

Restaurant.properties
order-ready-message={0}, your order is ready.
Restaurant_pt.properties
order-ready-message={0}, seu pedido está pronto.

The TeaBar class implements Restaurant, so Rincl will give it access to all the Restaurant resources. TeaBar can also provide its own set of resources in addition to those from Restaurant.

TeaBar.properties
teacup-label=Teacup
TeaBar_pt.properties
teacup-label=chávena
TeaBar_pt_BR.properties
teacup-label=xícara

The TeaBar class merely needs to implement Rincled to have access to all these resources, as in the example below.

TeaBar.java
import javax.annotation.Nonnull;

import io.rincl.Rincled;

public class TeaBar implements Restaurant, Rincled {
  …

  public void printMenu() {
    //Retrieve the label to indicate a teacup based upon the current locale.
    //en-US: "Teacup"
    //pt:    "chávena"
    //pt-BR: "xícara"
    final String teacupLabel = getResources().getString("teacup-label");
    …
  }

  public void notifyOrderReady(@Nonnull final String userName) {
    //Print a message notifying the user that the order is ready.
    //en-US: "Beth, your order is ready."
    //pt:    "Beth, seu pedido está pronto."
    //pt-BR: "Beth, seu pedido está pronto." (inherited from Restaurant_pt.properties)
    System.out.println(getResources().getString("order-ready-message", userName));
  }

  …
}

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

Internationalize your Booker application using Rincl.

To provide flexibility to the user, and in order to facilitate testing, provide a --locale command-line parameter that allows the user to override the system locale when starting Booker. The locale value must be in the form of an IETF language tag. Use Rincl to set the locale if one is specified. You do not yet need to internationalize the help display.

Option         Alias  Description
list                  Lists all available publications.
load-snapshot         Loads the snapshot list of publications into the current repository.
--help         -h     Prints out a help summary of available switches.
--locale       -l     Indicates the locale to use in the program, overriding the system default. The value is in language tag format.
--name         -n     Indicates a filter by name for the list command.
--type         -t     Indicates the type of publication to list, either book or periodical. If not present, all publications will be listed.

See Also

References

Resources

Acknowledgments