Internationalization
Goals
- Learn the motivation for internationalization.
- Understand the distinction between internationalization and localization.
- Explore the use of property files.
- Study Java's built-in resource bundles.
- Discover and use the Rincl library.
Concepts
- Best Current Practice (BCP)
- Concern Separation Aspect Registrar (Csar)
- internationalization (i18n)
- Internet Engineering Task Force (IETF)
- language tag
- localization (l10n)
- locale
- properties file
- resource
- Resource I18n Concern Library (Rincl)
- single quotes
- supplementary character
Library
io.rincl.Resources
io.rincl.Resources.getInt(String key)
io.rincl.Resources.getOptionalDouble(String key)
io.rincl.Resources.getString(String key, Object... arguments)
io.rincl.ResourceI18nConcern
io.rincl.ResourceI18nConcern.getResources(Class<?> contextClass)
io.rincl.Rincl
io.rincl.Rincl.getLocale(Locale.Category category)
io.rincl.Rincl.setLocale(Locale.Category category, Locale locale)
io.rincl.Rincl.setLocale(Locale locale)
io.rincl.Rincl.getResourceI18nConcern()
io.rincl.Rincled
io.rincl.Rincled.getResources()
java.lang.String.format(String format, Object... args)
java.text.ChoiceFormat
java.text.Format
java.text.Format.format(Object obj)
java.text.MessageFormat
java.text.MessageFormat.format(String pattern, Object... arguments)
java.util.Formatter
java.util.ListResourceBundle
java.util.Locale
java.util.Locale.CANADA_FRENCH
java.util.Locale.CHINA
java.util.Locale.ENGLISH
java.util.Locale.UK
java.util.Locale.US
java.util.Locale.forLanguageTag(String languageTag)
java.util.Locale.getAvailableLocales()
java.util.Locale.getDefault()
java.util.Locale.getDefault(Locale.Category category)
java.util.Locale.setDefault(Locale newLocale)
java.util.Locale.setDefault(Locale.Category category, Locale newLocale)
java.util.Locale.toLanguageTag()
java.util.Locale.toString()
java.util.Locale.Category
java.util.MissingResourceException
java.util.Properties
java.util.PropertyResourceBundle
java.util.ResourceBundle
java.util.ResourceBundle.containsKey(String key)
java.util.ResourceBundle.getBundle(String baseName)
java.util.ResourceBundle.getBundle(String baseName, Locale locale)
java.util.ResourceBundle.getBundle(String baseName, Locale locale, ClassLoader loader)
java.util.ResourceBundle.getString(String key)
java.util.ResourceBundle.keySet()
Dependencies
Lesson
When studying character sets you have seen that earlier systems such as ASCII assumed that users would be speaking American English. Unfortunately many programs even today seem to assume that all users will be English speakers, that no one will have a name with accent characters, or that everyone writes numbers using the same format. But the world is made up of many families of languages and cultures; even if your software does not initially have translations available for other languages, it should be built so that other translations may be added in the future.
Internationalization is the practice of designing or enabling a software product so that it can potentially be used following the conventions of various regions of the world. It is frequently abbreviated as i18n (where 18 represents the number of letters between the first and last letters of the word internationalization). The flip side of the internationalization coin is localization (l10n), which is the process of actually adapting a program to work with a certain language and/or region. For example, even if a program has been internationalized, it may only support English until someone localizes it to Spanish by translating the messages that are presented to the user.
Internationalization is a broad and intricate subject. This lesson concentrates on internationalization of program resources, primarily messages (usually stored as formatted strings) along with other media such as images and audio clips.
Locales
An internationalized program uses a locale to identify a specific language and region for which formatting and other conventions are defined. Java represents a locale using the java.util.Locale class, which is generally composed of up to three components:
- language: A lowercase two- or three-letter language code such as en for English, ja for Japanese, or kok for Konkani. See Language codes - ISO 639.
- region: An uppercase two-letter code indicating the region, such as US for United States or GB for Great Britain. This country designation allows for variations across regions even for the same language, such as spelling differences between the USA and the United Kingdom. See Country Codes - ISO 3166.
- variant: One or more additional values indicating some further differentiation of a language within a region, such as polyton for Polytonic Greek. In a large majority of projects the locale variant is not used, although it's good to know such variations can be described if needed.
The design of the Java Locale class follows what the Internet Engineering Task Force (IETF) defines as a language tag, an identifier string composed of the same language, region, and variant components described above, separated by the hyphen - character. The language English (en) as spoken in the United States (US) would be represented using the language tag en-US, for example. If you have a language tag you can create a Locale using Locale.forLanguageTag(String languageTag), or produce a language tag from a Locale instance using Locale.toLanguageTag().
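The conversion between language tags and Locale instances can be sketched as follows. This is a minimal illustration; the class name LanguageTagSketch and the helper roundTrip are made up for this example and are not part of any library.

```java
import java.util.Locale;

class LanguageTagSketch {
    /** Parses an IETF language tag into a Locale and converts it back to a tag. */
    static String roundTrip(String languageTag) {
        return Locale.forLanguageTag(languageTag).toLanguageTag();
    }

    public static void main(String[] args) {
        Locale brazil = Locale.forLanguageTag("pt-BR");
        System.out.println(brazil.getLanguage());    // the language component
        System.out.println(brazil.getCountry());     // the region component
        System.out.println(brazil.toLanguageTag());  // hyphen-separated IETF form
        System.out.println(brazil.toString());       // underscore-separated Java form
        System.out.println(roundTrip("en-US"));
    }
}
```

Note that Locale.toString() uses underscores rather than hyphens, which is the same form that appears in resource bundle file names.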
You can get a list of all the installed locales on your JVM by calling Locale.getAvailableLocales(), which returns an array of Locale instances. The default locale for the entire JVM, which is set to reflect the system settings when the JVM starts, can be retrieved using Locale.getDefault(). Similarly you can override the locale for the entire JVM using Locale.setDefault(Locale newLocale).
The JVM actually keeps track of two locales for different purposes, represented by the Locale.Category enum.
- Locale.Category.DISPLAY: The locale used for the general user interface.
- Locale.Category.FORMAT: The locale used for formatting dates, numbers, and currencies.
When retrieving a locale, you should use Locale.getDefault(Locale.Category category) to retrieve the locale specifically for your purpose. For example, if you wish to format numbers, you should request the locale using Locale.getDefault(Locale.Category.FORMAT). Java also provides a method Locale.setDefault(Locale.Category category, Locale newLocale) for setting the default locale for a specific category. Many Java formatting routines, if no locale is specified, will retrieve the current JVM default locale for the appropriate category and use it automatically.
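As a rough sketch of how the category defaults interact with formatting routines, the following example temporarily changes the FORMAT default and formats a number without naming a locale. The class and method names are invented for illustration; restoring the previous default afterwards is a precaution of this sketch, not a requirement of the API.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleCategorySketch {
    /** Formats a number while the given locale is the FORMAT default, then restores the old default. */
    static String formatWithFormatDefault(Locale formatLocale, long number) {
        Locale previous = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(Locale.Category.FORMAT, formatLocale);
        try {
            // No locale is passed here, so the FORMAT category default is used.
            return NumberFormat.getIntegerInstance().format(number);
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, previous);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatWithFormatDefault(Locale.US, 1234));
        System.out.println(formatWithFormatDefault(Locale.GERMANY, 1234));
        // The DISPLAY default is tracked separately and is not touched above.
        System.out.println(Locale.getDefault(Locale.Category.DISPLAY));
    }
}
```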
Properties Files
One of the most common formats for storing string resources for a locale is the Java properties file. Because the properties file format is simply a general list of key/value associations, it is used in many contexts across the Java ecosystem, especially as configuration files. Normally a properties file uses the extension .properties. Every non-blank line is either a key/value association or a comment.
- Key/value pairs use the equals sign = (U+003D), e.g. key=value, or the colon character : (U+003A), e.g. key:value, as a delimiter. Whitespace around the delimiter is ignored.
- Comment lines start with the number sign # (U+0023) or the exclamation mark ! (U+0021).
- Any character not in the ISO-8859-1 charset must be escaped using the form \uXXXX with the four-digit hex code of the character. A supplementary character, which lies beyond U+FFFF, needs two such escapes, one for each UTF-16 surrogate.
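The syntax rules above can be tried out with java.util.Properties directly. The sketch below parses a small sample held in a string rather than a file; the keys and values are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesSyntaxSketch {
    /** Sample properties text: two comments, both delimiter styles, and one escaped character. */
    static final String SAMPLE = String.join("\n",
            "# comment introduced by a number sign",
            "! comment introduced by an exclamation mark",
            "greeting = Hello",
            "farewell : Goodbye",
            "teacup=x\\u00EDcara");

    /** Parses properties text, converting the checked I/O exception to an unchecked one. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = parse(SAMPLE);
        System.out.println(properties.getProperty("greeting"));
        System.out.println(properties.getProperty("farewell"));
        System.out.println(properties.getProperty("teacup")); // escape decoded to a single character
    }
}
```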
ResourceBundle
An internationalized program will allow a separate set of resources to be maintained for different locales. Java's built-in mechanism for managing and accessing locale-specific resources is the java.util.ResourceBundle. The resource bundle API is little more than that of a map; it allows resources to be retrieved via a key, which is a simple string identifying a resource. Although the resource bundle API in theory allows general resource objects to be retrieved, in practice resource bundles are used primarily to store strings. Although many programs later convert these resources to other types such as boolean or int, the ResourceBundle API makes no provisions for retrieving resource types other than String and Object.
Resource Bundle Storage
ResourceBundle is an abstract class; it relies on its subclasses to define how and where resources are stored, similar to the repository pattern you have already been using. One subclass, java.util.ListResourceBundle, provides a way to implement a custom resource bundle that lazily loads a list of resources, perhaps already stored in memory. But by far the most commonly used resource bundle implementation is java.util.PropertyResourceBundle, which loads string resources from a Java properties file.
The ResourceBundle class provides primarily two things that the Properties class alone does not:
- Locale-based lookup of property storage when loading.
- Caching of properties after loading for fast lookup.
Loading Resource Bundles
The magic of resource bundle loading takes place in the ResourceBundle.getBundle(String baseName) and related methods. ResourceBundle will look for a properties file in a sequence of filenames, using the given base name plus a suffix based upon the current locale, in order from most specific to least specific, ending with the given base name. Thus a call to ResourceBundle.getBundle("example") on a system running in the pt-BR locale will search for the following files in order:
example_pt_BR.properties
example_pt.properties
example.properties
Moreover the returned resource bundle will be configured to resolve to parent resource bundle(s) for more general properties files with the same base name. As an example, consider an application that provides an example_pt.properties file with a localization of application strings in Portuguese.
The application could also provide an example_pt_BR.properties file with only those words that differ in Brazilian Portuguese.
This configuration would allow an application to load a resource bundle with a single call to ResourceBundle.getBundle("example"). Resource lookup in the pt-BR locale would transparently resolve to the appropriate properties file as follows:
- Lookup of the key "teacup" would return the word "xícara", as that is the Brazilian Portuguese word defined in example_pt_BR.properties, overriding the value "chávena" in example_pt.properties and the value "teacup" in example.properties.
- Lookup of the key "help.label" would return the string "Ajuda", as that label is the same in all Portuguese dialects and is defined in example_pt.properties, overriding the value "Help" in example.properties.
- Lookup of the key "ok.label" would return the label "OK", as that is the value defined in example.properties and not overridden in either of the other two properties files.
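The lookup chain described above can be reproduced in a self-contained sketch. To avoid depending on files on the classpath, this example writes three properties files to a temporary directory and loads them through a URLClassLoader; the file contents are reconstructed from the values mentioned in the list above, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleFallbackSketch {
    /** Writes the three example files to a temporary directory and loads the bundle for pt-BR. */
    static ResourceBundle loadExampleBundle() {
        try {
            Path dir = Files.createTempDirectory("bundles");
            Files.write(dir.resolve("example.properties"),
                    Arrays.asList("teacup=teacup", "help.label=Help", "ok.label=OK"));
            Files.write(dir.resolve("example_pt.properties"),
                    Arrays.asList("teacup=ch\\u00E1vena", "help.label=Ajuda"));
            Files.write(dir.resolve("example_pt_BR.properties"),
                    Arrays.asList("teacup=x\\u00EDcara"));
            // The class loader makes the temporary directory searchable for bundles.
            ClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()});
            return ResourceBundle.getBundle("example", Locale.forLanguageTag("pt-BR"), loader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = loadExampleBundle();
        System.out.println(bundle.getString("teacup"));     // from example_pt_BR.properties
        System.out.println(bundle.getString("help.label")); // from example_pt.properties
        System.out.println(bundle.getString("ok.label"));   // from example.properties
    }
}
```

In a real application the properties files would normally sit on the classpath, and the single-argument ResourceBundle.getBundle("example") would pick the locale from the JVM default.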
Accessing Resource Bundles
Once you have access to a resource bundle, you can retrieve resources (usually strings) much as you would from a map.
- ResourceBundle.containsKey(String key): Indicates whether the resource bundle or one of its resolving parents contains a resource value for the given key.
- ResourceBundle.getString(String key): Returns a string resource for the given key. A java.util.MissingResourceException will be thrown if there is no such resource.
- ResourceBundle.keySet(): Returns a set of all keys in this resource bundle and its resolving parents.
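These access methods can be exercised against a small in-memory bundle. The sketch below uses a ListResourceBundle subclass with invented keys purely so that it runs without any properties files; the helper methods are illustrative, not part of the ResourceBundle API.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class BundleAccessSketch {
    /** A small in-memory bundle with hypothetical keys, used only for illustration. */
    static class Labels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"ok.label", "OK"}, {"help.label", "Help"}};
        }
    }

    /** Returns the string for the key, or the fallback if the bundle has no such resource. */
    static String getStringOrDefault(ResourceBundle bundle, String key, String fallback) {
        return bundle.containsKey(key) ? bundle.getString(key) : fallback;
    }

    /** Reports whether getString() raises MissingResourceException for the key. */
    static boolean isMissing(ResourceBundle bundle, String key) {
        try {
            bundle.getString(key);
            return false;
        } catch (MissingResourceException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new Labels();
        System.out.println(bundle.keySet());
        System.out.println(getStringOrDefault(bundle, "ok.label", "?"));
        System.out.println(getStringOrDefault(bundle, "cancel.label", "?"));
        System.out.println(isMissing(bundle, "cancel.label"));
    }
}
```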
MessageFormat
You've known how to format messages for a while, using String.format(String format, Object... args). Behind the scenes String.format(…) uses java.util.Formatter, which recognizes a pattern syntax consisting of placeholders such as %s to indicate a string and %d to indicate a decimal integer. The formatting logic will replace these placeholders with the given arguments. For example, String.format("Hello, %s. You have %d credits worth %d.", name, balance, value) would produce "Hello, Jane. You have 5 credits worth 1234." depending on the values in the given variable arguments.
The Formatter syntax used by String.format(…) is capable and flexible, but somewhat terse and arcane. Java provides another formatter class java.text.MessageFormat which uses a different pattern syntax altogether. MessageFormat patterns use positional parameters, in which an argument index is placed inside left and right curly bracket { } characters U+007B and U+007D. Formatting may be performed using MessageFormat.format(String pattern, Object... arguments). The above example can thus be reproduced using MessageFormat.format("Hello, {0}. You have {1} credits worth {2}.", name, balance, value).
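The two pattern syntaxes can be compared side by side in a short sketch. The class and method names are invented for this example, and Locale.US is passed explicitly only so that the output does not depend on the machine's default locale. One difference worth noticing is that MessageFormat formats numeric arguments using the locale's number format, so grouping separators appear without being requested.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatComparisonSketch {
    /** Formats the message using the Formatter placeholder syntax. */
    static String withFormatter(String name, int balance, int value) {
        return String.format(Locale.US, "Hello, %s. You have %d credits worth %d.", name, balance, value);
    }

    /** Formats the same message using MessageFormat positional parameters. */
    static String withMessageFormat(String name, int balance, int value) {
        MessageFormat format = new MessageFormat("Hello, {0}. You have {1} credits worth {2}.", Locale.US);
        return format.format(new Object[] {name, balance, value});
    }

    public static void main(String[] args) {
        System.out.println(withFormatter("Jane", 5, 1234));
        System.out.println(withMessageFormat("Jane", 5, 1234));
    }
}
```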
Argument index
The integer placed inside braces such as "{0}" is the zero-based index, within the series of arguments that follows the pattern, of the argument to use as a replacement. Indicating the argument index in the parameter specifications is especially helpful in localization, because different languages use different word order. Latin-derived languages usually place adjectives after the noun, for example, and Hindi uses post-positions instead of prepositions. Even without changing languages, the argument index allows your resources to be flexible. You may later decide to change "Hello, {0}. You have {1} credits worth {2}." to "You have {2} worth of {1} credits, {0}.", which will still work with the arguments name, balance, and value without any code change, if your resources are stored separate from the code, such as in resource bundle properties files.
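A brief sketch shows the same argument list serving both patterns. The helper below is hypothetical, and the patterns are passed in as parameters to stand in for strings that would normally be read from a resource file.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ArgumentOrderSketch {
    /** Formats a pattern with a fixed argument order of name, balance, and value. */
    static String format(String pattern, String name, int balance, int value) {
        return new MessageFormat(pattern, Locale.US).format(new Object[] {name, balance, value});
    }

    public static void main(String[] args) {
        // Only the pattern changes between the two calls; the arguments keep their order.
        System.out.println(format("Hello, {0}. You have {1} credits worth {2}.", "Jane", 5, 1234));
        System.out.println(format("You have {2} worth of {1} credits, {0}.", "Jane", 5, 1234));
    }
}
```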
Format Type
Format Type | Format Style |
---|---|
number | integer |
number | currency |
number | percent |
date | short |
date | medium |
date | long |
date | full |
time | short |
time | medium |
time | long |
time | full |
choice | pattern |
In addition to the argument index, you can provide a format type (such as number) separated from the argument index by the comma , character U+002C. You can further specify a format style (such as integer) for a particular format type by separating it with an additional comma.
One of the major benefits of specifying a format type and style is that the MessageFormat can provide the correct form of values based upon the locale. The value 1234, for example, when formatted using the currency style for the en-US locale would appear as $1,234.00, while the same value formatted for the pt-BR locale would appear as R$ 1.234,00.
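Format types and styles can be tried with a small helper such as the following sketch. The helper name is invented; the patterns use only the number format type with the styles listed in the table above. The exact whitespace character between the currency symbol and the amount for pt-BR can vary with the locale data shipped in the JDK, so no exact output is claimed for that line.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatStyleSketch {
    /** Formats a pattern for a specific locale with the given arguments. */
    static String format(String pattern, Locale locale, Object... arguments) {
        return new MessageFormat(pattern, locale).format(arguments);
    }

    public static void main(String[] args) {
        Locale brazil = Locale.forLanguageTag("pt-BR");
        System.out.println(format("{0,number,integer}", Locale.US, 1234));
        System.out.println(format("{0,number,currency}", Locale.US, 1234));
        System.out.println(format("{0,number,percent}", Locale.US, 0.25));
        System.out.println(format("{0,number,currency}", brazil, 1234));
    }
}
```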
Choices
You'll note that the result of formatting a pattern such as "{1,number,integer} credits" makes perfect sense, unless the specific number of credits is the value 1. In this case 1 credits is grammatically incorrect, as the English word credits is plural. To accommodate such variations, MessageFormat provides the choice format type.
If you specify a format type of choice, the format style you provide will be a subformat string specified in the java.text.ChoiceFormat class. This subformat is a series of conditions separated by vertical line | characters U+007C. After each condition a format pattern is provided, following the number sign # character U+0023, for values matching the condition. The example below will produce "no credits", "one credit", or "X credits" based upon whether the number of credits is 0, 1, or greater than 1. Note that the subformat for values greater than 1 is identical to the original parameter that ignored singular and plural.
Quotes
If literal curly brackets are to appear within a MessageFormat pattern, they must be surrounded by matching apostrophe ' characters U+0027, sometimes referred to as single quotes. Single characters or multiple characters can be quoted, as in "The set '{'3, 7, 11'}' contains prime numbers." and "The set '{3, 7, 11}' contains prime numbers.". This approach to escaping introduces one of the most significant confusions into MessageFormat patterns: if a literal apostrophe is to appear, it too must be encoded by doubling the single quote!
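The quoting rules are easiest to see by running a few patterns. In the sketch below the first two patterns come from the text above, while the two "turn" patterns are invented to contrast a doubled apostrophe with a forgotten one.

```java
import java.text.MessageFormat;
import java.util.Locale;

class QuoteSketch {
    /** Formats a pattern for the en-US locale with the given arguments. */
    static String format(String pattern, Object... arguments) {
        return new MessageFormat(pattern, Locale.US).format(arguments);
    }

    public static void main(String[] args) {
        // Quoting each bracket separately or the whole group gives the same result.
        System.out.println(format("The set '{'3, 7, 11'}' contains prime numbers."));
        System.out.println(format("The set '{3, 7, 11}' contains prime numbers."));
        // Doubled apostrophes produce literal apostrophes.
        System.out.println(format("It''s {0}''s turn.", "Jane"));
        // Single apostrophes start and end a quoted section, so the parameter is not replaced.
        System.out.println(format("It's {0}'s turn.", "Jane"));
    }
}
```

The last call shows the common mistake: the apostrophes disappear and the text between them, including the parameter, is emitted literally.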
Locale
By default MessageFormat uses the default locale for the Locale.Category.FORMAT category. Because MessageFormat extends java.text.Format, it inherits a general Format.format(Object obj) method. If you wish to format a message for a specific locale, you can create an instance of MessageFormat directly, indicating a pattern and a locale, and then pass an array of arguments to the format(…) method.
The following example illustrates saving a string resource in two properties files. The locale is specified directly, for a user who has requested notifications in Brazilian Portuguese, for example.
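One way this could look is sketched below. The file names messages.properties and messages_pt_BR.properties, the key credits.message, and the message wording are all assumptions made for this illustration; the file contents are held in strings and selected by a simple language check so that the sketch runs on its own, whereas a real application would let ResourceBundle.getBundle(…) choose the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class LocalizedMessageSketch {
    // Hypothetical contents of messages.properties (the default, in English).
    static final String DEFAULT_PROPERTIES =
            "credits.message=Hello, {0}. You have {1,number,integer} credits.";
    // Hypothetical contents of messages_pt_BR.properties, with non-ASCII letters escaped.
    static final String PT_BR_PROPERTIES =
            "credits.message=Ol\\u00E1, {0}. Voc\\u00EA tem {1,number,integer} cr\\u00E9ditos.";

    /** Stand-in for bundle lookup: picks the properties text by language. */
    static ResourceBundle bundleFor(Locale locale) {
        String text = locale.getLanguage().equals("pt") ? PT_BR_PROPERTIES : DEFAULT_PROPERTIES;
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up the pattern and formats it for the explicitly requested locale. */
    static String creditsMessage(Locale locale, String name, int credits) {
        String pattern = bundleFor(locale).getString("credits.message");
        return new MessageFormat(pattern, locale).format(new Object[] {name, credits});
    }

    public static void main(String[] args) {
        System.out.println(creditsMessage(Locale.US, "Jane", 1234));
        System.out.println(creditsMessage(Locale.forLanguageTag("pt-BR"), "Jane", 1234));
    }
}
```

Passing the locale to the MessageFormat constructor matters as much as choosing the right pattern: it is what makes the number appear with the grouping separator of the requested locale rather than that of the JVM default.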
Rincl
Java resource bundles provide an excellent approach to organizing resources for various locales around a single component, such as the button labels for the main screen or text for the About box. But resource bundles make little allowance for object-oriented practices in which a base component may define some resources expected to be used in a subclass of that component. Moreover there is no native mechanism for compartmentalizing different locales on a single JVM. And once a resource bundle is acquired, the API for actual resource lookup is rather limited.
The Resource I18n Concern Library (Rincl), an open-source project by GlobalMentor, Inc., is meant to provide a standard API for accessing resources, define a standard mechanism for accessing them, and allow custom resource implementations to be plugged in transparently. The main dependency io.rincl:rincl provides the central API for working with locales and accessing resources. A separate dependency will provide the actual resource storage implementation.
Rincl is especially nimble because it allows different resource configurations on the same JVM. While most applications have only a single user and therefore one current locale, other scenarios such as web servers may be serving various users on the same JVM, each with their own preferred or selected locale. Rincl uses the Concern Separation Aspect Registrar (Csar) library to compartmentalize resource access on the JVM. Your application or library code need simply use the Rincl access methods, and the correct locale and resource implementation will be utilized automatically.
Rincl Locale
By default Rincl will use the default locale of the JVM, which has been set by one of the Locale.setDefault(Locale newLocale) methods as explained above. But as Rincl allows multiple i18n configurations to exist simultaneously on the same JVM, it is preferred to use Rincl's own locale access methods in the io.rincl.Rincl class.
The Rincl.setLocale(Locale.Category category, Locale locale) method sets the current locale for a particular category; Rincl.setLocale(Locale locale) is a convenience method that sets the locale for all categories. You can retrieve the current locale using Rincl.getLocale(Locale.Category category). In the default case in which only one Rincl configuration exists on the JVM, these methods access the JVM defaults just as you would through the Locale class. The benefit is that if multiple Rincl configurations are present, these methods automatically and transparently access the locale and resources configured for the context of the calling code.
Rincl Resources
The main interface for accessing resources is io.rincl.Resources. Once you have access to a Resources instance, you can retrieve resources of different types. Resources.getInt(String key) will return an int value, for example, and Resources.getString(String key, Object... arguments) will return a string. If no resource can be located for the requested resource key, a java.util.MissingResourceException will be thrown.
Acquiring a Resources instance is conceptually a two-step procedure:
- Call Rincl.getResourceI18nConcern() to return the io.rincl.ResourceI18nConcern configured for the current context of your code.
- Invoke ResourceI18nConcern.getResources(Class<?> contextClass) to return Resources for a certain class using the current locale.
The easy way to accomplish this is to simply implement io.rincl.Rincled for every class needing access to resources. The implementing class may then call Rincled.getResources() to retrieve a Resources instance for that class, which will occur behind the scenes using the steps outlined above. The specific Resources acquired and how they are loaded depends on which Rincl implementation has been configured.
Rincl in an Application
The io.rincl:rincl-resourcebundle package provides an illustration of how Rincl can add i18n support with resources stored in and loaded from resource bundle properties files. The Rincl resource bundle implementation not only knows how to load from resource bundles, it can also automatically resolve resources in properties files for parent classes and implemented interfaces. This allows you to consolidate related resources based upon inheritance hierarchy.
For example assume that there exists a Restaurant interface for common functionality of a restaurant. The Restaurant interface provides a set of resources that are likely to be used by a restaurant. The interface provides a Portuguese localization as well.
The TeaBar class implements Restaurant, so Rincl will give it access to all the Restaurant resources. TeaBar can also provide its own set of resources in addition to those from Restaurant.
The Teacup class merely needs to implement Rincled to have access to all these resources, as in the example below.
Review
Gotchas
- Properties files do not yet support UTF-8. Bytes are interpreted as part of the ISO-8859-1 charset, and characters from higher code points must be escaped.
- Don't forget to double apostrophes in a MessageFormat pattern, or some parameters in the string may not be recognized.
In the Real World
- While the XML properties format is technically superior, many still use the original key/value properties format because it is simpler.
- It is easier and more flexible to work with the current locale via Rincl rather than directly accessing the default JVM locale.
Think About It
- Will a particular message be displayed to the user, either directly as a message or indirectly as a menu item? Place the message in resource files.
- Would a particular value be displayed differently in different locales? Provide a format pattern in the resource file and supply the value as a format argument.
Self Evaluation
- What is the difference between internationalization and localization?
- How does a Java Locale relate to a language tag?
Task
Internationalize your Booker application using Rincl.
- Place all strings to be displayed to the user in resource files.
- Provide alternate messages in another locale of your choice. You may use Google Translate or some other translation service if you do not know any languages other than English.
- At the start of a list of publications for the list command, present a message indicating the optional name of the user and the number of publications in the list, e.g. Jane Doe's library contains 1,234 publications.
  - Use a format pattern in the resource file.
  - Display the publication count using the user's preferred format for formatting numbers based upon the user's locale.
  - Remember that the user identification file may not be present; compensate accordingly so that the application would still provide a count of publications.
To provide flexibility to the user, and in order to facilitate testing, provide a --locale command-line parameter that allows the user to override the system locale when starting Booker. The locale value must be in the form of an IETF language tag. Use Rincl to set the locale if one is specified. You do not yet need to internationalize the help display.
booker list [--locale <locale>] [--name <name>] [--type (book|periodical)]
booker load-snapshot [--locale <locale>]
booker -h | --help
Option | Alias | Description |
---|---|---|
list | | Lists all available publications. |
load-snapshot | | Loads the snapshot list of publications into the current repository. |
--help | -h | Prints out a help summary of available switches. |
--locale | -l | Indicates the locale to use in the program, overriding the system default. The value is in language tag format. |
--name | -n | Indicates a filter by name for the list command. |
--type | -t | Indicates the type of publication to list, either book or periodical. If not present, all publications will be listed. |
See Also
- Internationalization and localization (Wikipedia)
- Trail: Internationalization (Oracle - The Java™ Tutorials)
- Lesson: Isolating Locale-Specific Data (Oracle - The Java™ Tutorials)
References
- Localization vs. Internationalization (W3C Internationalization Activity)
- JDK 8 and JRE 8 Supported Locales (Oracle)
- ISO 639 (Wikipedia)
- ISO 3166 (Wikipedia)
- BCP 47: Tags for Identifying Languages (IETF)
Resources
- Internet Engineering Task Force (IETF)
- International Organization for Standardization (ISO)
- Language codes - ISO 639 (ISO)
- Country Codes - ISO 3166 (ISO)
- W3C Internationalization (i18n) Activity
- IANA Language Subtag Registry
- Resource I18n Concern Library (Rincl)
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.