OpenI18N Locale Name Guideline
[Version 1.1 -- 2003-03-11]

Background:

The purpose of this guideline is to provide guidance to developers of software that implements or uses locale-based internationalization functionality.

One of the problems with existing locale implementations is that the names of locales vary with the software, so several different names have been assigned to a single locale. For example, Glibc, the X Window System and Java all keep their own locale information. Sometimes an application has its own set of supported locale names and fails to run if the specified locale name differs from the names the application knows. Once users have set their locale environment with environment variables such as LANG, it is expected that any internationalized application will work in the same environment.

To cope with this situation, this guideline introduces and specifies the Standard Locale Names. The Standard Locale Names are the names that should be implemented by any software supporting the locale, so that the names can be specified reliably by users or other software. By using the Standard Locale Names, a single set of locale names can be used for any software environment.

User/implementation-defined names are the names that refer to different entities from any of the Standard Locale Names or any constituent part of the Standard Locale Names. They are given different rules from the Standard Locale Names to make a distinction. These rules should be applied if new non-Standard Locale Names are created.

Scope:

This document is a supplemental document to the OpenI18N 1.3 Globalization Specification. Both operating systems conforming to the OpenI18N 1.3 Globalization Specification and application software written for OpenI18N 1.3 conforming systems should follow this guideline. This document places no requirement on users of a conforming implementation of OpenI18N 1.3 regarding environment variable settings.
Interpretation of the environment variables should be handled by the conforming implementation based on the alias table provided with this document, so that user/implementation-defined locale name values (e.g. existing names that do not conform to this guideline) are mapped to Standard Locale Names internally by the conforming implementation of OpenI18N 1.3.

References:

[ISO 639] ISO 639-2:1998 Codes for the representation of names of languages -- Alpha-3 code

[ISO 3166] ISO 3166-1:1997 Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes

[RFC 2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

Terminology:

The key words "shall", "should" and "may" in the following part of this document are to be interpreted as described in the OpenI18N 1.3 Globalization Specification.

Structure:

Standard Locale Names except "C" and "POSIX" shall consist of the following four fields:

    LANGUAGE
    TERRITORY
    CODESET
    MODIFIERS

The MODIFIERS field is optional for Standard Locale Names and may be omitted.

The Standard Locale Name shall be a character string consisting of characters from the following character repertoire:

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    - _ . @ , =

Hereafter the characters '_', '.' and '@' are called DELIMITERS. The characters '-', ',' and '=' are called SPECIALS. The characters '0', '1', '2', '3', '4', '5', '6', '7', '8' and '9' are called NUMBERS. The allowable characters for a locale name except DELIMITERS, SPECIALS and NUMBERS are called LETTERS.

The LANGUAGE, TERRITORY, CODESET and MODIFIERS fields shall be ordered as:

    LANGUAGE_TERRITORY.CODESET@MODIFIERS

The '_' shall be used as a delimiter between the LANGUAGE field and the TERRITORY field. The '.' shall be used as a delimiter between the TERRITORY field and the CODESET field.
The '@' shall be used as a delimiter between the CODESET field and the MODIFIERS field. The '@' shall not appear in the locale name if the MODIFIERS field is omitted.

All of the fields (i.e. LANGUAGE, TERRITORY, CODESET and MODIFIERS) shall be treated as case sensitive.

LANGUAGE:

A character string for the LANGUAGE field represents the human language used in the locale. It shall consist of LETTERS only.

If ISO 639-1 defines a two-letter language code for a language, that value shall be used as the standard value for the language. In this case, the standard value shall consist of two lowercase letters.

If ISO 639-1 does not define a two-letter language code for a language, a code defined by ISO 639-2 for the language shall be used as the standard value. In this case, the standard value shall consist of three lowercase letters.

If neither a two-letter language code from ISO 639-1 nor a three-letter language code from ISO 639-2 is available for a language, no standard value is defined for the language.

In order to not conflict with future extensions to the ISO 639 series standards, user/implementation-defined values for the LANGUAGE field shall include uppercase LETTERS or consist of more than three letters.

TERRITORY:

A character string for the TERRITORY field represents the geographical territory (country or region) indicated by the locale. It shall consist of LETTERS only.

If ISO 3166-1 defines a two-letter region/country code for a territory, that value shall be used as the standard value for the territory. In this case, the standard value shall consist of two uppercase letters.

If a two-letter code is not available in ISO 3166-1 for a territory, no standard value is defined for the territory.

In order to not conflict with future extensions to ISO 3166-1, user/implementation-defined values for the TERRITORY field shall include lowercase letters or consist of more than two letters.
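
The field order and the shape rules for LANGUAGE and TERRITORY above can be sketched in Python as follows. This is a minimal illustration and not part of the guideline: the function names (split_locale_name, classify_language, classify_territory) and the result labels are hypothetical. A "standard-shaped" result only means the value has the required length and case; it is not checked against the ISO 639 or ISO 3166-1 code lists. The CODESET and MODIFIERS fields are only split off here, using the character repertoire, and are not validated further.

```python
import re

# LANGUAGE_TERRITORY.CODESET@MODIFIERS, with "@MODIFIERS" optional.
# Character classes follow the repertoire: LETTERS for the first two fields,
# LETTERS/NUMBERS/'-' for CODESET, and additionally ',' and '=' for MODIFIERS.
LOCALE_NAME = re.compile(
    r"([A-Za-z]+)_([A-Za-z]+)\.([A-Za-z0-9-]+)(?:@([A-Za-z0-9,=-]+))?"
)
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def split_locale_name(name):
    """Return (language, territory, codeset, modifiers) or None.

    modifiers is None when the '@' part is omitted. "C" and "POSIX" are
    exempt from the four-field structure, so they are not split (None).
    """
    match = LOCALE_NAME.fullmatch(name)
    return match.groups() if match else None


def classify_language(value):
    """Classify a LANGUAGE value by shape only (hypothetical labels)."""
    if not LETTERS_ONLY.fullmatch(value):
        return "invalid"
    # Standard values: two (ISO 639-1) or three (ISO 639-2) lowercase letters.
    if value.islower() and len(value) in (2, 3):
        return "standard-shaped"
    # User-defined values: contain uppercase LETTERS or exceed three letters.
    if len(value) > 3 or any(c.isupper() for c in value):
        return "user-defined"
    return "invalid"


def classify_territory(value):
    """Classify a TERRITORY value by shape only (hypothetical labels)."""
    if not LETTERS_ONLY.fullmatch(value):
        return "invalid"
    # Standard values: two uppercase letters (ISO 3166-1).
    if value.isupper() and len(value) == 2:
        return "standard-shaped"
    # User-defined values: contain lowercase letters or exceed two letters.
    if len(value) > 2 or any(c.islower() for c in value):
        return "user-defined"
    return "invalid"
```

Under these assumptions, split_locale_name("de_DE.ISO-8859-15@euro") yields the four fields "de", "DE", "ISO-8859-15" and "euro", and a one-letter value such as "a" is reported as "invalid" because it fits neither the standard nor the user/implementation-defined rule.
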

CODESET:

A character string for the CODESET field represents the coded character set used in the locale.

The standard values for the CODESET field shall consist of multiple strings exclusively containing LETTERS or NUMBERS in conjunction with the delimiter '-'. The syntax of the field in ABNF [RFC 2234] is:

    CODESET = STRING1 *( "-" STRING2 )
    STRING1 = 1*LETTERS
    STRING2 = 1*(LETTERS / NUMBERS)

STRING1 shall consist of uppercase LETTERS only. STRING2 shall consist of uppercase LETTERS, NUMBERS, or both.

The following is a list of examples of standard values for the CODESET field.

    "UTF-8", "ISO-8859-1", "ISO-8859-2", "ISO-8859-5", "ISO-8859-7",
    "ISO-8859-9", "ISO-8859-13", "ISO-8859-15", "GB-2312", "GB-18030",
    "EUC-KR", "EUC-JP", "EUC-TW"

[Note: The whole list of standard values for the CODESET field and alias values of each standard value will be available at http://www.openi18n.org/localenameguide/ ]

In order not to conflict with the standard values for the CODESET field, user/implementation-defined values should include either lowercase LETTERS or NUMBERS in STRING1.

MODIFIERS:

A character string for the MODIFIERS field represents additional information for the localization data of a locale. It shall consist of one or more OPTION fields separated by the character ','. Each OPTION field consists of a KEYWORD, optionally followed by the character '=' and OPTIONVALUE. The character string for the KEYWORD sub-field shall consist of LETTERS and NUMBERS. The character string for the OPTIONVALUE sub-field shall consist of LETTERS, NUMBERS, and the character '-'.

No standard values are defined for the MODIFIERS field.
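
The CODESET syntax and the OPTION structure of the MODIFIERS field described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a normative implementation: the names classify_codeset and parse_modifiers and the result labels are hypothetical, a "standard-shaped" CODESET is only checked against the ABNF and case rules (not against the list of standard values), and an OPTIONVALUE is assumed to need at least one character when '=' is present.

```python
import re

# Standard shape: STRING1 is uppercase LETTERS only; each STRING2 is
# uppercase LETTERS and/or NUMBERS; segments are joined by '-'.
STANDARD_CODESET = re.compile(r"[A-Z]+(?:-[A-Z0-9]+)*")
# Looser shape used for user/implementation-defined values (assumption):
# non-empty segments of LETTERS/NUMBERS joined by '-'.
GENERAL_CODESET = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")
# OPTION = KEYWORD [ "=" OPTIONVALUE ]
OPTION = re.compile(r"([A-Za-z0-9]+)(?:=([A-Za-z0-9-]+))?")


def classify_codeset(value):
    """Classify a CODESET value by shape only (hypothetical labels)."""
    if STANDARD_CODESET.fullmatch(value):
        return "standard-shaped"
    if GENERAL_CODESET.fullmatch(value):
        string1 = value.split("-", 1)[0]
        # User-defined values carry lowercase LETTERS or NUMBERS in STRING1.
        if any(c.islower() or c.isdigit() for c in string1):
            return "user-defined"
    return "nonconforming"


def parse_modifiers(value):
    """Parse a MODIFIERS string into {KEYWORD: OPTIONVALUE or None}.

    Raises ValueError if any OPTION is malformed (including an empty one).
    """
    options = {}
    for option in value.split(","):
        match = OPTION.fullmatch(option)
        if match is None:
            raise ValueError(f"malformed OPTION: {option!r}")
        keyword, option_value = match.groups()
        options[keyword] = option_value
    return options
```

With this sketch, "UTF-8" is classified as "standard-shaped" and "utf-8" as "user-defined", while a value such as "UTF_8" is rejected because '_' is a DELIMITER and cannot appear inside the CODESET field.
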

The following is a list of examples of KEYWORDs and their meanings:

    "euro"  for Euro currency
    "im"    for input method option, e.g., "im=INPUT-METHOD-NAME"

Security:

Applications that accept locale information from untrusted users must filter locale values, since arbitrary locale values (e.g., values that include "/" in their names) may be able to exploit some tools and libraries.

[End]