CLDR (Common Locale Data Repository) Process for Common Locale Data Collection, Vetting and Release

Baldev S. Soor

1. Data Collection Process

When gathering data for a country and language, it is important to have multiple sources for that data in order to weed out any bias. Contributions will be invited. A separate document details the format of the contributions. Contributors can be individuals or organizations. The ICU locale data will be taken as the initial source. The goal is to come up with a reasonable set of data within a short time; the expectation is that it will be modified and improved, in successive versions, by more input from the open-source community and experts resident in the countries.

When using existing data, we may have to extrapolate from the available sources because there may not be a direct match between the LDML and that source. Members are encouraged to use local contacts to help with the extrapolation.

2. Data Scrubbing Process

Once data for a country and language has been received, the data from the different sources will be compared to show agreements and differences. The data differences will be resolved.

2.1 Resolution Procedure

Data contributed to the group from different sources may be in conflict. For example, a contribution on abbreviated month names may show each abbreviated name ending with a period and another contribution for the same abbreviated month names may not show the trailing period. A resolution process will be used to resolve these conflicts. Note that there are two types of data in the repository:

a) Contributor specific data: The contributor can be an individual or an organization. The group will not make any changes to the data. Changes to the data are up to the contributing party. The only request is that all changed data be versioned, and the Version Numbering Scheme be used.

b) Common Data: This is decided by the group.
Normally this would be by consensus of the members attending the regular meetings using the following process:
Members are encouraged to use local language and country contacts, inside and outside their organization, to help vet current common data and any new proposals for addition or amendment of common data. In particular, national standards organizations are encouraged to be involved in the data vetting process. All people involved in vetting data should compare any proposed changes against the data in the comparison charts at http://oss.software.ibm.com/cvs/icu/locale/all_diff_xml/ and indicate which platforms the proposed changes align with, or whether they differ from all of the platforms.

2.2 Prioritization

In anticipation that there may be conflicting common practices or standards for a given country and language, we will use keyword variants to reflect the different practices. For example, for German we will distinguish between PHONEBOOK and DICTIONARY collation.

When there is an existing national standard for a country, the goal is to follow that standard as much as possible. Where the common practice in the country deviates from the national standard, or if there are multiple conflicting common practices, or options in conforming to the national standard, or conflicting national standards, such differences in the common data repository will be distinguished by keyword variants or variant locales.

Where a data item is identified as following a particular national standard (or other reference), the goal is to keep that data aligned with that standard. There is, however, no guarantee that data will be tagged with any or all of the national standards that it follows.

3. Data Release Process

3.1 Version Numbering Scheme

The locale data is frozen per version. Once a version is released, it is never modified. Any change, however minor, means that a newer version of the locale data is released.
The versioning scheme is x.y.z, where z is incremented for bug fixes, y is incremented for any significant additions (such as new locale data), and x is incremented for any change in format (such as the addition of new elements in LDML). The initial version number will be 1.0.0.

3.2 Release Schedule

An early release of a version of the common data will be issued as an ALPHA release. This will be followed by a BETA release three months later. The FINAL version will be released three months after the BETA release.

For Version 1.0.0, the first release, an accelerated schedule will be followed. The ALPHA release is due September 30, 2003, the BETA release is due October 22, 2003, and the FINAL release is slated for November 15, 2003.
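
The numbering scheme in section 3.1 and the regular ALPHA/BETA/FINAL spacing in section 3.2 can be sketched in a few lines of Python. This is an illustration only, not part of the process itself. The names Version, bump, add_months and release_schedule are hypothetical. Two behaviours are assumptions that the text does not state: resetting the lower components to zero when a higher one is incremented, and clamping to the last day of the month when "three months later" falls on a day that does not exist.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    x: int  # incremented for a change in format (e.g. new LDML elements)
    y: int  # incremented for significant additions (e.g. new locale data)
    z: int  # incremented for bug fixes

    def __str__(self) -> str:
        return f"{self.x}.{self.y}.{self.z}"

    def bump(self, change: str) -> "Version":
        # A released version is never modified, so every change
        # produces a new Version object instead of mutating this one.
        if change == "format":
            return Version(self.x + 1, 0, 0)  # assumption: y and z reset to 0
        if change == "addition":
            return Version(self.x, self.y + 1, 0)  # assumption: z resets to 0
        if change == "fix":
            return Version(self.x, self.y, self.z + 1)
        raise ValueError(f"unknown change type: {change!r}")


def add_months(d: date, months: int) -> date:
    # Move forward by whole calendar months; if the target month is
    # shorter, clamp to its last day (assumption, see lead-in).
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_schedule(alpha: date) -> dict:
    # Regular schedule: BETA three months after ALPHA,
    # FINAL three months after BETA.
    beta = add_months(alpha, 3)
    final = add_months(beta, 3)
    return {"ALPHA": alpha, "BETA": beta, "FINAL": final}


if __name__ == "__main__":
    v = Version(1, 0, 0)            # the initial version number
    print(v.bump("fix"))            # 1.0.1
    print(v.bump("addition"))       # 1.1.0
    print(v.bump("format"))         # 2.0.0
    # Hypothetical ALPHA date for a later, non-accelerated release.
    for stage, day in release_schedule(date(2004, 1, 15)).items():
        print(stage, day.isoformat())
```

The accelerated Version 1.0.0 dates listed above do not follow the three-month spacing, so they would be entered by hand rather than computed this way.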
Copyright © 2003 Open Internationalization Initiative (OpenI18N). All rights reserved.