Disposition of comments on the second public review
===================================================
March, 2002

===========================================================================
| Date: 21 Jan 2002 23:46:35 +0800
| From: Roger So
| Subject: [localenameguide:5] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

| Hello,
|
| The Guideline itself is good, and I see no problems. However, 2 things
| regarding the Codeset Alias Table:
|
| 1. It would be nice if it can explain what does the "TCA" in "TCA-Big5"
| mean. Otherwise you'll get a lot of angry Taiwanese users asking
| why it's "TCA-BIG5" instead of "BIG5". :)

[Accepted] The explanation of TCA has been added to the Codeset Alias Table.

|
| 2. Since Big5-HKSCS has been registered with IANA, I think Big5-HKSCS's
| %MIME% column should say "Big5-HKSCS" instead of "*".

[Accepted] The %MIME% column reflects the comment string "(preferred MIME
name)" in the IANA character set registry. "%MIME%" has been changed to
"%Preferred MIME name%".

===========================================================================
| Date: Tue, 22 Jan 2002 18:53:06 +0200
| From: Alexander Bokovoy
| Subject: [localenameguide:7] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

| The LI18NUX Standard Codeset Name Alias Table misses row for CP-1251 encoding used for
| Bulgarian and Belarusian locales under Linux and for variants of Ukrainian and Russian
| locales (bg_BG.CP1251, be_BY.CP1251, ru_RU.CP1251, uk_UA.CP1251).
|
| CP-1251 CP1251 windows-1251 * Cp1251

[Accepted] CP-1251 has been added to the Codeset Alias Table.

===========================================================================
| Date: Wed, 23 Jan 2002 00:42:06 -0600
| From: David Starner
| Subject: [localenameguide:9] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

| As a Debian maintainer, I have a number of problems with the codeset
| alias table.
|
| 1.
| Despite the Q&A, I still don't understand why a whole new set of
| names needs to be created. Big5 is standard - TCA-BIG5 is not and it will
| get ignored, because it's more painful to write and TCA is just an
| arbitrary string of characters to most people. I've never before seen
| KOI-8-R - it's KOI8-R. IANA offers a commonly used set of names that are
| at least as consistent as the one's you're offering us, and that are
| actually in current use.

[Rejected] We've received the same comments several times from LI18NUX
members and reviewers at the public reviews. We understand that this is a
common concern about the guideline.

First, we have no intention to force people to stop using existing names.
You can continue to use the names that you are currently using through
alias mechanisms. This guideline also does not prohibit anyone from using
non-standard names. Some systems may need to implement the Standard Locale
Names in addition to the existing names.

Having a single set of standard names is the important first step. In the
long run, it is desired that names converge on Standard Locale Names.

|
| 2. The IANA codeset list needs to be rechecked. BIG5-HKSCS, ISO-8859-13,
| ISO-8859-15, and TIS-620 are registered.

[Accepted] The Codeset Alias Table has been updated to reflect the latest
IANA codeset list.

|
| 3. VISCII is unusable as a locale charset in recent versions of glibc,
| since it puts graphic characters in C0. Any charset wherein \x00-\x7f
| don't map directly to U+0000-U+007F can't be used with glibc. To
| circumvent that would be dangerous - many Unix programs rely on having
| those characters be ASCII.

[Rejected] We will keep VISCII in the Codeset Alias Table. However, we
share your concern regarding the glibc implementation. We will add a
description of this limitation to the NOTE column for VISCII. Glibc
provides Vietnamese locales with UTF-8 as their codesets.

|
| 4. What's the need for a bunch of IBM codepages?
| Glibc 2.1 only supports
| 2 as locale charsets - CP1251 for be_BY and bg_BG and CP1255 for yi_US.
| If they aren't currently in use, they aren't something that should be
| encouraged for use on Linux - users should be encouraged to move to
| UTF-8, or at least some codeset that keeps C0 and C1 clear like a proper
| Unix codeset.

[Rejected] They are supported in Java locales and registered in the IANA
character set registry. We start the alias table from a small set but we
will add more names on request. The registration process will be
established and managed by FSG/LANANA.

|
| 5. This list fails to include several charsets that are currently
| supported under recent versions of glibc: ISO-8859-14, CP1251, CP1255,
| KOI8-T and GEORGIAN-PS. While not officially listed, there have been
| several Romainian requests for a ro_RO.ISO-8859-16 locale, and it is
| reported to work correctly with recent versions of glibc.

[Accepted] ISO-8859-14, CP-1251, CP-1255, KOI-8-T, GEORGIAN-PS and
ISO-8859-16 have been added to the alias table.

===========================================================================
| Date: 23 Jan 2002 13:16:10 +0200
| From: Pavel Mihaylov
| Subject: [localenameguide:11] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

|
| Hello,
|
| as I reviewed the codeset name and alias table I noticed that Microsoft
| CP1251 is not mentioned. This character set is preferred for locales
| bg_BG and be_BY, and second preferred one for ru_RU (possibly also for
| other languages written in Cyrillic script). Its standard name could be
| CP-1251 (as the other CP-XXX ones), glibc name is CP1251 and MIME name
| is windows-1251.

[Accepted] CP-1251 has been added to the Codeset Alias Table.

===========================================================================
| Date: Wed, 23 Jan 2002 14:53:51 -0800 (PST)
| From: Ienup Sung
| Subject: [localenameguide:13] BUGS in LI18NUX Locale Name

Thank you very much for sending your comments.
|
| Hello,
|
| This is not really a bug report but would like to have a clarification on
| why the guideline has such a rigid/restrictive ABNF syntax at the
| CODESET:
|
| 126 CODESET = STRING1 *( "-" STRING2 )
| 127 STRING1 = 1*LETTERS
| 128 STRING2 = 1*(LETTERS / NUMBERS)
|
| 129 STRING1 shall consist of uppercase LETTERS only.
| 130 STRING2 shall consist of uppercase LETTERS and NUMBERS.
|
| It appears the guideline is limiting too much and hence loosing
| possibilities of different kind of std codeset names that could be
| needed in future. (Was that intentinoal?)

[Rejected] The restrictive rule is intentional. As described in the Q&A,
you can continue to use existing names. As for new names, we don't think
many new codesets will be created in the future. Moving to UTF-8 is
encouraged.

|
| Also, if possible, I would like to propose to change the above lines #126 to
| #130 to the following single line.
|
| 126 CODESET = 1*( LETTERS / NUMBERS / "-" / "_" / "." )
|
| so that the std codeset name can have more liberal form of codeset names.

[Rejected] To solve existing problems in having multiple names for the
same codeset, we need a restrictive rule. Allowing many variations makes
it hard to have a single set of standard codeset names, and it will go
out of control. We believe that the current rule is suitable for having
a single set of standard names.

|
| In the similar context, I would like to propose to add '.' and '_' at
| the MODIFIERS' KEYWORD and OPTIONVALUE sub-fields so that the lines from
| #152 to #154 will be like the following:
|
| 151 The character string for the KEYWORD and OPTIONVALUE sub-fields shall
| 152 contain LETTERS, NUMBERS, and characters such as '-', '_', and '.'.

[Rejected] Since we see no strong requirements on standardizing MODIFIER
strings, it might not be necessary to be as restrictive as in the
specification. However, for the sake of future standardization needs, a
similar rule was applied here.
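
As an illustration of the CODESET syntax quoted above (lines 126-130 of the draft), the following Python sketch checks whether a string has the shape of a standard CODESET value. It is not part of the guideline: the function name is_standard_codeset is hypothetical, LETTERS is assumed to mean the ASCII letters, and the check is purely syntactic (it does not consult the Codeset Alias Table). The character classes are built from explicit character lists rather than range expressions.

```python
import re
import string

# Explicit character lists, so nothing depends on how a regex engine
# interprets a range such as A-Z.
UPPER = string.ascii_uppercase           # assumed meaning of uppercase LETTERS
UPPER_OR_DIGIT = UPPER + string.digits   # uppercase LETTERS and NUMBERS

# CODESET = STRING1 *( "-" STRING2 )
# STRING1 = 1*LETTERS               (uppercase only)
# STRING2 = 1*(LETTERS / NUMBERS)   (uppercase LETTERS and NUMBERS)
STANDARD_CODESET = re.compile(
    "[" + UPPER + "]+(?:-[" + UPPER_OR_DIGIT + "]+)*"
)


def is_standard_codeset(name: str) -> bool:
    """Return True if name is syntactically a standard CODESET value."""
    return STANDARD_CODESET.fullmatch(name) is not None
```

Under this sketch, names such as "CP-1251" and "KOI-8-R" pass, while "CP1251" and "KOI8-R" do not, because STRING1 may not contain NUMBERS. That matches the guideline's statement that user/implementation-defined values are distinguished by lowercase LETTERS or NUMBERS in STRING1.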
===========================================================================
| Date: Thu, 24 Jan 2002 13:29:07 +0000 (GMT)
| From: Aidan Kehoe
| Subject: [localenameguide:16] BUGS in LI18NUX Locale Name Guideline
|

Thank you very much for sending your comments.
All of your editorial comments have been accepted. Thanks a lot.

|
| Hi. Commentary was wanted, I believe :-) .
|
| > 1 Locale Name Guideline [DRAFT FOR PUBLIC REVIEW -- 2002-01-21]
| >
| > 2 Background:
| >
| > 3 The purpose of this guideline is to provide guidance to developers
| > 4 of software that implement or use locale based internationalization
| > 5 functionalities.
|
| I don't think that pluralizing `functionality,' here makes sense.

[Accepted] "functionalities" has been replaced by "functionality".

|
| > One of the problems of existing locale implementations
| > 6 is in that names of locales vary with the software.
|
| The word `in,' isn't needed there.

[Accepted] "in" has been removed.

|
| > Several different
| > 7 names are assigned to a single locale.
|
| `have been assigned to a single locale,' ? Because l18nux should
| change this for the present and the future.

[Accepted] "are assigned" has been replaced by "have been assigned".

|
| > For example, Glibc, X Window
| > 8 System and Java keep their own locale information.
|
| `all keep their own locale information,' I would put.

[Accepted] "all" has been inserted before "keep".

|
| > 9 Sometimes an application has their own set of supported locale names
| > 10 and fails to run if the specified locale name is different from the
| > 11 names of which the application knows.
|
| `... an application has its own set ...' And maybe `current locale,'
| instead of `specified locale,' ?

[Accepted] "their own set" has been replaced by "its own set".
"specified locale" remains unchanged, since the current locale is usually
'C' when no setlocale function is called.
|
| > Once users set their locale
| > 12 environment by environment variables such as LANG, it is expected that
| > 13 any internationalized application will work in the same environment.
|
| `Once users have set ... _with_ environment variables,'

[Accepted] "users set" has been replaced by "users have set".
"by" has been replaced by "with".

|
| > 14 To cope with this situation, this guideline introduces and specifies
| > 15 the Standard Locale Name.
|
| The guidline introduces & specifies more than one `Standard Locale
| Name,' ; therefore, `Standard Locale Names,' is more appropriate here,
| I think.

[Accepted] "Standard Locale Name" has been replaced by
"Standard Locale Names".

|
| > The Standard Locale Name is the name
| > 16 that should be implemented on any software supporting the locale,
|
| `implemented by,' I think, but maybe you mean to say something else.

[Accepted] "on" has been replaced by "by".

|
| > 17 so that the name can be specified reliably by users or other software.
| > 18 By using the Standard Locale Names, a single set of locale names can be
| > 19 used for any software environment.
| >
| > 20 User/implementation-defined names are the names that refer to different
| > 21 entities from any of Standard Locale Names or any constitutional part
|
| `constituent part,'

[Accepted] "constitutional" has been replaced by "constituent".

|
| > 22 of Standard Locale Names. They are given different rules from the
| > 23 Standard Locale Names to make distinctions. The rules should be applied
|
| `... to make a distinction.' and `These (different) rules should be
| applied ... '

[Accepted] "make distinctions" has been replaced by "make a distinction".
"The rules" has been replaced by "These rules".

|
| > 24 if new non-Standard Locales are created.
| >
| > 25 Scope:
| >
| > 26 This document is a supplemental document to the LI18NUX2000
| > 27 Globalization Specification.
| > Both operating systems conforming to
| > 28 the LI18NUX2000 Globalization Specification and application software
| > 29 written for the LI18NUX2000 conforming systems should follow this
|
| No article there; thus `written for LI18NUX2000,' would be better.

[Accepted] "the" has been removed from "the LI18NUX2000".

|
| > 30 guideline.
| >
| > 31 References:
|
| [deletia]
|
| > 67 a b c d e f g h i j k l m n o p q r s t u v w x y z
| > 68 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
| > 69 0 1 2 3 4 5 6 7 8 9
| > 70 - _ . @ , =
| >
| > 71 Hereafter the characters '_', '.' and '@' are called DELIMITERS,
| > 72 character repertoire consists of '-', ',' and '=' are called
|
| `... any character of '-', ',' or '=' is called a SPECIAL,'

[Accepted] "character repertoire consists of '-', ',' and '=' are called"
has been replaced by "the character '-', ',' and '=' are called SPECIALS.".

|
| > 73 SPECIALS, the character repertoire consists of '0', '1', '2', '3',
| > 74 '4', '5', '6', '7', '8', and '9' are called NUMBERS, and the allowable
|
| `... the group of characters '0', 1','2','3','4','5','6','7','8','9'
| are called the NUMBERS,'

[Accepted] "the character repertoire consists of '0', '1', '2', '3', '4',
'5', '6', '7', '8', and '9' are called NUMBERS," has been replaced by
"the group of characters '0', 1','2','3','4','5','6','7','8','9' are
called NUMBERS,".

|
| > 75 characters for a locale name except DELIMITERS, SPECIALS and NUMBERS
|
| `locale name characters,' rather than `characters for a locale name,'
|
| > 76 are called LETTERS.
| >
| > 77 The LANGUAGE, TERRITORY, CODESET and MODIFIERS fields shall be
|
| [deletia]
|
| > 140 [Editor's Note (to be removed from the final text):
| > 141 The URL is not decided yet.]
| >
| > 142 In order not to conflict with the standard values for the CODESET
| > 143 field, user/implementation-defined values should include either
| > 144 lowercase LETTERS or NUMBERS in the STRING1.
|
| `in STRING1,' would be better.
| And maybe `user or implementation
| defined,' rather than `user/implementation defined,' ?

[Accepted] "in the STRING1" has been replaced by "in STRING1".
The expression "user/implementation-defined values" has not been changed.

|
| > 145 MODIFIERS:
| >
| > 146 A character string for the MODIFIERS field represents additional
| > 147 information for localization data of a locale. It shall consist of
|
| `for the localization data ...'

[Accepted] "for localization data" has been replaced by
"for the localization data".

|
| > 148 one or more OPTION fields separated by the character ','.
| >
| > 149 Each OPTION field consists of a KEYWORD, optionally followed by
| > 150 the character '=' and OPTIONVALUE.
| >
| > 151 The character string for the KEYWORD sub-field shall consist of
| > 152 LETTERS and NUMBERS.
| >
| > 153 The character string for the OPTIONVALUE sub-field shall consist of
| > 154 LETTERS, NUMBERS, and the character '-'.
| >
| > 155 No standard values are defined for the MODIFIERS field.
| >
| > 156 The following is a list of examples of KEYWORDs and their meanings:
| >
| > 157 "euro" for Euro currency
| > 158 "im" for input method option, e.g., "im=INPUT-METHOD-NAME"
| >
| > 159 [Editor's Note (to be removed from the final text):
| > 160 The examples above are just for showing the usage of
| > 161 KEYWORD field.]
| >
| > 162 [End]
|
|
| All in all, the document is good, useful and necessary. Keep up the
| good work.
|
| - Aidan Kehoe
===========================================================================
| Date: 25 Jan 2002 22:53:14 +0200
| From: Alexander Shopov
| Subject: [localenameguide:18] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

|
| Dear Sirs,
| I am going to comment on the second DRAFT FOR PUBLIC REVIEW 2002-01-21
| of LI18NUX Locale Name Guideline from the point of Bulgarian users.
|
| In Bulgaria the most popular 8bit encoding is CP1251. It is used both by
| Windows and by Unix-type OS users.
| It is the only encoding used for
| Internet pages and the only one used for e-mail messages. This is a
| Cyrillic encoding.
|
| Other 8bit encodings are extremely unpopular. KOI types that are popular
| in former Soviet countries are practically known only to Unix users, but
| they do not use them. ISO-8859-5 is absolutely unpopular. The second
| place after CP1251 is held either by an old DOS encoding MIK, or other
| encodings that were used ad hoc. I cannot provide more data, as CP1251
| is used practically in 100% of the cases.
|
| As a result Linux users use the locale bg_BG.CP1251. CP1251 is the name
| of the encoding that was first added to GNU C library (glibc). Later
| versions of glibc added the alias WINDOWS-1251. Still - *noone* here
| uses the locale name bg_BG.WINDOWS-1251.
|
| The installed base of Linux users here is quite large and the changing
| of CODENAME in locale specification is a *major* inconvenience.
|
| My proposals:
| 1. Locale Name Guideline should be changed to allow names like CP1251.
|
| Line
| 127 STRING1 = 1*LETTERS
| Should be changed to
| 127 STRING1 = 1*(LETTERS / NUMBERS)
|
| Line
| 129 STRING1 shall consist of uppercase LETTERS only.
| Should be changed to
| 129 STRING1 shall consist of uppercase LETTERS and NUMBERS.

[Rejected] We have no intention to force people to stop using existing
names. You can continue to use the names that you are currently using
through alias mechanisms. And this guideline does not prohibit anyone
from using non-standard names.

Having a single set of standard names is the important first step. In the
long run, it is desired that names converge on Standard Locale Names.

|
| 2. The LI18NUX Standard Codeset Name alias table should have an entry
| for our encoding:
|
| %Standard Name% CP1251
| %glibc charset names% CP1251, WINDOWS-1251
| %MIME% WINDOWS-1251
| %other IANA% *
| %Java% *
| NOT-RECOMMENDED names% *
|

[Accepted] CP-1251 has been added to the Standard Codeset Table.
| Best regards:
| al_shopov
|
===========================================================================
| Date: Fri, 1 Feb 2002 17:15:36 +0100 (CET)
| From: ekj@ekj.vestdata.no
| Subject: [localenameguide:20] BUGS in LI18NUX Locale Name Guideline

Thank you very much for sending your comments.

|
| Dear Sirs !
|
| I read with interest your public review draft of the locale name
| guideline.
|
| In the guideline, your locale is supposed to be entered as:
|
| LANGUAGE_TERRITORY.CODESET@MODIFIERS
|
| I am somewhat concerned that this organization seems a bit of a
| force-fit to the situation in Norway. (and possible other countris, I
| don't know)
|
| The situation here is this:
|
| * Norwegian is *one* language, this language has the ISO-639-1
| two-letter-code "no"
|
| * Norwegian has *two* official written forms, "bokmål", and "nynorsk".
|
| * The separation between who writes in which form does not follow any
| obvious geographical dividing-line. (and indeed migth fluctuate over
| time)
|
| * Both written forms use the samee character-set (iso-latin-1).
|
| Based on your guideline, what would be an apropriate locale-string for a
| user of nynorsk, and what would be for a user of bokmål ?

[Rejected] The languages are different; please use different codes for
them. If both language and territory are the same, you can use @MODIFIER
to specify a different set of localization data.

|
|
| Sincerely,
| Eivind Kjrstad
|
===========================================================================
| Date: Fri, 01 Feb 2002 17:31:26 -0500
| From: David Wheeler
| Subject: [localenameguide:21] BUGS in LI18NUX Locale Name Guideline -- Need Security Guidelines

Thank you very much for sending your comments.

|
| First, congrats on releasing the draft
| "Locale Name Guideline" (2002-01-21).
| I'm glad to see this progress.
|
| However, I'm concerned that the security ramifications
| of locales aren't mentioned in the document.
| Many tools and libraries that deal with locales are
| easily fooled into doing terrible things if locales have
| arbitrary text in them. Thus, I recommend adding the
| following to the end as a new section. I recommend giving
| specific patterns using regular expressions, so that
| people who are implementing checks can simply cut & paste it
| into regexp libraries (which are WIDELY available in practically
| all programming languages now). If you don't make it obvious
| to implementors that they need to check, and exactly what to check
| for, we may have more vulnerable systems.
|
|
|
| Security:
|
| Applications that accept locale information from
| untrusted users must filter locale values, since
| arbitrary locale values may be able to exploit some
| tools and libraries (e.g., including "/" in their names).
|
| Applications should at least require non-empty locale values
| to match the following regular expression (as defined by
| POSIX's extended regex(3)):
| ^[A-Za-z][A-Za-z0-9_,\+@\-\.=]*$
|
| Applications that wish to force locales to match these
| guidelines precisely can use this more restrictive pattern,
| but note that this pattern does not accept the alternative
| CEN locale system (which uses the plus sign):
|
| ^[A-Za-z]+(_[A-Za-z]+)?
| (\.[A-Z]+(\-[A-Z0-9]+)*)?
| (\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
| (,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$

[Accepted] Thanks for your good suggestion. We've added a Security section
to the guideline. Because range expressions in regular expressions behave
in a locale-dependent way, we did not include the regular expression.

|
| --- David A. Wheeler
| dwheeler@ida.org
|
===========================================================================
| Date: Fri, 01 Feb 2002 17:34:12 -0500
| From: David Wheeler
| Subject: [localenameguide:22] BUGS in LI18NUX Locale Name Guideline - Examples

Thank you very much for sending your comments.

|
| There needs to be some examples in the guidelines for
| complete locales.
| Please add something like this:
|
|
|
| Examples:
|
| Here are a few examples of locales:
|
| en - English.
| en_US - English, U.S. locale/spelling (e.g., "color" not "colour").
| fr - French
|

[Accepted] Some examples have been added to the Q&A text.

|
|
| (I'm sure you can think of some wild examples that show all
| the options).
|
|
| --- David A. Wheeler
| dwheeler@ida.org
|
===========================================================================
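
To make the locale name structure discussed in the comments above (LANGUAGE_TERRITORY.CODESET@MODIFIERS) concrete, here is a minimal Python sketch that splits a name into its fields and rejects anything outside the allowed syntax, in the spirit of the filtering requested in the security comment. It is an illustration only, not text from the guideline. Assumptions: LANGUAGE and TERRITORY are taken to be one or more ASCII letters (as in the reviewer's pattern, since the guideline's own rules for those fields are not quoted here), TERRITORY, CODESET and MODIFIERS are treated as optional, and the names LocaleName and parse_locale_name are hypothetical. Character classes are built from explicit character lists instead of range expressions.

```python
import re
import string
from typing import Dict, NamedTuple, Optional

LETTERS = string.ascii_letters             # assumed LANGUAGE/TERRITORY alphabet
ALNUM = string.ascii_letters + string.digits
UPPER = string.ascii_uppercase
UPPER_OR_DIGIT = UPPER + string.digits

# LANGUAGE[_TERRITORY][.CODESET][@MODIFIERS]
LOCALE_NAME = re.compile(
    "(?P<language>[" + LETTERS + "]+)"
    "(?:_(?P<territory>[" + LETTERS + "]+))?"
    r"(?:\.(?P<codeset>[" + UPPER + "]+(?:-[" + UPPER_OR_DIGIT + "]+)*))?"
    "(?:@(?P<modifiers>[" + ALNUM + ",=-]+))?"
)

# OPTION = KEYWORD [ "=" OPTIONVALUE ]
OPTION = re.compile(
    "(?P<keyword>[" + ALNUM + "]+)(?:=(?P<value>[" + ALNUM + "-]+))?"
)


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifiers: Dict[str, Optional[str]]    # KEYWORD -> OPTIONVALUE or None


def parse_locale_name(name: str) -> Optional[LocaleName]:
    """Return the parsed fields, or None if name is not well formed."""
    match = LOCALE_NAME.fullmatch(name)
    if match is None:
        return None
    options: Dict[str, Optional[str]] = {}
    if match.group("modifiers") is not None:
        # MODIFIERS is one or more OPTION fields separated by ','.
        for option in match.group("modifiers").split(","):
            option_match = OPTION.fullmatch(option)
            if option_match is None:
                return None
            options[option_match.group("keyword")] = option_match.group("value")
    return LocaleName(
        match.group("language"),
        match.group("territory"),
        match.group("codeset"),
        options,
    )
```

With this sketch, "en" and "en_US" parse with no codeset, an illustrative name such as "de_DE.ISO-8859-15@euro" yields the modifier keyword "euro" with no value, and a string containing "/" is rejected. Names with a non-standard codeset such as "bg_BG.CP1251" are also rejected here; an application that must accept existing names would need a looser codeset rule or an alias lookup before this check.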