Last update: 1997-05-20
9945-2-41
_____________________________________________________________________________

Topic:                  I18N issues - locales
Relevant Sections:      2.5.2.1, 2.8.3.2
Classification:         Q1-6: Unaddressed Issues.
                        Q7: Ambiguous Issue.
                        Q8: No Change.

Defect Report:
-----------------------

(from Andrew Hume, Doug McIlroy)

Issue B

[1]  The specification of locales and the interface to them
     discriminates against non-vendor-supplied software. In
     particular, it is impossible to write a portable
     implementation of regcomp() and regexec(), as there is no
     standardised interface to the vital knowledge presumably set
     up by a call to setlocale(). This knowledge is detailed
     below; in brief, the first seems an oversight and the others
     are necessary to use the locale information.

________________________________________

[2]  How can membership in class :blank: be determined portably?
     [2.5.2.1, 2.8.3.2(6)]

     Proposed Solution:
          Provide a ctype function isblank().

     Rationale:
          It is inconsistent that this be the only LC_CTYPE
          category without a C binding. Note that this extension
          introduces a difference between the C and POSIX locales.

________________________________________

[3]  How can the meaning of an arbitrary equivalence class be
     discovered portably?

     Proposed Solution:
          Provide a function that, given any name for an
          equivalence class, returns a list of names of collating
          symbols in the class. The order of the list shall be the
          same regardless of what name is given.

     Rationale:
          This is needed if an application, such as a searching or
          sorting tool, requires this locale-specific information.
          In particular, regcomp() and sort need it.

________________________________________

[4]  How can the meaning/value of an arbitrary collating symbol be
     determined portably?

     Proposed Solution:
          Provide a function that, given a collating symbol,
          returns the representation and length of the symbol.
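     Absent such a function, one workaround (an assumption of this
     edit, not something the report or the standard proposes) is to
     probe the system's own regcomp(): an anchored bracket
     expression holding the collating symbol matches exactly that
     collating element, so the match length is the length of its
     representation. The helper name collsym_length() and its
     return codes are hypothetical. Because the sketch depends on
     the vendor's regcomp(), it does nothing for someone trying to
     implement regcomp() itself, which is the report's point.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not a standard interface: if `text` begins with the
   collating element named by the collating symbol `name` (as in [.name.]),
   return the length in bytes of that element's representation; return -1
   if `text` does not begin with it and -2 if the current locale (or this
   helper) rejects the name. */
static int collsym_length(const char *name, const char *text)
{
    char pattern[128];
    regex_t re;
    regmatch_t m[1];
    int n, len;

    /* "^[[.name.]]" is an anchored bracket expression holding one
       collating symbol; `name` must not itself contain ".]". */
    n = snprintf(pattern, sizeof pattern, "^[[.%s.]]", name);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -2;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;              /* e.g. REG_ECOLLATE for an unknown name */
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    else
        len = -1;
    regfree(&re);
    return len;
}
```

     In the POSIX locale, collsym_length("a", "abc") should return
     1 and collsym_length("a", "xyz") should return -1; a
     multicharacter name such as "ch" is accepted only in a locale
     that defines it.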
     Rationale:
          This is needed if an application, such as a searching or
          sorting tool, requires this locale-specific information.

________________________________________

[5]  How can the collating elements in a string be found and
     compared portably?

     Proposed Solution:
          Provide a function that returns the length and the weight
          vector for the collating element at the beginning of the
          string.

     Rationale:
          This is needed if an application, such as a searching or
          sorting tool, requires this locale-specific information.

________________________________________

[6]  How can regcomp() expand a range expression into a list of
     collating elements portably?

     Proposed Solution:
          Provide a successor function that, given the name of a
          collating element, returns the name of the collating
          element with the next larger weight vector. For this
          purpose, two elements with the same weight vector compare
          in the order of their equivalence listing.

     Rationale:
          This is needed if an application, such as a searching or
          sorting tool, requires this locale-specific information.
          It may further be useful to have a way to inquire whether
          a locale contains any multicharacter collating elements.

________________________________________

[7]  Lines 2918-20 say that an equivalence class expression that
     names a collating element not in an equivalence class shall
     be treated as a collating symbol. Does this statement affect
     the meaning of ``collating symbol'' in line 3306? Does it
     eliminate such equivalence class expressions from
     consideration in lines 2943-5?

     Proposed Solution:
          Change lines 2918-2920 to say ``the expression shall be
          understood as an equivalence class that contains only the
          one collating element.'' We would actually prefer the
          admission of singleton equivalence classes in the
          definitions of 2.5.2.2.

     Rationale:
          This question affects the meaning of range expressions.
          Lines 2918-20 could be construed as forcing
          [[=CE1=]-[=CE2=]] to mean [[.CE1.]-[.CE2.]] in some cases,
          although the former expression looks syntactically
          incorrect. The preferred solution agrees with customary
          mathematical usage, and clarifies the behavior of the
          equivalence-class function proposed in [3] above.

________________________________________

[8]  What if collation changes between regcomp() and regexec()?

     Proposed Solution:
          The result is undefined.

     Rationale:
          For the common case of locales in which all collating
          elements are single characters, regcomp() should be
          allowed to compile character classes. At the same time,
          regexec() should be allowed to handle multicharacter
          collating symbols. The proposed resolution assures that
          both desiderata are met.

WG15 response for 9945-2:1993
-----------------------------------

Q1   The standard does not speak to this issue and no conformance
     distinction can be made between alternative implementations
     based on this. The standard does not require that an
     implementation conforming to the standard be portable.
     Therefore, there is no requirement that the functionality be
     specified by the standard. Concerns are being forwarded to the
     sponsor.

Q2, Q3, Q4, Q5, Q6
     The standard does not speak to these issues and no conformance
     distinction can be made between alternative implementations
     based on this. Concerns are being forwarded to the sponsor.

Q7   The standard is unclear on this issue, and as such no
     conformance distinction can be made between alternative
     implementations based on this. This is being referred to the
     sponsor.

Q8   The standard states the required behavior and conforming
     implementations shall conform to this. According to
     P.2 pg 729, lines 367-368, the standard specifies that the
     result is undefined.

Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________
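The successor approach proposed in item [6] can be sketched for the
restricted case the report calls common: locales in which every
collating element is a single byte. The sketch sorts the bytes 1 to
UCHAR_MAX with strcoll(), exposes a successor function over that
order, and expands a range by walking successors from the start
point to the end point. The names coll_successor() and
expand_range() are hypothetical. Ties between bytes that collate
equally are broken by byte value, an assumption standing in for the
"order of their equivalence listing", which portable code cannot
recover; nor can the sketch detect multicharacter collating
elements, the inquiry item [6] suggests adding.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Bytes 1..UCHAR_MAX sorted by the current LC_COLLATE order (NUL excluded). */
static unsigned char coll_order[UCHAR_MAX];

/* qsort comparator: order two bytes by strcoll(), breaking ties by byte
   value so the result is a total order (an assumption; see the text). */
static int coll_cmp(const void *pa, const void *pb)
{
    unsigned char a = *(const unsigned char *)pa;
    unsigned char b = *(const unsigned char *)pb;
    char sa[2] = { (char)a, '\0' };
    char sb[2] = { (char)b, '\0' };
    int r = strcoll(sa, sb);
    return r != 0 ? r : (int)a - (int)b;
}

/* Rebuild the table; must be redone after any change to LC_COLLATE. */
static void build_coll_order(void)
{
    for (int c = 1; c <= UCHAR_MAX; c++)
        coll_order[c - 1] = (unsigned char)c;
    qsort(coll_order, UCHAR_MAX, 1, coll_cmp);
}

/* Hypothetical successor function: the byte that collates next after c,
   or 0 if c collates last (or is 0 itself). */
static unsigned char coll_successor(unsigned char c)
{
    for (size_t i = 0; i + 1 < UCHAR_MAX; i++)
        if (coll_order[i] == c)
            return coll_order[i + 1];
    return 0;
}

/* Expand the range lo-hi into `out` (room for UCHAR_MAX + 1 bytes) by
   walking successors; returns the number of bytes written, or 0 (with
   `out` empty) if hi does not collate at or after lo. */
static size_t expand_range(unsigned char lo, unsigned char hi, char *out)
{
    size_t k = 0;
    unsigned char c = lo;

    build_coll_order();
    while (c != 0) {
        out[k++] = (char)c;
        if (c == hi) {
            out[k] = '\0';
            return k;
        }
        c = coll_successor(c);
    }
    out[0] = '\0';   /* ran off the end of the order: hi precedes lo */
    return 0;
}
```

In the POSIX locale, where strcoll() orders single bytes the same way
as strcmp(), expand_range('a', 'e', buf) fills buf with "abcde" and
returns 5, while the reversed range 'e'-'a' yields an empty
expansion.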