Last update: 1997-05-20
9945-2-29
Class: No change
_____________________________________________________________________________

Topic: regular expressions
Relevant Sections: 2.8

Defect Report:
-----------------------

Please provide an interpretation of the following taken from Section 2.8 of ISO/IEC 9945-2:1993. I think I know what the specified behavior is for the following cases, but maybe I've opened an interesting question or two.

Given a locale in which "ch" is a multiple character collating element that collates between "c" and "d", then certainly [[.ch.]] matches "ch". This makes it pretty clear that [^[.ch.]] doesn't match "ch" (and not even just the "c"). Therefore, consistency argues that [^c] matches "ch". And, of course, [c] doesn't match "ch" (and not even just the "c").

If we're in agreement so far, then the simple rule is that if the string to check against a bracket expression can be taken as a multiple character collating element, then the matching process must do so. I'm pretty sure about the above.

What I'm not so sure about is the behavior for character classes. Take, for example, [[:alpha:]] when presented with "ch". The rationale for POSIX.2 confirms that ``character classes are not intended to include collating elements''. However, there are still two possible answers: "ch" doesn't match, or the "c" of "ch" matches. I like neither of these answers; neither fits my intuitive belief that "ch" should match as a unit. Even worse, the nonportable [a-z] *does* match the unit "ch"! What is actually specified for [[:alpha:]] here?

WG15 response for 9945-2:1993
-----------------------------------

A character class expression is defined in section 2.8.3.2 of the standard as a set of characters belonging to a character class, as defined in the LC_CTYPE category of the current locale. A range expression is defined in the same section as a set of collating elements that fall between two elements in the current collation sequence, inclusive.
Thus, a collating element ch, which is not a character, would be matched by the range expression [a-z], but not by the character class [:alpha:] (the set of specific characters specified in the locale file). [:alpha:] would match the 'c' and the 'h' individually, for the same reason that the expression [c] matches the 'c' in ch, but not the collating element ch.

Rationale for Interpretation:
-----------------------------

None.
_____________________________________________________________________________
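The interpretation can be illustrated with a small toy model, sketched here in Python. Everything in it is hypothetical and not part of the standard: the collation sequence (the lowercase letters with "ch" inserted between "c" and "d"), the ALPHA set standing in for the LC_CTYPE alpha class, and the helpers range_expression and match_bracket. It also assumes a matching strategy that the response implies but does not spell out: at a given position, a bracket expression first tries a multi-character collating element and uses it only if that element belongs to the expression's set; otherwise it falls back to the single character. Negated bracket expressions are not modelled, since the response does not address them.

```python
import string

# Hypothetical toy locale: "ch" is a multi-character collating element
# that collates after "c" and before "d".
COLLATION_SEQUENCE = ["a", "b", "c", "ch"] + list("defghijklmnopqrstuvwxyz")
MULTI_CHAR_ELEMENTS = {e for e in COLLATION_SEQUENCE if len(e) > 1}

# Stand-in for an LC_CTYPE class: a set of *characters* only,
# so the collating element "ch" can never be a member.
ALPHA = set(string.ascii_letters)


def range_expression(lo, hi):
    """Set of collating elements from lo to hi, inclusive (like [a-z])."""
    start = COLLATION_SEQUENCE.index(lo)
    end = COLLATION_SEQUENCE.index(hi)
    return set(COLLATION_SEQUENCE[start:end + 1])


def match_bracket(elements, text, pos=0):
    """Return how many characters of text at pos the bracket set matches.

    0 means no match. Assumption of this sketch: a multi-character
    collating element is tried first and used only if the set contains
    it; otherwise the single character at pos is tried.
    """
    for elem in sorted(MULTI_CHAR_ELEMENTS, key=len, reverse=True):
        if text.startswith(elem, pos) and elem in elements:
            return len(elem)
    if pos < len(text) and text[pos] in elements:
        return 1
    return 0


print(match_bracket(range_expression("a", "z"), "ch"))  # 2: "ch" as one unit
print(match_bracket({"c"}, "ch"))                        # 1: [c] takes only the "c"
print(match_bracket(ALPHA, "ch"))                        # 1: class takes only the "c"
print(match_bracket(ALPHA, "ch", 1))                     # 1: then the "h" on its own
print(match_bracket({"ch"}, "ch"))                       # 2: [[.ch.]] takes "ch"
```

In this model, [a-z] consumes "ch" as one unit because a range is a set of collating elements and "ch" falls inside it, while [c] and the character class consume only the "c", since their sets contain characters but not the element "ch". Note also that a range ending at "c", such as [a-c], would not include "ch" in this toy collation sequence.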