Document #: | D2558R0 |
Date: | 2022-03-11 |
Project: | Programming Language C++ |
Audience: | SG16, SG22, EWG |
Reply-to: | Steve Downey <[email protected]> |
WG14, the C standardization committee, is adopting [N2701] for C23. This will add U+0024 $ DOLLAR SIGN, U+0040 @ COMMERCIAL AT, and U+0060 ` GRAVE ACCENT to the basic source character set. C++ should adopt the same characters for C++26.
These characters are available in every encoded character set in common use, and in practice source code already assumes they are available, using them freely in source text such as comments and string literals. The primary change is that these characters would become available for syntactic purposes. Although allowing $ in identifiers is a common extension, C did not add any of these characters to the set of identifier characters, and this paper does not propose doing so either. Nor did C add trigraphs for these characters, and this paper does not propose adding trigraphs or digraphs for them.
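As an illustration (a sketch, not proposed wording), the following C++17 snippet uses the three characters only where they are already valid today: in comments and in literals. The variable names and sample strings are hypothetical. The commented-out declaration shows the identifier usage that remains an extension under both C23 and this proposal.

```cpp
#include <string_view>

// $, @ and ` already appear freely in comments like this one and in
// literals; none of the following needs a compiler extension.
// (Names and sample strings are hypothetical.)
constexpr std::string_view shell_variable = "$HOME";
constexpr std::string_view mail_address = "user@example.com";
constexpr std::string_view inline_code = "`std::vector`";
constexpr char grave = '`';

// Not proposed: the characters stay outside the identifier set, so a
// declaration such as
//     int $count = 0;
// remains a common vendor extension rather than standard C or C++.
```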
The translation model of C makes adding these characters to its basic source character set, the encoded set for source code before translation, much more compelling there. In C++, these characters are already in the translation character set, as single-byte characters, which makes the change less important. Nonetheless, it would be useful to make them available for language purposes, since WG14, responsible for the more conservative C language, has agreed that there are no functional impediments to their use.
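A small compile-time check, assuming C++17 or later (in C++20 these literals have type char8_t), shows that each of the three characters has the code point value given at the start of this paper and occupies a single code unit in UTF-8. It deliberately uses u8 literals, whose encoding is fixed by Unicode, rather than ordinary literals, whose encoding is implementation-defined.

```cpp
// UTF-8 character literals have values fixed by Unicode, independent of
// the implementation's ordinary literal encoding.
static_assert(u8'$' == 0x24, "U+0024 DOLLAR SIGN");
static_assert(u8'@' == 0x40, "U+0040 COMMERCIAL AT");
static_assert(u8'`' == 0x60, "U+0060 GRAVE ACCENT");

// Each character is one UTF-8 code unit: three code units plus the
// terminating null.
static_assert(sizeof(u8"$@`") == 4, "one code unit per character");
```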
Corentin Jabot discusses the use of these characters in other programming languages extensively in [P2342R0], “For a Few Punctuators More”.
These changes are relative to [N4901], “Working Draft, Standard for Programming Language C++”.
Modify [lex.charset] paragraph 2 as follows, changing the number of characters from 96 to 99:
2 The basic character set is a subset of the translation character set, consisting of 99 characters as specified in Table 1.
Modify [tab:lex.charset.basic] by adding rows for U+0024 DOLLAR SIGN, U+0040 COMMERCIAL AT, and U+0060 GRAVE ACCENT, placed in code point order:
U+0009 CHARACTER TABULATION
U+000B LINE TABULATION
U+000C FORM FEED
U+0020 SPACE
U+000A LINE FEED new-line
U+0021 EXCLAMATION MARK !
U+0022 QUOTATION MARK "
U+0023 NUMBER SIGN #
U+0024 DOLLAR SIGN $
U+0025 PERCENT SIGN %
U+0026 AMPERSAND &
U+0027 APOSTROPHE '
U+0028 LEFT PARENTHESIS (
U+0029 RIGHT PARENTHESIS )
U+002A ASTERISK *
U+002B PLUS SIGN +
U+002C COMMA ,
U+002D HYPHEN-MINUS -
U+002E FULL STOP .
U+002F SOLIDUS /
U+0030 .. U+0039 DIGIT ZERO .. NINE 0 1 2 3 4 5 6 7 8 9
U+003A COLON :
U+003B SEMICOLON ;
U+003C LESS-THAN SIGN <
U+003D EQUALS SIGN =
U+003E GREATER-THAN SIGN >
U+003F QUESTION MARK ?
U+0040 COMMERCIAL AT @
U+0041 .. U+005A LATIN CAPITAL LETTER A .. Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
U+005B LEFT SQUARE BRACKET [
U+005C REVERSE SOLIDUS \
U+005D RIGHT SQUARE BRACKET ]
U+005E CIRCUMFLEX ACCENT ^
U+005F LOW LINE _
U+0060 GRAVE ACCENT `
U+0061 .. U+007A LATIN SMALL LETTER A .. Z a b c d e f g h i j k l m n o p q r s t u v w x y z
U+007B LEFT CURLY BRACKET {
U+007C VERTICAL LINE |
U+007D RIGHT CURLY BRACKET }
U+007E TILDE ~
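To cross-check the count in [lex.charset] paragraph 2 against the table, the sketch below encodes the table's membership as a constexpr predicate. The names is_basic_character and count_basic_characters are hypothetical helpers, not standard library functions. The sketch assumes C++14 or later and relies on every table entry being an ASCII code point, so it scans only U+0000 through U+007F.

```cpp
#include <cstddef>

// Hypothetical helper: true if c is in the basic character set, either as
// in N4901 (with_new_chars == false) or as modified by this paper (true).
constexpr bool is_basic_character(char32_t c, bool with_new_chars) {
    // Tab, line tabulation, form feed, line feed, and space.
    if (c == U'\t' || c == U'\v' || c == U'\f' || c == U'\n' || c == U' ')
        return true;
    // The three characters this paper adds.
    if (c == U'$' || c == U'@' || c == U'`')
        return with_new_chars;
    // Every other printable ASCII character, U+0021 through U+007E.
    return c >= 0x21 && c <= 0x7E;
}

// Hypothetical helper: counts members by scanning the ASCII range.
constexpr std::size_t count_basic_characters(bool with_new_chars) {
    std::size_t n = 0;
    for (char32_t c = 0; c < 0x80; ++c)
        if (is_basic_character(c, with_new_chars))
            ++n;
    return n;
}

static_assert(count_basic_characters(false) == 96, "N4901 count");
static_assert(count_basic_characters(true) == 99, "proposed count");
```

With the additions, the graphic characters become exactly the contiguous range U+0021 .. U+007E, which is why the count rises from 5 + 91 = 96 to 5 + 94 = 99.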
[N2701] Philipp Klaus Krause. @ and $ in source and execution character set.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm
[N4901] Thomas Köppe. 2021-10-22. Working Draft, Standard for Programming Language C++.
https://wg21.link/n4901
[P2342R0] Corentin Jabot. 2021-03-25. For a Few Punctuators More.
https://wg21.link/p2342r0