ISO/IEC JTC1 SC22 WG21 N2647 = 08-0157 - 2008-05-16
Alisdair Meredith,
[email protected]
Hans Boehm,
[email protected]
Lawrence Crowl,
[email protected],
[email protected]
Peter Dimov,
[email protected]
Daniel Krügler,
[email protected]
This paper is a modification of
N2534 = 08-0044 - 2008-03-17.
The changes include summaries of implementation status,
clarification of front
and back
in the weak proposal,
clarification of the requirements on c_str
and data
,
and adjunct proposals for element and data access.
The current definition of basic_string
allows only very limited concurrent access to strings.
Such limited concurrency
will inhibit performance in multi-threaded applications.
We categorize string operations as follows:
These operations must be single-threaded and define the temporal bounds of all other operations on an object. However, copy construction may induce a shared representation on the argument string in copy-on-write implementations.
These operations will (probably) change the state of a string variable regardless of the implementation. Furthermore, some may introduce a shared representation on the argument string in copy-on-write implementations.
operator= resize reserve clear operator+= append assign push_back insert erase pop_back replace swap operator<< getline
(Any library function that accepts a string by non-const reference.)
These operations provide pointers or references to the internal string representation.
begin
end
rbegin
rend
operator[]
at
front
back
For non-const strings, iterator and element access operations may require copy-on-write implementation to 'unshare' the string to enable write access to the referenced characters.
Furthermore, the following operations may also require 'unsharing' the string data to provide contiguous null-terminated character arrays.
c_str
data
For const strings, iterators and element access operations do not inhibit concurrent accesses, nor do references through their results.
Iterators and references may be invalidated by:
Since only the first 'unsharing' operation following a 'mutating' must make the string not sharable, further reference or iterator creations cannot invalidate anything.
Thus the only real anomaly seems to be that
creating the initial non-const reference
invalidates previously generated const references.
This anomaly has unfortunate consequences.
Typically, when v
offers "container thread safety", we can do
#pragma omp parallel for for( size_t i = 0, n = v.size(); i < n; ++i ) { v[i] = 0; }
However, for a basic_string v, we must insert
v[0];
before the pragma to render the code correct.
Such a subtlety is difficult to teach and difficult to review.
Similarly, we would expect to be able to write s[0]+s[1]
to
add the first two characters of a string. And indeed this is correct if
either s
is a string
or const string
.
However similar examples become incorrect if only one access to s
is as a const string
, and the other access is through
a non-const reference to the same string. In that case, the second
operator[]
invocation may invalidate the first character
reference before the character value can be obtained.
As these examples illustrate, this issue is not completely new with the introduction of threads, but it is aggravated by it.
There are also performance implications to the current design. In a copy-on-write implementation,
const string empty(""); vector<string> v; #pragma omp parallel for for( size_t i = 0, n = v.size(); i < n; ++i ) { v[i] = empty; }
may run very slowly, since each iteration writes to the representation of empty, and thus is likely to generate a conflict cache miss. This issue may be secondary, but it argues that it is really hard to write efficient code if you do not know whether you have a copy-on-write implementation or not.
Any operation that may change the size of a string, or any standard library function that accepts a string by non-const reference, may potentially write to the string representation and thus is not suitable for concurrent operation.
The goal is that any operation that returns an internal reference or pointer can be called concurrently safely, but any write through the reference or pointer is not safe. However, this is not possible with a shared-buffer implementation, such as the classic reference counted variant.
Change c_str
and data
to not invalidate iterators and references.
We chose to leave swap
as an invalidating operation
to enable the small string optimization.
This change effectively requires null-terminated buffers. [Note: We think the wording is sufficient, but need confirmation.] For implementations that inline critical operations in user code, this change may affect the Application Binary Interface (ABI). The following table indicates whether or not the implementations null-terminate buffers. Entries marked with a '?' are "best guess" and not confirmed. In some cases, it has not been verified that entries apply to all libraries shipped by a vendor. No vendor is known to have a problem with the weak proposal.
compiler | library | '98 | '03 | '0x |
---|---|---|---|---|
Apple | libstdc++ | yes | yes | yes |
CodeWarrior | CodeWarrior | yes | yes | yes? |
Comeau | Dinkumware | yes | yes | yes |
Comeau | STLPort | yes? | yes? | yes? |
GNU | libstdc++ | yes? | yes | yes? |
HP | RW | yes | yes | yes? |
IBM | Dinkumware | yes | yes | yes |
Intel/MS | Dinkumware | yes | yes | yes |
Intel/Linux | libstdc++ | yes | yes | yes? |
Microsoft | Dinkumware | yes | yes | yes |
Sun | libCstd/RW | yes | yes | yes? |
Sun | STLport | yes | yes | yes? |
In paragraph 4, edit
References, pointers, and iterators referring to the elements of a
basic_string
sequence may be invalidated by the following uses of thatbasic_string
object:
- As an argument to any standard library function taking a reference to non- const
basic_string
as an argument [Footnote: For example as an argument to non-member functionsswap()
(21.3.8.8),operator>>()
(21.3.8.9), andgetline()
(21.3.8.9), or as an argument tobasic_string::swap()
. —end footnote]As an argument to non-member functionsswap()
(21.3.8.8),operator>>()
(21.3.8.9), andgetline()
(21.3.8.9).As an argument tobasic_string::swap()
.Callingdata()
andc_str()
member functions.- Calling non-const member functions, except
operator[]
,at
,front
,back
,begin
,rbegin
,end
, andrend
.- Following construction or any of the above uses, except the forms of
insert
anderase
that return iterators, the first call to non-const member functionsoperator[]
,at
,front
,back
,begin
,rbegin
,end
, orrend
.
Edit paragraphs 1 through 4 as follows.
const charT* c_str() const;
const charT* data() const;
Returns: A pointer to the initial element of an array of length
size() + 1
whose firstsize()
elements equal the corresponding elements of the string controlled by*this
and whose last element is a null character specified bycharT()
.
Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of the classbasic_string
that designates the same object as this.
const charT* data() const;Returns: If
size()
is nonzero, the member returns a pointer to the initial element of an array whose firstsize()
elements equal the corresponding elements of the string controlled by*this
. Ifsize()
is zero, the member returns a non-null pointer that is copyable and can have zero added to it.Requires: The program shall not alter any of the values stored in the character array.
Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function ofbasic_string
that designates the same object asthis
.
With the null-termination constraint on strings, it is possible to widen the string interface by allowing element access to the null.
This proposal may change the implementation of some inline string operations. We believe there is no change to existing conforming program behavior.
In paragraph 1, edit
const_reference operator[](size_type pos) const; reference operator[](size_type pos);Requires:pos <= size()
.Returns: Ifpos < size()
, returns*(begin() + pos)
. Otherwise,ifreturns a reference to apos == size()
, theconst
versioncharT()
that shall not be modified.Otherwise, the behavior is undefined.Complexity: forconst
strings, constant time.
To the c_str
and data
requirements, edit
Returns: A pointer to the initial element of an array of lengthsize() + 1
whose firstsize()
elements equal the corresponding elements of the string controlled by*this
and whose last element is a null character specified bycharT()
.Returns: A pointer
p
such thatp+i == &operator[](i)
for eachi
in [0
,size()
].Throws: for
const
strings, nothing.Complexity: for
const
strings, constant time.
In addition to the weak proposal, we propose to make all iterator and element access operations safely concurrently executable.
We are increasing the stability of operations even in sequential code.
This change disallows copy-on-write implementations. For those implementations using copy-on-write implementations, this change would also change the Application Binary Interface (ABI). The following table indicates whether or not the implementations use copy-on-write. Entries marked with a '?' are "best guess" and not confirmed. In some cases, it has not been verified that entries apply to all libraries shipped by a vendor. Only one vendor is known to have a problem with the strong proposal.
compiler | library | '98 | '03 | '0x |
---|---|---|---|---|
Apple | libstdc++ | yes | yes | no |
CodeWarrior | CodeWarrior | yes | no | no? |
Comeau | Dinkumware | yes | no | no |
Comeau | STLPort | no? | no? | no? |
GNU | libstdc++ | yes | yes | no |
HP | RW | yes | yes | yes? |
IBM | Dinkumware | 1 thread? | 1 thread | no |
Intel/MS | Dinkumware | yes | no | no |
Intel/Linux | libstdc++ | yes | yes | no |
Microsoft | Dinkumware | yes | no | no |
Sun | libCstd/RW | yes | yes | no |
Sun | STLport | no | no | no |
In paragraph 4, edit
References, pointers, and iterators referring to the elements of a
basic_string
sequence may be invalidated by the following uses of thatbasic_string
object:
- As an argument to any standard library function taking a reference to non- const
basic_string
as an argument [Footnote: For example as an argument to non-member functionsswap()
(21.3.8.8),operator>>()
(21.3.8.9), andgetline()
(21.3.8.9), or as an argument tobasic_string::swap()
. —end footnote]As an argument to non-member functionsswap()
(21.3.8.8),operator>>()
(21.3.8.9), andgetline()
(21.3.8.9).As an argument tobasic_string::swap()
.Callingdata()
andc_str()
member functions.- Calling non-const member functions, except
operator[]
,at
,front
,back
,begin
,rbegin
,end
, andrend
.Following construction or any of the above uses, except the forms ofinsert
anderase
that return iterators, the first call to non-const member functionsoperator[]
,at
,begin
,rbegin
,end
, orrend
.
Delete paragraph 5.
[Note: These rules are formulated to allow, but not require, a reference counted implementation. A reference counted implementation must have the same semantics as a non-reference counted implementation. [ Example:string s1("abc"); string::iterator i = s1.begin(); string s2 = s1; *i = 'a'; // Must modify only s1
—end example] —end note]
Edit paragraphs 1 through 4 as follows.
const charT* c_str() const;
const charT* data() const;
Returns: A pointer to the initial element of an array of length
size() + 1
whose firstsize()
elements equal the corresponding elements of the string controlled by*this
and whose last element is a null character specified bycharT()
.
Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of the classbasic_string
that designates the same object as this.
const charT* data() const;Returns: If
size()
is nonzero, the member returns a pointer to the initial element of an array whose firstsize()
elements equal the corresponding elements of the string controlled by*this
. Ifsize()
is zero, the member returns a non-null pointer that is copyable and can have zero added to it.Requires: The program shall not alter any of the values stored in the character array.
Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function ofbasic_string
that designates the same object asthis
.
With the null-termination constraint on strings, it is possible to widen the string interface by allowing element access to the null.
The difference between the adjunct weak proposal
and the adjunct strong proposal is extending the
throws and complexity requirements
on const
strings to non-const
strings.
This proposal may change the implementation of some inline string operations. For implementations without copy-on-write, we believe there is no change to existing conforming program behavior. Implementions with copy-on-write, would have more significant changes.
In paragraph 1, edit
const_reference operator[](size_type pos) const; reference operator[](size_type pos);Requires:pos <= size()
.Returns: Ifpos < size()
, returns*(begin() + pos)
. Otherwise,ifreturns a reference to apos == size()
, theconst
versioncharT()
that shall not be modified.Otherwise, the behavior is undefined.Complexity: constant time.
To the c_str
and data
requirements, edit
Returns: A pointer to the initial element of an array of lengthsize() + 1
whose firstsize()
elements equal the corresponding elements of the string controlled by*this
and whose last element is a null character specified bycharT()
.Returns: A pointer
p
such thatp+i == &operator[](i)
for eachi
in [0
,size()
].Throws: nothing.
Complexity: constant time.
With the null-termination access for operator[]
,
as provided in the adjunct proposals,
it is also possible to extend that same access
to the member function at
.
s.at(s.size())
.
Edit paragraphs 2 through 4 as follows.
const_reference at(size_type pos) const; reference at(size_type pos);Requires:pos
<<= size()Throws:out_of_range
ifpos
.>=> size()Returns:operator[](pos)
.
The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.