Thread-Local Storage

ISO/IEC JTC1 SC22 WG21 N2659 = 08-0169 - 2008-06-11

Lawrence Crowl

This proposal is a revision of N2545 = 08-0055 - 2008-03-16. The revision consists of wording changes arising from core language subcommittee review.

Introduction

In multi-threaded applications, there often arises the need to maintain data that is unique to a thread. We call this thread-local storage.

Several techniques have been used to accomplish this task. Notable among them is the POSIX pthread_getspecific and pthread_setspecific facility. Unfortunately, this facility is clumsy and slow. In addition, the facility is not particularly helpful when converting a single-threaded application to a multi-threaded application.

Several vendors have provided a language extension for a new storage class that indicates that a variable has thread storage duration. Use of thread variables is relatively easy and access to thread variables is relatively fast. In addition, the conversion of a single-threaded application using static-duration variables to a multi-threaded application using thread-duration variables requires less wholesale program restructuring.

Roughly equivalent extensions are available from

GNU Thread-Local Storage
HP Using Thread Local Storage
HP Tru64 UNIX to HP-UX STK: critical Impact: TLS - feature differences (CrCh320)
IBM Thread-Local Storage in What's New in XL C/C++ V9.0
IBM The __thread storage class specifier
Intel Thread-local Storage
Microsoft Thread Local Storage
Sun Thread-Local Storage

The C++ standard should adopt existing practice for thread-local storage. In addition, the C++ standard should extend existing practice to enable broader use.

Proposal

The specification outline is as follows. We defer detailed changes to the text of the standard to the final section.

Thread Storage Duration

Add a new storage duration called thread storage duration. Objects with thread storage duration are unique to each thread.

Those objects that may have static storage duration may have thread storage duration instead. These objects include namespace-scope variables, function-local static variables, and class static member variables.

Storage Class thread_local

Add thread_local, a new keyword and storage class specifier. The thread_local specifier indicates that the variable has thread storage duration.

Variables declared with the thread_local specifier are bound as they would be without the thread_local specifier.
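As an illustration of the specifier at namespace scope, the following minimal sketch declares one thread variable and shows that each thread works on its own copy. The names call_count, bump, and count_on_new_thread are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counter: every thread gets its own copy, starting at 0.
thread_local int call_count = 0;

// Increments the calling thread's copy and returns the new value.
int bump() { return ++call_count; }

// Calls bump() three times on a new thread and reports that thread's final count.
int count_on_new_thread() {
    int result = 0;
    std::thread worker([&result] {
        bump();
        bump();
        result = bump();  // the worker's copy started at 0, so this is 3
    });
    worker.join();
    return result;
}
```

The count observed by the calling thread is unaffected by the worker, because the name call_count is bound to a different object in each thread.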

Addresses of Thread Variables

The address-of operator (&), when applied to a thread variable, is evaluated at run time and returns the address of the current thread's variable. Therefore, the address of a thread variable is not a constant.

Thread-local storage defines lifetime and scope, not accessibility. That is, one may take the address of a thread-local variable and pass it to other threads.

The address of a thread variable is stable for the lifetime of the corresponding thread. The address of a thread variable may be freely used during the variable's lifetime by any thread in the program. When a thread terminates, all addresses of that thread's variables are invalid and may not be used.
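The address rules above can be sketched as follows. The variable slot and both functions are hypothetical. The first function records the worker's address as an integer while that address is still valid, so nothing is dereferenced after the worker exits. The second passes the caller's address to a worker that writes through it while the caller is still alive.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

thread_local int slot = 0;  // hypothetical thread variable

// Returns true if a second thread observes a different address for `slot`
// than the calling thread does.
bool addresses_differ() {
    std::uintptr_t mine = reinterpret_cast<std::uintptr_t>(&slot);
    std::uintptr_t theirs = 0;
    std::thread worker([&theirs] {
        theirs = reinterpret_cast<std::uintptr_t>(&slot);  // evaluated at run time
    });
    worker.join();
    return mine != theirs;
}

// The calling thread hands out the address of its own copy; the worker writes
// through that pointer while the calling thread (the owner) is still running.
int write_through_pointer() {
    slot = 1;
    int* mine = &slot;
    std::thread worker([mine] {
        slot = 99;   // the worker's own copy, unrelated to *mine
        *mine = 42;  // the caller's copy, reached through the pointer
    });
    worker.join();
    return slot;     // the caller's copy now holds 42
}
```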

Thread Variable Dynamic Initialization

A thread variable may be statically initialized as would any other static-duration variable.

At present, no implementation of thread-local storage supports dynamic initialization (and presumably non-trivial destructors). There was mild consensus at the Mont Tremblant meeting to support dynamic initialization of function-local, thread-local variables. The initialization of such variables is already guarded and synchronous, so new technology is not required. On the other hand, the implementation for dynamic initialization of namespace-scope variables is much more difficult, and may require additional linker and operating system support. There was no consensus to support dynamic initialization of namespace-scope variables at that time. However, interviews with prospective users indicated a firm desire for full dynamic initialization of thread storage duration variables. The programmers simply did not want to partition their types this way.

Dynamic initialization and destruction can be implemented with two approaches.

.init sections
Extend the semantics of .init sections to also include sections for thread-local storage. These thread-local inits will be invoked whenever the corresponding storage section is allocated. This approach requires operating-system support.
initialized flags
The compiler inserts dynamic tests on an initialized flag into the program before access to a thread-local variable. The initialization of a thread-local variable must initialize all such variables defined within its translation unit. Note though, that initializations should be marked complete before executing the initialization to prevent recursive attempts to initialize the same variable. (Such recursive initializations have undefined behavior and are governed by the zero-initialization clause.) This approach does not require operating-system support, but has higher run-time cost.

In either case, the initialization of a thread-local variable must place the destruction on a thread-local list for subsequent handling on exit from the thread (potentially with cancellation cleanup functions).
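The initialized-flags approach can be emulated by hand, which may help in judging its cost. The sketch below is only an approximation under stated assumptions: the names Tracked, access, and thread_exit_cleanup are hypothetical, the per-thread destruction list is reduced to a single object, and the thread-exit handling is an explicit call rather than something the runtime performs. Only statically initialized thread variables are used, which is what existing extensions already provide.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <thread>

std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

// Hypothetical type with a non-trivial constructor and destructor.
struct Tracked {
    int value = 7;
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};

// Per-thread raw storage, flag, and pointer; all are statically initialized.
alignas(Tracked) thread_local unsigned char storage[sizeof(Tracked)];
thread_local bool initialized = false;
thread_local Tracked* instance = nullptr;

// Stand-in for the test a compiler would insert before each access.
Tracked& access() {
    if (!initialized) {
        initialized = true;                   // marked before running the initializer
        instance = new (storage) Tracked();   // dynamic initialization, once per thread
    }
    return *instance;
}

// Stand-in for the handling on exit from the thread (an assumption of this sketch).
void thread_exit_cleanup() {
    if (instance != nullptr) {
        instance->~Tracked();
        instance = nullptr;
    }
}

// Runs two threads that each access the variable several times.
bool each_thread_constructs_once() {
    auto body = [] {
        access().value = 8;
        assert(access().value == 8);  // same object on every access in this thread
        thread_exit_cleanup();
    };
    std::thread first(body);
    std::thread second(body);
    first.join();
    second.join();
    return constructed.load() == 2 && destroyed.load() == 2;
}
```

Every access pays for the flag test, which is the run-time cost mentioned above.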

Other Issues

There are some other issues that deserve mention even though they are not properly part of the C++ standard because they affect real programs.

Dynamic Libraries

The allocation of thread-local storage for the full product of threads and dynamic libraries could result in very large storage requirements. The Sun Microsystems implementation only allocates thread-local storage for a dynamic library when the thread uses a variable from that library. That is, the Sun implementation allocates memory lazily for each thread and dynamic library pair. To avoid bloated programs, the language definition must permit this optimization.

The system may immediately deallocate the storage associated with a thread and dynamic library pair when either the thread terminates or the library is closed. The system is not required to deallocate immediately. However, the system is required to not leak storage. Thread-local storage for a thread must be reclaimed no later than a subsequent thread creation. Thread-local storage for a library within a thread must be reclaimed no later than a subsequent open of that library. (Opening another library does not require storage reclamation, though doing so would certainly reduce storage consumption.)

While storage deallocation can be deferred, variable destruction must not be deferred because destruction depends on access to thread state. In the presence of programmed closing of a dynamic library, its thread-local variables may need to be destroyed out of order with respect to thread-local variables outside of the library.

System Interface

When dlsym() is used on a thread variable, the address returned will be the address of the currently executing thread's variable.

Standard Changes

The text of the standard changes as specified in this section.

2.11 Keywords [lex.key]

To table 3, add thread_local.

3.6.1 Main function [basic.start.main]

In paragraph 4, edit as follows. This change is the minimal necessary to accommodate thread-duration objects. A more robust specification of termination is needed. See 18.4 Start and termination [support.start.term].

Calling the function std::exit(int) declared in <cstdlib> (18.4) terminates the program without leaving the current block and hence without destroying any objects with automatic storage duration (12.4). If std::exit is called to end a program during the destruction of an object with static or thread storage duration, the program has undefined behavior.

3.6.2 Initialization of non-local objects [basic.start.init]

Before paragraph 1, add a new paragraph

There are two broad classes of named non-local objects, those with static storage duration (3.7.1) and those with thread storage duration (3.7.2(new)). Non-local objects with static storage duration are initialized as a consequence of program initiation. Non-local objects with thread storage duration are initialized as a consequence of thread execution. Within each of these phases of initiation, initialization occurs as follows.

In paragraph 1, edit

Objects with static storage duration (3.7.1) or thread storage duration (3.7.2(new)) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static or thread storage duration and an object of trivial or literal type with static or thread storage duration can be initialized with a constant expression (5.19); this is called constant initialization. ....

In paragraph 2, edit

An implementation is permitted to perform the initialization of an object of namespace scope with static storage duration as a static initialization even if such initialization is not required to be done statically, provided that

In paragraph 3, edit

It is implementation-defined whether or not the dynamic initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace scope with static storage duration is done before the first statement of main. ....

After paragraph 3, add new paragraph 4.

It is implementation-defined whether or not the dynamic initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace scope and with thread storage duration is done before the first statement of the initial function of the thread. If the initialization is deferred to some point in time after the first statement of the initial function of the thread, it shall occur before the first use of any object with thread storage duration defined in the same translation unit as the object to be initialized.
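A minimal sketch of what this paragraph guarantees to a program, assuming hypothetical names next_id, allocate_id, and thread_id: the initializer has a side effect, so it cannot be done statically, yet every thread that uses the variable sees it initialized. Whether a thread that never uses the variable runs the initializer is left to the implementation, so the sketch checks only threads that do use it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> next_id{1};

// Hypothetical helper with a side effect, which makes the initializer dynamic.
int allocate_id() { return next_id.fetch_add(1); }

// Namespace-scope thread variable with a dynamic initializer.
thread_local int thread_id = allocate_id();

// Reads thread_id on a fresh thread; the read counts as a first use.
int id_seen_by_new_thread() {
    int seen = 0;
    std::thread worker([&seen] { seen = thread_id; });
    worker.join();
    return seen;
}

// Two fresh threads each observe an initialized, distinct value.
bool new_threads_get_distinct_ids() {
    int a = id_seen_by_new_thread();
    int b = id_seen_by_new_thread();
    return a != 0 && b != 0 && a != b;
}
```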

In the existing paragraph 4, edit

If construction or destruction of a non-local static or thread duration object ends in throwing an uncaught exception, the result is a call to std::terminate (18.7.3.3).

3.6.3 Termination [basic.start.term]

In paragraph 1, edit

Destructors (12.4) for initialized objects of static storage duration (declared at block scope or at namespace scope) are called as a result of returning from main and as a result of calling std::exit (18.3). Destructors (12.4) for initialized objects with thread storage duration (declared at block scope or at namespace scope) within a given thread are called as a result of that thread calling std::exit or returning from the initial function of the thread. Objects with thread storage duration are destroyed before those of static storage duration. Otherwise, these objects are destroyed in the reverse order of the completion of their constructor or of the completion of their dynamic initialization. If an object is initialized statically, the object is destroyed in the same order as if the object was dynamically initialized. For an object of array or class type, all subobjects of that object are destroyed before any local object with static storage duration initialized during the construction of the subobjects is destroyed.
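The reverse-order rule for thread-duration objects can be observed with a small sketch. The type Logger and the functions below are hypothetical. The log is written only by the worker and read only after join(), so no further synchronization is assumed to be needed.

```cpp
#include <cassert>
#include <string>
#include <thread>

std::string event_log;  // written by the worker only, read after join()

// Hypothetical type that records its construction and destruction.
struct Logger {
    char tag;
    explicit Logger(char t) : tag(t) { event_log += '+'; event_log += tag; }
    ~Logger() { event_log += '-'; event_log += tag; }
};

// Each function owns one block-scope thread variable, constructed on first call.
void touch_first()  { static thread_local Logger a('A'); }
void touch_second() { static thread_local Logger b('B'); }

// Constructs A then B on a worker thread and returns the recorded events.
std::string run_worker() {
    event_log.clear();
    std::thread worker([] {
        touch_first();
        touch_second();
        event_log += '|';  // marks the return from the thread's initial function
    });
    worker.join();
    return event_log;
}
```

The objects are destroyed after the initial function of the thread returns, B before A, so the recorded sequence is "+A+B|-B-A".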

In paragraph 2, edit

If a function contains a local object of static or thread storage duration that has been destroyed and the function is called during the destruction of an object with static or thread storage duration, the program has undefined behavior if the flow of control passes through the definition of the previously destroyed local object.

In paragraph 4, implicitly adding thread duration, edit

Calling the function std::abort() declared in <cstdlib> terminates the program without executing any destructors and without calling the functions passed to std::atexit() or std::at_quick_exit().

3.7 Storage Duration [basic.stc]

To the list of storage durations in paragraph 1, between static and automatic, add thread storage duration.

In paragraph 2, edit

Static, thread, and automatic storage durations are associated with objects introduced by declarations (3.1) and implicitly created by the implementation (12.2). The dynamic storage duration is associated with objects created with operator new (5.3.4).

3.7.1 Static storage duration [basic.stc.static]

In paragraph 1, edit

All objects which do not have dynamic storage duration, do not have thread storage duration, and are not local, have static storage duration. The storage for these objects shall last for the duration of the program (3.6.2, 3.6.3).

3.7.2(new) Thread storage duration [basic.stc.thread]

Add a new section after 3.7.1 Static storage duration [basic.stc.static] with the following contents.

All objects or references declared with the thread_local keyword have thread storage duration. The storage for these objects or references shall last for the duration of the thread in which they are created. There is a distinct object or reference per thread, and use of the declared name refers to the object or reference associated with the current thread.

An object or reference with thread storage duration shall be initialized before its first use, and if constructed, shall be destroyed on thread exit.

3.7.3.1(old) Allocation functions [basic.stc.dynamic.allocation]

In paragraph 4, edit

[Note: in particular, a global allocation function is not called to allocate storage for objects with static storage duration (3.7.1), for objects or references with thread storage duration (3.7.2(new)), for objects of type std::type_info (5.2.8), for the copy of an object thrown by a throw expression (15.1). —end note]

This restriction says that allocation of storage for thread-duration variables does not go through the global operator new functions. This restriction is necessary to enable link-time preallocation.

3.8 Object Lifetime [basic.life]

In paragraph 8, edit

If a program ends the lifetime of an object of type T with static (3.7.1), thread (3.7.2(new)), or automatic (3.7.3(new)) storage duration and if T has a non-trivial destructor, ....

In footnote 35, edit

that is, an object for which a destructor will be called implicitly — either upon exit from the block for an object with automatic storage duration, upon exit from the thread for an object with thread storage duration, or upon exit from the program for an object with static storage duration.

In paragraph 9, edit

Creating a new object at the storage location that a const object with static, thread, or automatic storage duration occupies or, at the storage location that such a const object used to occupy before its lifetime ended results in undefined behavior.

6.7 Declaration statement [stmt.dcl]

Within paragraph 4, edit

The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) or thread storage duration (3.7.2(new)) is performed before any other initialization takes place. A local object of trivial or literal type (3.9) with static or thread storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize an object with static or thread storage duration in namespace scope (3.6.2).

In paragraph 5, edit

The destructor for a local object with static or thread storage duration will be executed if and only if the variable was constructed. [Note: 3.6.3 describes the order in which local objects with static or thread storage duration are destroyed. —end note]

7.1.1 Storage class specifiers [dcl.stc]

In paragraph 1, add "thread_local" to the list of storage class specifiers.

In paragraph 1, edit

At most one storage-class-specifier shall appear in a given decl-specifier-seq, except that thread_local may appear with static or extern. If thread_local appears in any declaration of an object or reference, it shall be present in all declarations of that object.

After paragraph 3, add a new paragraph

The thread_local specifier can be applied only to the names of objects or references of block scope that also specify static or to the names of objects or references of namespace scope. It specifies that the named object or reference has thread storage duration (3.7.2(new)).
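The permitted combinations can be sketched as follows. All names are hypothetical. The block-scope declaration spells out static alongside thread_local, as this wording requires.

```cpp
#include <cassert>
#include <thread>

extern thread_local int shared_name;        // declaration: thread_local together with extern
thread_local int shared_name = 5;           // definition repeats thread_local

static thread_local int file_private = 10;  // internal linkage, thread storage duration

// Sum of the calling thread's copies of the two namespace-scope variables.
int defaults_sum() { return shared_name + file_private; }

// Block scope: static accompanies thread_local, one counter per thread.
int next_ticket() {
    static thread_local int ticket = 0;
    return ++ticket;
}

// The first ticket drawn on a new thread, independent of the caller's tickets.
int first_ticket_on_new_thread() {
    int first = 0;
    std::thread worker([&first] { first = next_ticket(); });
    worker.join();
    return first;
}
```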

In paragraph 4, edit

A static specifier used in the declaration of an object declares the object to have static storage duration (3.7.1), unless accompanied by the thread_local specifier, which declares the object or reference to have thread storage duration (3.7.2(new)).

8.3 Meaning of declarators [dcl.meaning]

In paragraph 2 edit as follows.

A static, thread_local, extern, register, mutable, friend, inline, virtual, or typedef specifier applies directly to each declarator-id in an init-declarator-list; the type specified for each declarator-id depends on both the decl-specifier-seq and its declarator.

8.5 Initializers [dcl.init]

In paragraph 2, edit

Automatic, register, thread, static, and namespace-scoped external variables can be initialized by arbitrary expressions involving literals and previously declared variables and functions.

Paragraph 7 remains unchanged, which implies that thread storage duration objects may be uninitialized at program startup.

8.5.1 Aggregates [decl.init.aggr]

In paragraph 14, edit as follows. The expanded scope of 3.6.2 leaves this text mostly untouched.

When an aggregate with static or thread storage duration is initialized with a brace-enclosed initializer-list, if all the member initializer expressions are constant expressions, and the aggregate is a trivial type, the initialization shall be done during the static phase of initialization (3.6.2); otherwise, it is unspecified whether the initialization of members with constant expressions takes place during the static phase or during the dynamic phase of initialization.

9.2 Class members [class.mem]

In paragraph 6, edit

A member shall not be declared with the extern or register storage-class-specifier. Within a class definition, a member shall not be declared with the thread_local storage-class-specifier unless also declared static.

9.4.2 Static data members [class.static.data]

In paragraph 1, edit

A static data member is not part of the subobjects of a class. For such a member declared thread_local, there is one copy of the member per thread. For such a member not declared thread_local, there is only one copy of the data member shared by all the objects of the class.
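The difference between the two kinds of static data member can be sketched with a hypothetical class Session: depth has one copy per thread, while total has a single copy shared by the whole program.

```cpp
#include <cassert>
#include <thread>

// Hypothetical class used only for this illustration.
struct Session {
    static thread_local int depth;  // one copy of the member per thread
    static int total;               // one copy shared by all threads
};

thread_local int Session::depth = 0;
int Session::total = 0;

// Sets the caller's depth, lets a worker modify its own depth and the shared
// total, then returns the caller's depth.
int depth_after_worker() {
    Session::depth = 3;
    std::thread worker([] {
        Session::depth = 100;  // the worker's copy only
        Session::total += 1;   // the shared copy; safe here because of join()
    });
    worker.join();
    return Session::depth;     // still 3
}
```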

12.1 Constructors [class.ctor]

In paragraph 8, edit

Default constructors are called implicitly to create class objects of static, thread, or automatic storage duration (3.7.1, 3.7.2(new), 3.7.3(new)) defined without an initializer (8.5), ...

12.2 Temporary objects [class.temporary]

In paragraph 5, edit

In addition, the destruction of temporaries bound to references shall take into account the ordering of destruction of objects with static, thread, or automatic storage duration (3.7.1, 3.7.2(new), 3.7.3(new));

12.4 Destructors [class.dtor]

In paragraph 9, edit

Destructors are invoked implicitly (1) for a constructed object with static storage duration (3.7.1) at program termination (3.6.3), (2) for a constructed object with thread storage duration (3.7.2(new)) at thread exit, (3) for a constructed object with automatic storage duration (3.7.3(new)) when the block in which the object is created exits (6.7), (4) for a constructed temporary object when the lifetime of the temporary object ends (12.2), (5) for a constructed object allocated by a new-expression (5.3.4), through use of a delete-expression (5.3.5), (6) in several situations due to the handling of exceptions (15.3).

12.6.1 Explicit initialization [class.expl.init]

In paragraph 4, edit

[ Note: the order in which objects with static or thread storage duration are initialized is described in 3.6.2 and 6.7. —end note ]

15.3 Handling an exception [except.handle]

In paragraph 13, edit

Exceptions thrown in destructors of objects with static storage duration or in constructors of static-duration namespace-scope objects are not caught by a function-try-block on main(). Exceptions thrown in destructors of objects with thread storage duration or in constructors of thread-duration namespace-scope objects are not caught by a function-try-block on the initial function of the thread.

15.5.1 The std::terminate() function [except.terminate]

In paragraph 1, in the list of causes for termination, edit

when construction or destruction of a non-local object with static or thread storage duration exits using an exception (3.6.2), or

Another possibility is to propagate the exception to the joiner, but then there would be no distinction between the thread function exiting with an exception and one of its thread-duration objects exiting with an exception.

18.4 Start and termination [support.start.term]

In paragraph 3, edit

The function abort() has additional behavior in this International Standard:

Paragraph 7 discusses the interaction of destruction and calling exit. The following edit is the minimum possible change to the standard to accommodate thread storage duration objects.

The function exit() has additional behavior in this International Standard: