Author:             Fernando Cacciola
Contact:            [email protected]
Organization:    SciSoft
Date:                 2005-08-29
Number:            N1878=05-0138

A proposal to add an utility class to represent optional objects (Revision 1)

Motivation and Scope

A. Concepts

The fundamental data unit in C++ is the object, and according to the C++ Object Model (1.8), an object is a region of storage, that is, a (possibly fragmented) block of memory. All objects have an object representation which is the sequence of unsigned bytes taken up by the object and a value representation which is the set of bits in the object representation that determines the value of the object (3.9/4). The standard loosely defines a value as an implementation-defined discrete element of a set of values, and an object type (39.9/9) as a name that specifies such a value set and the valid operations on it. A variable is an entity (3.3) introduced by the declaration of an object and it's name denotes the object (3.4). In the standard text,  variable and object are interchangeable terms (though in theory a variable can be considered to hold an object).

Objects are created (3.1,5.3.4,12.2) , but unlike other languages, the value of an object is not always given automatically by the implementation. Various initialization rules determine-based on the storage duration, the type, and the presence or absence of an intializer- if and how objects and subobjects are initialized; that is, are given an initial value (3.6.2,8.5,12.1,12.6,12.8). Since these rules are based on lexical properties of a program, the property of an object of being fully or partially initialized (or not) is static, not dynamic, and is independent of any object or value representation used by the implementation in the cases when, for instance, there is no initializer. That is, the static initialization state of an object or variable is determined by its definition and is independent of the actual (possibly garbage) value of the object. Consequently, objects are (statically) initialized or uninitialized based on how they are created regardless of any initial value, well defined or indeterminate.
 [1. this definition of a static initialization state is important because the general concept of a uninitialized variable is often debated as not being well defined because the concept is sometimes defined in terms of the initial value given by the language implementation to variables defined without initializers]

Ultimately, a program carries on a computation and a computation is a transformation of values; they are produced and consumed.

From the point of view of the producer, a value can be unset: purposely set to 0 or any other special value with that meaning; or incomputable: dependable on other values which are invalid, missing, corrupted, etc. A NULL passed as the argument to filebuf::setuf() is an example of an unset value, while the square root of a negative number, the area of an open figure, the result of dereferencing a past the end iterator, the next char in an exhausted stream and a field from an unreachable remote database are examples of incomputable values.

From the point of view of the consumer, a value can be invalid or missing. A 0 divisor and an out of range index are examples of invalid values and a value considered incomputable from the producer point of view is missing from the consumer point of view.

An unset or incomputable value such that a particular consumer can unequivocally interpret as missing is called absent. Absent values are distinctive values whose only purpose is to express in a testable way the fact that a conceptual value is really unset or incomputable. 

In software design, absent values are fundamentally important: If a requested value is incomputable because of an invalid value being used in the request, such as a zero divisor or a negative argument to sqrt(), it is common and reasonable to allow the producer to just fail because the consumer can detect by itself that the request is invalid. However, when the requested value is unset or incomputable for reasons not directly related to the input values used in a request, like when requesting for the field of an unreachable database or the next char in an empty stream, or a point guaranteed to be inside a degenerate figure whose area is 0, it is impractical and unnecessary to allow the producer to fail requiring the consumer to always check first or afterwards (for a solid design). A much more effective and safer technique is to use absent values.

A type which one way or another is capable of distinctively expressing an absent value is called optional or nullable. C/C++ pointers for example are nullable types.

B. Current ad-hoc techniques for expressing absent values

In C++, it is simply impractical to formally define as absent all uninitialized objects because not every type in the system includes a special null value; only pointers do that. Consequently, absent values are traditionally expressed in C++ in one of 3 ways:

  1. Using ad-hoc special values, like EOF.
  2. Using ad-hoc additional flags, like pair<T,bool>.
  3. Using pointers even if the object would have been passed or returned by-value if it wouldn't be possibly absent.

Techniques (1) and (2) suffer from two important problems:

Technique (3), OTOH, uses a null pointer value (4.10) and so it solves the two problems mentioned before: is uniform because users can unequivocally test if the value is indeed absent and being formally undefined behavior to access an object pointed to by a null pointer, an implementation if free to incorporate into the program mechanisms to diagnose or handle (in any particular way) such violations. However, using pointers to represent absent values has many limitations:

The following paper proposes a library solution to the problem of effectively representing absent values of any type: the template class optional<T>. This rather simple class contains within its own storage any object and keeps explicit track of its initialization state. A C++ program can use a default-constructed instance of optional<T> to represent an absent value of type T. Unlike a pointer, optional<T> can be used to return local objects and in non-pointer member subobjects so it can represent absent values of any type (like pointers) but also in any context.

The proposed solution is simple to understand, simple to use and it allows a program to much better express the presence of absent values and its role in the program design. It even encourages the usage of absent values. Therefore, the proposed addition is targeted to all C++ programmers of all levels and for all types of programs.

Explicit support for absent values is common in other languages: In Haskell, for example, the Maybe built-in constructor logically extends a type adding 'Nothing' to it which allows a haskell program to represent absent values of any type. In most purely object-oriented or scripting languages, uninitialized variables (even of built-in type) are automatically initialized with a distinctive value:  undef in Perl, None in Phyton, nil in Smalltalk, Void in Effiel, Null in Java, etc...so programs written in these languages can uniquely express absent values through uninitialized variables; and recently, nullable value types were introduced into NET 2.0 were uninitialized variables of a nullable types are automatically initialized with null (with direct language support for it).

The Boost.Optional library is a reference implementation (with some added functionality not included in this proposal) that was introduced in boost 1.30.0, around March 2003, and which has been widely accepted, used and even occasionally recommended ever since.

Impact On the Standard

This is purely an extension which only adds an utility class to the library.

Design Decisions

Value wrappers explicitly tracking the initialization state for the very purpose of representing absent values have been invented over and over, both publicly and privately. Perhaps the first published such wrapper for C++ is the Fallible<> type in "Scientific and Engineering C++, by Barton and Nackman".

These wrappers all share in common the fact that they contain values in their own storage allowing local absent values to be returned and non-pointer absent member subobjects to be represented.

The class proposed for standardization in this paper, however, has been designed with the following unique added goals in mind:

  1. It shall permit the representation of absent values of class type with no default constructor.
  2. It shall permit the representation of absent values of reference type.
  3. It shall benefit from the traditional and well-established idioms that have been used to represent absent objects, thus helping the programmer avoid making mistakes arising from the fact that the value is possibly absent and accessing it is undefined behavior.

Goals 1 and 2 are usability features which have been shown to be radically important during the early stages of the development of the Boost implementation. These features require template metaprogramming techniques and/or compiler magic but the Boost implementation proved this to be entirely possible in a large base of current compilers (the Boost version for example works in all the compilers supported by the Boost project)

Goal 2 conduced to a design decision which was largely debated on the Boost forums:

Optional references rebind when assigned, that is:

	int  a = 123 ;
	int& ra = a ; // ra is bound to a
	optional<int&> ora = ra ; // ora is bound to a 
        int b = 456 ;
	int& rb = b ;
	ora = rb ; // ora is re-bound to rb. 

This is in sharp contrast with normal C++ references.

This decision was made to allow the definition of assignment to be independent of the prior initialization state of the lvalue. If the lvalue optional reference is uninitialized before the assignment there is no choice but to rebind.

	int  a = 123 ;
	int& ra = a ; // ra is bound to a
	optional<int&> ora = ra ; // ora is bound to a 
        int b = 456 ;
	int& rb = b ;
	ora = nullptr ; // explicitly reset to uninitialized to make the point even more clear.
	ora = rb ; // ora MUST bind to rb, there is no choice here. 

Assignment to an initialized optional reference could be made to assign the referenced value, following the behavior of normal C++ references, but then the semantics of assignment would depend on the prior initialization state of the lvalue. The decision, albeit arbitrary, is to rebind.

Goal 3 conduced to another design decision which was also largely debated on the Boost forums:

Access to the contained value is only supported via operators * and ->

The reason for this interface design is that pointers and a null pointer value have been used to represent absent objects from the very first days of C. This technique is so extensively used that the expressions used to access the pointed object, (*p) and p->, have an extraordinary power at expressing optionally: all by themselves, from syntax alone, they make loudly clear that the value being accessed might be absent.

On the other hand, a T* (or a wrapper of it) and an optional<T> are different in nature as the former points to an external object while the later contains an internal object.

In the boost forums it has been argued that users can (and will) confuse optional<T> as a form of wrapper for T*. That is quite possible yet the confusion is harmless: it turns out that any valid expression for a wrapper of T* (say shared_ptr<T>) which is also valid for optional<T> happens the have the same semantics, with the only exception of copying optional<T> or smart_ptr<T> objects which in the former is a deep copy and in the later a shallow copy.

In other words, these expressions:

p->fmember();
p->datamember ;
(*p).fmember();
(*p).datamember ;
T lvalue = *p ;
if ( p ) ...

have the same semantics whether p is optional<T> or smart_ptr<T>.

Assignment of values to an optional<T> object is an operation that doesn't depend on the prior value being absent or not. In fact, it shall succeed even if the lvalue optional<T> was uninitialized before the assignment. For that reason, value assignment of an optional is directly supported via the assignment operator. That is:

optional<int> o = 1; 
o = 2 ; // direct-assignment of the vale
*o = 3 ; // indirect-assignment of the value, OK here because 'o' is uninitialized
o = nullptr ; // explicit de-initialization
*o = 5 ; // ERROR, undefined behavior because *o is absent
o = 6 ; // OK. direct-initialization is always well-defined

Proposed Text for the Standard

A. Add <optional> to Table 11 in 17.4.1.2/1

B. Change Table 27 in 20.0 adding the row:

    20.6    Optional types    <optional>

C. Add the following subclauses to clause 20

20.6 Optional objects                                                                     [lib.optional.synopsis]

1.Header <optional> synopsis

namespace std {

template<class T> class optional ;

template<class T> inline bool operator == ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator != ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator <  ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator >  ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator <= ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator >= ( optional<T> const& x, optional<T> const& y ) ;

template<class T> inline bool operator == ( optional<T> const& x, decltype(nullptr) y ) ;

template<class T> inline bool operator != ( optional<T> const& x, decltype(nullptr) y ) ;

template<class T> inline bool operator == ( decltype(nullptr) x, optional<T> const& y ) ;

template<class T> inline bool operator != ( decltype(nullptr) x, optional<T> const& y ) ;

template<class T> inline T const& get ( optional<T> const& opt ) ;

template<class T> inline T&       get ( optional<T> & opt ) ;

template<class T> inline T const* get ( optional<T> const* opt ) ;

template<class T> inline T*       get ( optional<T>* opt ) ;

template<class T> inline T const* get_pointer ( optional<T> const& opt ) ;

template<class T> inline T*       get_pointer ( optional<T> & opt ) ;

template<class T> inline void swap( optional<T>& x, optional<T>& y ) ;

} // namespace std

20.6.1 Explicit initialization state                                                            [lib.optional.state]

1.An instance of optional<T> is said to be uninitialized if it has been default constructed, assigned nullptr1, copy-constructed from or assigned with an uninitialized optional<U> instance (with U equal or not to T).

2.An instance of optional<T> is said to be initialized if it has been constructed with any value of type T, assigned a value of type T, copy-constructed from or assigned with an initialized optional<U> instance (with U equal or not to T).

3.Instances of optional<T> shall explicitly track whether they are initialized or not, though the mechanism for doing this is unspecified2.

20.6.2 Contained value                                                                 [lib.optional.value]

Initialized instances of optional<T> shall store a value of type T within its own storage. This value is referred to as the contained value of the optional<T> instance. Implementations are not permitted to use additional storage, such as dynamic memory, to allocate its contained value. As required by the language, the contained value must be allocated in a region of the optional<T> storage suitably aligned for the type T. The mechanism used to allocate the contained value is unspecified3. Whether an uninitialized optional<T> must store some form of contained value and how at that is also unspecified.

If T is of reference type, an implementation is free to store a value of a type other than T (such as a reference wrapper4) provided any operation on the contained value behaves as if it were of type T.

20.6.3 Class template optional [lib.optional]

namespace std {

template<class T>
class optional
{
  public :

    optional () ;

    optional ( decltype(nullptr)1 ) ;

    optional ( T const& v ) ;

    optional ( optional const& rhs ) ;

    template<class U> explicit optional ( optional<U> const& rhs ) ;

    optional& operator = ( decltype(nullptr) ) ;
    optional& operator = ( T const& v ) ;

    optional& operator = ( optional const& rhs ) ;

    template<class U> optional& operator = ( optional<U> const& rhs ) ;

    T const* operator ->() const ;
    T*       operator ->() ;

    T const& operator *() const ;
    T&       operator *() ;

    operator unspecified-bool-type() const ;

    bool operator!() const ;

} ;

} // namespace std

20.6.3.1 Constructors                                                                                [lib.optional.ctor]

optional<T>::optional();

Effects: Default-Constructs an optional.

Postconditions: *this is uninitialized

Remarks: T's default constructor is not called.

optional<T>::optional( delctype(nullptr)1 const& v )

Effects: Directly-constructs an uninitialized optional .

Postconditions: *this is uninitialized.

Remarks: T's default constructor is not called.

optional<T5>::optional( T const& v )			

Effects: Directly-Constructs an optional.

Postconditions: *this is initialized and its contained value is a copy of 'v'.

Throws: Whatever T::T( T const& ) throws.

Remarks: T::T( T const& ) is called to define the contained value.

Exception Safety: Exceptions can only be thrown during T::T( T const& ); in that case, this constructor has no effect.

optional<T&6>::optional( T& ref )				

Effects: Directly-Constructs an optional.

Postconditions: *this is initialized and its contained value is another reference to the same object referenced by ref.

Remarks: both the contained value and ref shall refer to the same object (alias).

optional<T5>::optional( optional<T5> const& rhs );	

Effects: Copy-Constructs an optional.

Postconditions: If rhs is initialized, *this is initialized and its contained value is a copy of the contained value of rhs; else *this is uninitialized.

Throws: Whatever T::T( T const& ) throws.

Remarks: If rhs is initialized, T::T(T const& ) is called to define the contained value..

Exception Safety: Exceptions can only be thrown during T::T( T const& ); in that case, this constructor has no effect.

optional<T&6>::optional( optional<T&6> const& rhs );			

Effects: Copy-Constructs an optional.

Postconditions: If rhs is initialized, *this is initialized and its contained value is another reference to the same object referenced by the contained value of *rhs; else *this is uninitialized.

Remarks: If *this is initialized, the contained values of both *this and rhs shall refer to the same object (they alias).

 template<U5> explicit optional<T5>::optional( optional<U5> const& rhs ); 	

Effects: Copy-Constructs an optional.

Postconditions: If rhs is initialized, *this is initialized and its contained value is a copy of the contained value of rhs converted to type T; else *this is uninitialized.

Throws: Whatever T::T( U const& ) throws.

Remarks: T::T( U const& ) is called to define the contained value if rhs is initialized. If there is not conversion from U to T the program is ill-formed.

Exception Safety: Exceptions can only be thrown during T::T( U const& ); in that case, this constructor has no effect.

 

20.6.3.2 Assignment                                                                                    [lib.optional.assign]

optional<T>& optional<T>::operator= ( decltype(nullptr) null ) ;			

Effects: Resets the optional to uninitialized state

Postconditions: *this is uninitialized.

Remarks: If *this was initialized, T::~T() is called to destroy the contained value.

optional<T5>& optional<T5>::operator= ( T const& rhs ) ;			

Effects: Assigns the value 'rhs' to an optional.

Postconditions: *this is initialized and its contained value is a copy of rhs.

Throws: Whatever T::operator=( T const& ) or T::T(T const&) throws.

Remarks: If *this was initialized, T::operator=( T const& ) is used to define the contained value, otherwise, T::T(T const&) is used for that.

Exception Safety: In the event of an exception, the initialization state of *this is unchanged and its contained value undefined (it is up to T's operator=()) [If *this is initially uninitialized and T's copy constructor throws, *this is left properly uninitialized]

optional<T&6>& optional<T&6>::operator= ( T& const& rhs ) ;			

Effects: (Re)binds the wrapped reference.

Postconditions: *this is initialized and its contained value references the same object referenced by rhs.

Remarks: If *this was initialized, is is rebound to the new object. See 20.6.1.4 for details.

optional<T5>& optional<T5>::operator= ( optional<T5> const& rhs ) ;			

Effects: Assigns another optional to an optional.

Postconditions: If rhs is initialized, *this is initialized and its contained value is a copy of the contained value of rhs; else *this is uninitialized.

Throws: Whatever T::operator( T const&) or  T::T( T const& ) throws.

Remarks: In order to define the contained value: if both *this and rhs are initially initialized, T's assignment operator is used. If *this is initially initialized but rhs is uninitialized, T's destructor is called. If *this is initially uninitialized but rhs is initialized, T's copy constructor is called.

Exception Safety: In the event of an exception, the initialization state of *this is unchanged and the contained value undefined (it is up to T's operator=()) [If *this is initially uninitialized and T's copy constructor throws, *this is left properly uninitialized]

optional<T&6> & optional<T&6>::operator= ( optional<T&6> const& rhs ) ;		

Effects: (Re)binds the wrapped reference.

Postconditions: If *rhs is initialized, *this is initialized and its contained value references the same object referenced by *rhs; otherwise, *this is uninitialized (and references no object).

Remarks: If *this was initialized and so is *rhs, this is rebound to the new object. See 20.6.1.4 for details.

template<U5> optional& optional<T5>::operator= ( optional<U5> const& rhs ) ;		

Effects: Assigns another convertible optional to an optional.

Postconditions: If rhs is initialized, *this is initialized and its contained value is a copy of the contained value of rhs converted to type T; else *this is uninitialized.

Throws: Whatever T::operator=( U const& ) or T::T( U const& ) throws.

Remarks: In order to define the contained value: if both *this and rhs are initially initialized, T's assignment operator (from U) is used. If *this is initially initialized but rhs is uninitialized, T's destructor is called. If *this is initially uninitialized but rhs is initialized, T's converting constructor (from U) is called.

Exception Safety: In the event of an exception, the initialization state of *this is unchanged and its contained value undefined (it is up to T's operator=()) [If *this is initially uninitialized and T's converting constructor fails, *this is left properly uninitialized]

 

20.6.3.3 Observers                                                                                                                    [lib.optional.observers]

T const& optional<T>::operator*() const ;
T&       optional<T>::operator*();		
T const& get ( optional<T> const& ) ;
T&       get ( optional<T> &) ;

Requirements: *this is initialized

Returns: A reference to the contained value.

T const* get ( optional<T5> const* ) ;
T*       get ( optional<T5> *) ;
T const* get_pointer ( optional<T5> const& ) ;
T*       get_pointer ( optional<T5> &) ;

Returns: If *this is initialized, a pointer to the contained value; otherwise NULL.

Remarks: Accessing the returned pointer (which points into the optional<T>) after the optional<T> has been destroyed or assigned nullptr; or deleting it, is undefined behavior.

T const* optional<T5>::operator ->() const ;
T*       optional<T5>::operator ->()       ;

Requirements: *this is initialized.

Returns: A pointer to the contained value.

Remarks: Accessing the returned pointer (which points into the optional<T>) after the optional<T> has been destroyed or assigned nullptr; or deleting it, is undefined behavior.

optional<T>::operator unspecified-type() const ;

Returns: A value of an unspecified type which if used on a boolean context is equivalent to (get(this) != 0)

 bool optional<T>::operator!() ;

Returns: If *this is uninitialized, true; else false.

20.6.3.4 Relational operators                                                                                    [lib.optional.relops]

bool operator == ( optional<T> const& x, optional<T> const& y );

Returns: If both x and y are initialized, (*x == *y). If only x or y is initialized, false. If both are uninitialized, true.

bool operator < ( optional<T> const& x, optional<T> const& y );

Returns: If y is not initialized, false. If y is initialized and x is not initialized, true. If both x and y are initialized, (*x < *y).

bool operator != ( optional<T> const& x, optional<T> const& y );

Returns: !( x == y );

bool operator > ( optional<T> const& x, optional<T> const& y );

Returns: ( y < x );

bool operator <= ( optional<T> const& x, optional<T> const& y );

Returns: !( y<x );

bool operator >= ( optional<T> const& x, optional<T> const& y );

Returns: !( x<y );

bool operator == ( optional<T> const& x, decltype(nullptr) null );
bool operator == ( decltype(nullptr) null, optional<T> const& x );

Returns: true if x is uninitialized, false otherwise.

bool operator != ( optional<T> const& x, decltype(nullptr) null );
bool operator != ( decltype(nullptr) null, optional<T> const& x );

Returns: true if x is not uninitialized, false otherwise.

20.6.3.5 Swap                                                                                                       [lib.optional.swap]

void swap ( optional<T>& x, optional<T>& y );

Effects: If both x and y are initialized, calls swap(*x,*y)1
If only one is initialized, say x, calls: y = x; x = nullptr;
If none is initialized, does nothing.

Postconditions: The states of x and y interchanged.

Throws: If both are initialized, whatever swap(T&,T&) throws. If only one is initialized, whatever T::T ( T const& ) throws.

Remarks: If only one is initialized, T::~T() and T::T( T const& ) are called to defined the corresponding contained values.

Exception Safety: If both are initialized, this operation has the exception safety guarantees of swap(T&,T&).
If only one is initialized, it has the same basic guarantee as optional<T>::operator=( optional<T> const& ).

Footnotes:

  1. decltype(nullptr) and nullptr refers to the special null pointer value proposed in N1488
  2. for example, an implementation can use a bool data member
  3. implementations can use any of the published aligned storage techniques, or use align_as<> from N1546 if available.
  4. such as the reference_wrapper<> proposed in N1453
  5. The semantics specified here correspond only to the case of T not being of reference type.
  6. The semantics specified here correspond only to the case of T being of reference type.

Acknowledgements

All the people in the boost community, particularly those involved in the development of the Boost.Optional library.

References

  1. Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples, Barton and Nackman, Addison Wesley Professional
  2. Boost.Optional library, http://www.boost.org/libs/optional/doc/index.html, Fernando Cacciola
  3. The Meaning of Nothing, http://www.chimu.com/publications/short/stNil.html, Mark Fussel, (C) ChiMu Corporation.
  4. C# Nullable types, http://msdn.microsoft.com/vcsharp/2005/overview/language/nullabletypes, MSDN, Microsoft (R) Visual C# (R) Developer Center