Document number: N1085

Howard E. Hinnant
[email protected]
November 4, 2004

Proposal to augment the interface of malloc/free/realloc/calloc

Introduction

The subject of this proposal is a standardized set of helper functions to support malloc and realloc. The C functions malloc and realloc are not quite primitive enough for truly general purpose work. Given a pointer returned from malloc (or realloc or calloc), the only thing the client knows about the pointer is that it points to a region of heap memory at least as big as the size requested in the malloc call. However it is not uncommon for heap memory systems to slightly over allocate in the interest of meeting alignment requirements or reducing heap fragmentation. Some clients, such as those maintaining dynamic arrays, could make use of additional allocated memory, if they only knew it existed. For example, consider:

typedef struct char_array
{
    char* data;
    size_t size;
    size_t capacity;
} char_array;

This is a simple struct which manages a dynamic array of char with both a logical size and a physical capacity. When the size threatens to exceed capacity, more capacity must be obtained (e.g. realloc). To initialize this array to a specific length, the code might be written today:

void
construct_char_array(char_array* array, size_t size)
{
    array->data = 0;
    array->size = array->capacity = 0;
    if (size)
    {
        array->data = malloc(size);
        if (array->data == 0)
            on_error();
        array->size = array->capacity = size;  // capacity == size
    }
}

However, if the code can query the actual size of the memory block returned from malloc, there is an opportunity to increase the array's capacity, and thus save future increases in size from triggering a reallocation:

void
construct_char_array(char_array* array, size_t size)
{
    array->data = 0;
    array->size = array->capacity = 0;
    if (size)
    {
        array->data = malloc(size);
        if (array->data == 0)
            on_error();
        array->size = size;
        array->capacity = sizeof_alloc(array->data);  // capacity >= size
    }
}

This proposal addresses the needs of the clients implementing dynamic arrays, such as the need addressed above, and more. This proposal also allows clients to discover if a memory block can expand in place by a certain amount, and even more importantly: if not, then by how much?

Imagine a conversation between the dynamic array and the malloc system that goes something like this:

Client: Can you double the size of my memory block without moving it?
malloc: No, but I can expand your 1000 byte block to 1500 bytes. Would you like me to do that?
Client: Sure, I can make use of the extra 500 bytes, thanks.

The dynamic array client may actually need only a few extra bytes, but requested a doubled capacity in order to reduce the need to reallocate the char_array data over the life of the object (geometric growth strategy). But upon finding out that it can procure 500 bytes and still not have to copy its data to a new location, the client is wise to accept the deal, delaying for now (and perhaps even permanently) the need to copy its data to a new location.

This proposal introduces an interface to make negotiations like the one above possible. And although the first impression from those concerned about multithreading may be: the above operation cannot be made atomic! Please be assured that multithreading issues have been taken into account in the design of this interface.

Also taken into account is the ability to layer this interface on to an existing malloc system with no changes to the existing malloc system. The result will not necessarily be useful to the client, but it will behave in a predictable manner which the client will know how to deal with. For example, the hypothetical conversation above could have looked like:

Client: Can you double the size of my memory block without moving it?
malloc: No, I cannot change the size of your 1000 byte memory block.
Client: Ok, thanks anyway. I'll just allocate a new 2000 byte block.

Essentially the interface proposed herein allows the client a much more detailed view into the internals of the heap, while at the same time allowing existing malloc implementations a trivial (if not quite useful) implementation of this interface. It is expected that as this new interface gains favor, vendors can transition from trivial useless implementations to useful implementations with no backwards compatibility issues.

Synopsis

size_t sizeof_alloc(void* ptr);
void*  request_malloc(size_t size_requested, size_t* size_received);
int    resize_alloc(void* ptr, size_t size_requested, size_t* size_received);
int    negotiate_alloc(void* ptr, size_t min_size, size_t preferred_size,
                                                   size_t* size_received);

This interface works with the existing interface and provides extended functionality including:

Except for sizeof_alloc, each of these functions can be trivially implemented in terms of the existing malloc system, if sizeof_alloc also exists. It is anticipated that implementing sizeof_alloc (which simply returns the size of an allocated block of memory) will not present a problem for implementations as this information is required to implement realloc today.

However to implement truly useful functionality, vendors are encouraged to implement the above functions as primitives in the malloc system, in which case the current malloc interface is trivially implemented in terms of these primitives. In either case, clients can write to this interface, and obtain (possibly) more detailed information, thus allowing potentially faster and more efficient use of heap memory.


sizeof_alloc

size_t sizeof_alloc(void* ptr);

This function returns the size of an already allocated memory block.

Precondition:

ptr is non-null and has been returned from request_malloc, malloc, calloc, or realloc, and has not been deallocated (by free or realloc).

Postcondition:

The returned value indicates that the client can write arbitrary information to the range [(char*)ptr, (char*)ptr + returned_value) without fear of corrupting the heap.

Notes:

It is felt that the implementation of this function will not create a significant burden on existing implementations of the malloc system as this information needs to be accessible anyway for the implementation of realloc.

Violations of the precondition could be used to give clients further information, at least in debug builds. For example if a client called sizeof_alloc with a non-heap pointer, the implementation could choose to detect that fact and return 0. However, such behavior is not a requirement of this proposal.


request_malloc

void* request_malloc(size_t size_requested, size_t* size_received);

This function attempts to allocate a block of memory at least as big as size_requested. If size_received is non-null, it places the size of the block actually allocated in *size_received. This call is intended to encapsulate a call to malloc and a call to sizeof_alloc into a single function.

Precondition:

None.

Postcondition:

If size_requested is 0 then 0 is returned. Also in this case if size_received is not null, 0 is stored in *size_received.

If size_requested is greater than 0 then a pointer to a memory block of size at least as large as size_requested bytes is returned if the system can manage it. And in this case if size_received is not null, then the actual size of the memory block will be written to *size_received. This value written to *size_received will be the same value which would be returned to the client if he immediately used the return value in a call to sizeof_alloc.

If size_requested is greater than 0, but the system is not successful in returning a pointer to a block of memory of size_requested bytes, then null is returned. Also in this case, a best guess estimate is written to *size_received (if size_received is non-null) indicating what size_requested value may succeed in a future call to request_malloc. Though this future call is not guaranteed to succeed. It is only a suggestion. Conforming implementations are allowed to always return 0 in *size_received upon failure.

Notes:

request_malloc is intended to be a primitive upon which malloc can be based. Such a malloc might look like:

void*
malloc(size_t size)
{
    if (size == 0)  // optional
        size = 1;   // optional
    return request_malloc(size, 0);
}

However, it is not demanded that request_malloc be a primitive upon which malloc is based. It is possible to implement request_malloc in terms of malloc and sizeof_alloc, but this is suboptimal:

void*
request_malloc(size_t size_requested, size_t* size_received)
{
    void* result = 0;
    if (size_requested)
        result = malloc(size_requested);
    if (size_received)
    {
        if (result)
            *size_received = sizeof_alloc(result);
        else
            *size_received = 0;  // suboptimal
    }
    return result;
}

The downside of this suboptimal implementation is twofold:

  1. It is probably more efficient to have request_malloc directly write to *size_received rather than making two calls into the heap management system (malloc and sizeof_alloc).
  2. If the malloc fails, 0 is written to *size_received in this suboptimal but conforming implementation which really provides no useful information to the client.

The request_malloc function could be used to implement the construct_char_array example instead of malloc:

void
construct_char_array(char_array* array, size_t size)
{
    array->data = 0;
    array->size = array->capacity = 0;
    if (size)
    {
        array->data = request_malloc(size, &array->capacity);
        if (array->data == 0)
            on_error(array->capacity);  // pass suggested size to on_error
        array->size = size;  // capacity >= size
    }
}

In this rewrite, the on_error function has been given the amount of memory which might work in a future request_malloc call. This may be enough information for a more intelligent error recovery. Also note that there is now only a single call into the heap system (compared to two calls with the previous example), which may be significantly faster in a multithreaded environment.


resize_alloc

int resize_alloc(void* ptr, size_t size_requested, size_t* size_received);

This function attempts to expand or shrink a memory block in place to size_requested bytes. If successful, the return value is non-zero, otherwise zero is returned. On success (and if size_received is non-null), *size_received holds the actual size of the memory block. On failure *size_received holds a hint indicating a suggested size_requested that might succeed in a future call.

Precondition:

ptr is non-null and has been returned from request_malloc, malloc, calloc, or realloc, and has not been deallocated (by free or realloc).

Postcondition:

The memory block pointed to by ptr is still valid and has not been altered, aside from being expanded or shrunk. If non-zero has been returned, then the current size of the memory block (as reflected by sizeof_alloc(ptr)) will be greater than or equal to size_requested. Additionally if size_received is non-null, then the result of sizeof_alloc(ptr) will be written to *size_received.

If zero is returned and if size_received is non-null then a suggested size_requested is written to *size_received which may succeed in a future call to resize_alloc.

If size_requested is less than the size of the memory block prior to the call, then the size of the block after the call must be less than the size of the memory block before the call in order for success to be reported.

If size_requested is 0, the pointer will not be deallocated. Instead the system will attempt to shrink the memory block to the smallest size supported by the system.

Notes:

A conforming implementation is allowed to always fail if size_requested is not equal to sizeof_alloc(ptr), and to always write sizeof_alloc(ptr) to *size_received (when size_received is non-null). Indeed, below is a rather useless, yet conforming implementation:

int
resize_alloc(void* ptr, size_t size_requested, size_t* size_received)
{
    size_t orig_size = sizeof_alloc(ptr);
    if (size_received)
        *size_received = orig_size;
    return orig_size == size_requested;
}

It is encouraged that more useful implementations will be developed which will actually attempt to expand or shrink the current block of memory, and provide useful hints on failure.


negotiate_alloc

int negotiate_alloc(void* ptr, size_t min_size,
                               size_t preferred_size, size_t* size_received);

The intention of this function is to expand a block of memory in place to preferred_size, but failing to do that, expand to min_size. Failure is only indicated (with a zero return) if the memory block cannot be expanded to min_size. If the size of the memory block prior to calling this function is already greater than min_size, a successful return is assured. If the size of the memory block prior to calling this function is greater than preferred_size, an attempt will be made to shrink the block down to preferred_size, but success will be returned whether or not the shrink was successful. If non-null, *size_received will either contain the size of the memory block after a successful return, or on failure a suggested preferred_size that may succeed in a future call.

Precondition:

ptr is non-null and has been returned from request_malloc, malloc, calloc, or realloc, and has not been deallocated (by free or realloc).

min_size <= preferred_size.

Postcondition:

The memory block pointed to by ptr is still valid and has not been altered, aside from being expanded or shrunk. If non-zero has been returned, then the current size of the memory block (as reflected by sizeof_alloc(ptr)) will be greater than or equal to min_size. Additionally if size_received is non-null, then the result of sizeof_alloc(ptr) will be written to *size_received.

If zero is returned and if size_received is non-null then a suggested preferred_size is written to *size_received which may succeed in a future call to negotiate_alloc.

Notes:

It is intended that negotiate_alloc be a primitive function, and not implemented in terms of the other allocate functions. However, the implementation below is conforming:

int
negotiate_alloc(void* ptr, size_t min_size,
                           size_t preferred_size, size_t* size_received)
{
    int result = 0;
    size_t sr_backup;
    size_t* srp = &sr_backup;
    if (size_received)
        srp = size_received;
    while (min_size <= preferred_size)
    {
        result = resize_alloc(ptr, preferred_size, srp);
        if (result)
            break;
        preferred_size = *srp;
    };
    return result;
}

This implementation of negotiate_alloc is not necessarily suboptimal, especially for a single threaded environment. The first call to resize_alloc will either succeed or fail. If it succeeds, then negotiate_alloc suceeds. If it fails, either preferred_size will be set smaller than min_size, and thus negotiate_alloc will fail, or the subsequent call to resize_alloc will succeed (assuming a quality hint on failure) and thus negotiate_alloc will succeed. In all cases, resize_alloc will have been called a maximum of two times. This is essentially the computer code for the previously mentioned negotiation.

The negotiate_alloc function could be used to implement a "push_back" function for the example char_array like this:

void
char_array_push_back(char_array* array, char x)
{
    size_t old_cap = array->capacity;
    size_t new_cap = old_cap ? 2*old_cap : 1;
    if (array->size == array->capacity &&
       (old_cap == 0 ||
        !negotiate_alloc(array->data, old_cap + 1, new_cap, &array->capacity)))
    {
        char* new_data;
        while (1)
        {
            new_data = request_malloc(new_cap, &new_cap);
            if (new_data)
                break;
            if (new_cap <= old_cap)
                on_error();
        }
        if (old_cap)
        {
            memcpy(new_data, array->data, array->size);
            free(array->data);
        }
        array->data = new_data;
        array->capacity = new_cap;
    }
    (array->data)[array->size++] = x;
}

The job of char_array_push_back is to append one char to the array. It first checks to see if there is sufficient capacity to do the job. If there isn't, then it tries to expand the memory in place to anywhere between 1 byte bigger and twice as big as the current capacity. If that doesn't succeed, then it tries to allocate a new block that is twice as big. If that doesn't succeed, it reduces its new allocation request down by the suggested size until the system says that there is not enough room for the single additional char. Only at that point does char_array_push_back give up and call on_error.

This is a prime example of a robust dynamic array client using the expanded interface to exploit the heap to its fullest, even under tight (possibly embedded) conditions.

Appendix: Implementation of realloc

It is not required to implement the standard C heap functions in terms of the allocate primitives. Vendors are free to implement these "primitives" in terms of malloc (et al.) instead, or to have them always fail (as shown in the main body of the proposal). However, this section gives example code for realloc.

void*
realloc(void* ptr, size_t size)
{
    void* result;
    size_t orig_size;
    if (ptr == 0)
        return malloc(size);
    if (size == 0)
    {
        free(ptr);
        return 0;
    }
    orig_size = sizeof_alloc(ptr);
    if (resize_alloc(ptr, size, 0) || size <= orig_size)
        return ptr;
    result = malloc(size);
    if (result)
    {
        memcpy(result, ptr, orig_size);
        free(ptr);
    }
    return result;
}

In the degenerate cases, realloc simply forwards to malloc and free. In the non-degenerate cases, an attempt is made to expand or shrink the block in place with resize_alloc. If realloc is attempting to shrink the block, this implementation tries to shrink the block, but returns success whether or not the block was actually shrunk. If realloc is trying to expand the block, an attempt is made with resize_alloc. If that call was successful, then the original pointer is returned. Else a new block of memory is allocated and the contents of the old memory block are copied to the new memory block. If a larger memory allocation was unsuccessful, null is returned and the original memory block is left intact.