How Heaps Work, Part 1 - Why Heaps?

Ever wondered how malloc/new in your favorite programming languages work? Well, thanks to my current job, I've learned a whole lot more than I've ever thought possible, and now you can join me! This article is Part 1, which discusses why we wrote those memory management functions at all.

By: TheHans255

2/15/2025

At my job at FUTO, I've been doing a lot of work with bare metal software - that is, software that doesn't depend on an OS and works far closer to the underlying hardware. It turns out that when you do that, a lot of supporting code that you could previously take for granted is now something you have to worry about intimately. One of those pieces of supporting code is the implementation of your heap - e.g. in C, your malloc(), calloc(), free(), and realloc() functions. I've been spending a lot of time refining the heap implementation in FUBS, and I think it might be fun to chat about why you might want a heap, how heaps work, and how you might go about why you would write your own.

This article will be split into 3 parts. This, the first part talks about why a program might want access to a heap in the first place. The second part walks through some very basic implementations of heap allocators, and the third part talks about various optimizations that you might make when designing a real heap allocator.

To start with, let's consider a simple function:

uint64_t fnv_hash(int c, char *v) {
    uint64_t hash = 0xcbf29ce484222325;
    for (int i = 0; i < c; i++) {
        hash *= 0x100000001b3;
        hash ^= c;
    }
    return hash;
}

This function computes the Fowler-Noll-Vo hash for a given range of data. This is a rather simple function that's used to help implement hashmaps, but more important than that is how easy it is to manage the memory this function needs. Specifically:

All of the variables that it creates or deals with only exist for the duration of the function - this includes the arguments c and v (which are copied from the caller) and the inner variables hash and i. The hash variable is copied into the return value (which is a register where the caller expects to find it) and the other variables are destroyed.
All of the variables are also a known size, all of which are less than or equal to the size of the architecture's general purpose registers. (Note specifically that v is a pointer - the data has a size known at runtime, but v itself is constant-sized).

As such, all variables created by this function can go into automatic storage - that is, they can either be stored in registers, or they can be placed on the call stack. We can look at the function in the Godbolt Compiler Explorer to see that this is exactly what happens (note that this is compiled with -Os for clarity):

fnv_hash:
        movabs  rax, -3750763034362895579
        xor     edx, edx
        movsx   rsi, edi
        movabs  rcx, 1099511628211
.L2:
        cmp     edx, edi
        jge     .L5
        imul    rax, rcx
        inc     edx
        xor     rax, rsi
        jmp     .L2
.L5:
        ret

Let's explore another, slightly more complex function, which can't return data in a register that the function can expect.

void sha256_hash(int c, char *v) {
    int h0 = 0x6a09e667;
    int h1 = 0xbb67ae85;
    int h2 = 0x3c6ef372;
    int h3 = 0xa54ff53a;
    int h4 = 0x510e527f;
    int h5 = 0x9b05688c;
    int h6 = 0x1f83d9ab;
    int h7 = 0x5be0cd19;

    // ... do a whole bunch of timey-wimey, hashy-washy stuff
    //     that would be too complex to include here, putting
    //     the final hash in h0-h7

    // ... and now we have a problem - how do we get h0-h7 out of here
    //     if we can't return that much data at once?
}

An SHA-256 hash is, well, 256 bits. This is eight times larger than the size of a C int, so we can't just return an int or even a long int. What we need to somehow return is an int[8], or an array of 8 ints.

Naively, we might choose to create that array inside the function:

int *sha256_hash(int c, char *v) {
    int h0 = 0x6a09e667;
    int h1 = 0xbb67ae85;
    int h2 = 0x3c6ef372;
    int h3 = 0xa54ff53a;
    int h4 = 0x510e527f;
    int h5 = 0x9b05688c;
    int h6 = 0x1f83d9ab;
    int h7 = 0x5be0cd19;

    // ... do a whole bunch of timey-wimey, hashy-washy stuff
    //     that would be too complex to include here, putting
    //     the final hash in h0-h7

    int hash_out[8];
    hash_out[0] = h0;
    hash_out[1] = h1;
    hash_out[2] = h2;
    hash_out[3] = h3;
    hash_out[4] = h4;
    hash_out[5] = h5;
    hash_out[6] = h6;
    hash_out[7] = h7;
    return hash_out;
}

A problem immediately appears when you do this, though: when you create an array in a function's automatic storage like this, the array is destroyed when the function exits. This is fine if, say, you only need the array to help you do bookkeeping, but since we're using the array to output data, the array's memory will no longer be reserved. An extremely anal-retentive runtime might cause your program to crash when it accesses the memory again, and a more typical runtime will end up with unpredictable results, since another function call will cause the contents of hash_out to be overwritten.

However, in this case, we can simply move the creation of the array out to the function's caller:

int *sha256_hash(int c, char *v, int *hash_out) {
    int h0 = 0x6a09e667;
    int h1 = 0xbb67ae85;
    int h2 = 0x3c6ef372;
    int h3 = 0xa54ff53a;
    int h4 = 0x510e527f;
    int h5 = 0x9b05688c;
    int h6 = 0x1f83d9ab;
    int h7 = 0x5be0cd19;

    // ... do a whole bunch of timey-wimey, hashy-washy stuff
    //     that would be too complex to include here, putting
    //     the final hash in h0-h7

    hash_out[0] = h0;
    hash_out[1] = h1;
    hash_out[2] = h2;
    hash_out[3] = h3;
    hash_out[4] = h4;
    hash_out[5] = h5;
    hash_out[6] = h6;
    hash_out[7] = h7;
}

void usage_example() {
    char *data = "This is a string that we need to hash."
    int data_len = strlen(data);
    int hash_out[8]; // create an array on the stack
    sha256_hash (data_len, data, hash_out);
}

And with that, we're back to the first example - all of the variables that this function deal with only exist within the lifetime of the function, and also have a known size. We have a bunch of ints, and copies of c, v, and hash_out, the last two of which are pointers to external data which can be freely copied and stored in registers. And since no variables are actually being returned from the function, all of the variables can simply be destroyed.

Now, let's consider a function that's a lot more, well, "hard mode" than that:

long unsigned int strlen (const char *s);

void find_replace(char *s, char *target, char *replace_with, char *out) {
    int target_len = strlen(target);
    int match_len = 0;
    char c;
    while ((c = *s) != 0) {
        if (c == target[match_len]) {
            match_len++;
            if (match_len == target_len) {
                char o;
                char *r = replace_with
                while ((o = *r) != 0) {
                    *out = o;
                    out++;
                    r++;
                }
                match_len = 0;
            }
        } else {
            match_len = 0;
            *out = c;
            out++;
        }
        s++;
    }
}

This function performs a non-overwriting version of the Find/Replace operation, which replaces all instances of a string fragment in an input with a different fragment in the output. We do know that out needs to point to some array that can hold the output, but we have a problem: how big is that array supposed to be?

It's certainly not a constant size, because the length of the output depends on how many instances of target there are in the input string. If replace_with is smaller than or the same size as target, of course, we can just create an array of the same size as s. But if replace_with is larger than target, then we need a larger array, and the only way to actually know exactly how big that array should be is by doing the find/replace calculation itself, which arguably defeats the point.

Therefore, we need a way for our find_replace function to create an array that is exactly the size that it needs to be. That array needs to outlive find_replace and needs to be later destroyed by the caller. We also need to be able to resize the array in case our original guess is wrong.

This is where the heap comes in. Or more accurately, this is where manual memory management comes in: we have a set of functions that can allocate a block of memory, separate from our automatic storage or global variables, and keep that block of memory alive and reserved until we explicitly indicate that we no longer need it. In C, these facilities are exposed through the functions malloc, realloc, and free - here is the find_replace function implemented with those functions:

#include <stdio.h>
#include <stdlib.h>
long unsigned int strlen (const char *s);

char *find_replace(char *s, char *target, char *replace_with) {
    int out_len = 0;
    int out_capacity = 16;
    char *out = (char *) malloc(out_capacity);
    int target_len = strlen(target);
    int match_len = 0;
    char c;
    while ((c = *s) != 0) {
        if (c == target[match_len]) {
            match_len++;
            if (match_len == target_len) {
                char o;
                char *r = replace_with;
                while ((o = *r) != 0) {
                    if (out_len == out_capacity) {
                        out_capacity *= 2;
                        out = (char *)realloc(out, out_capacity);
                    }
                    out[out_len] = o;
                    out_len++;
                    r++;
                }
                match_len = 0;
            }
        } else {
            match_len = 0;
            if (out_len == out_capacity) {
                out_capacity *= 2;
                out = (char *)realloc(out, out_capacity);
            }
            out[out_len] = c;
            out_len++;
        }
        s++;
    }
    out = (char *)realloc(out, out_len);
    return out;
}

void usage_example() {
    char *replaced = find_replace("Some text", "ex", "neo");
    printf("%s\n", replaced);
    free(replaced);
}

In a more object oriented language, such as C++ or Java, instead of forcing you to call those manual memory management functions directly, usage of the heap is achieved by instead adding enhancements to automatic storage, namely by calling extra code when variables are destroyed. For instance, for this array, C++ provides the object std::vector, which automatically creates, manages, and resizes a heap, and destroys the array when the object exits automatic storage:

#include <iostream>
#include <vector>
#include <cstring>

std::vector<char> find_replace(const char *s, const char *target, const char *replace_with) {
    std::vector<char> out;
    int target_len = strlen(target);
    int match_len = 0;
    char c;
    while ((c = *s) != 0) {
        if (c == target[match_len]) {
            match_len++;
            if (match_len == target_len) {
                char o;
                char *r = (char *) replace_with;
                while ((o = *r) != 0) {
                    out.push_back(o);
                    r++;
                }
                match_len = 0;
            }
        } else {
            match_len = 0;
            out.push_back(c);
        }
        s++;
    }
    out.shrink_to_fit();
    return out;
}

void usage_example() {
    std::vector<char> replaced = find_replace("Some text", "ex", "neo");
    std::cout << replaced.data() << '\n';
}

(Of course, it would be better to use std::string here, but using vector more directly illustrates the fact that an array is being dynamically managed).

Hence, we have established why we would want a heap. We haven't even gone over more advanced needs, such as automatic garbage collection used in languages like Java and JavaScript, but we have already established why we might want the ability to arbitrarily create arbitrary-lifetime regions of memory. In Part 2, we will discuss what functions our OS might provide for doing this and how we might write a simple heap to practically use them. In Part 3, we will talk about some more advanced optimizations we might make in order to write a heap that practically works for your use case.