
How Heaps Work, Part 1 - Why Heaps?
Ever wondered how malloc/new in your favorite programming languages work? Well, thanks to my current job, I've learned a whole lot more than I've ever thought possible, and now you can join me! This article is Part 1, which discusses why we wrote those memory management functions at all.
By: TheHans255
2/15/2025
At my job at FUTO, I've been doing a lot of work with bare
metal software - that is, software that doesn't depend on an OS and works far
closer to the underlying hardware. It turns out that when you do that, a lot of
supporting code that you could previously take for granted is now something you
have to worry about intimately. One of those pieces of supporting code is the
implementation of your heap - e.g. in C, your malloc()
, calloc()
, free()
,
and realloc()
functions. I've been spending a lot of time refining the heap
implementation in FUBS, and I think it might be fun to chat about why you might
want a heap, how heaps work, and how you might go about why you would write your
own.
This article will be split into 3 parts. This, the first part talks about why a program might want access to a heap in the first place. The second part walks through some very basic implementations of heap allocators, and the third part talks about various optimizations that you might make when designing a real heap allocator.
To start with, let's consider a simple function:
uint64_t fnv_hash(int c, char *v) {
uint64_t hash = 0xcbf29ce484222325;
for (int i = 0; i < c; i++) {
hash *= 0x100000001b3;
hash ^= c;
}
return hash;
}
This function computes the Fowler-Noll-Vo hash for a given range of data. This is a rather simple function that's used to help implement hashmaps, but more important than that is how easy it is to manage the memory this function needs. Specifically:
- All of the variables that it creates or deals with only exist for the duration
of the function - this includes the arguments
c
andv
(which are copied from the caller) and the inner variableshash
andi
. Thehash
variable is copied into the return value (which is a register where the caller expects to find it) and the other variables are destroyed. - All of the variables are also a known size, all of which are less than or
equal to the size of the architecture's general purpose registers. (Note
specifically that
v
is a pointer - the data has a size known at runtime, butv
itself is constant-sized).
As such, all variables created by this function can go into automatic
storage - that is, they can either be stored in registers, or they can be
placed on the call stack. We can look at the function in the
Godbolt Compiler Explorer to see that this is exactly
what happens (note that this is compiled with -Os
for clarity):
fnv_hash:
movabs rax, -3750763034362895579
xor edx, edx
movsx rsi, edi
movabs rcx, 1099511628211
.L2:
cmp edx, edi
jge .L5
imul rax, rcx
inc edx
xor rax, rsi
jmp .L2
.L5:
ret
Let's explore another, slightly more complex function, which can't return data in a register that the function can expect.
void sha256_hash(int c, char *v) {
int h0 = 0x6a09e667;
int h1 = 0xbb67ae85;
int h2 = 0x3c6ef372;
int h3 = 0xa54ff53a;
int h4 = 0x510e527f;
int h5 = 0x9b05688c;
int h6 = 0x1f83d9ab;
int h7 = 0x5be0cd19;
// ... do a whole bunch of timey-wimey, hashy-washy stuff
// that would be too complex to include here, putting
// the final hash in h0-h7
// ... and now we have a problem - how do we get h0-h7 out of here
// if we can't return that much data at once?
}
An SHA-256 hash is, well, 256 bits. This is eight times larger than the size of
a C int
, so we can't just return an int
or even a long int
. What we need
to somehow return is an int[8]
, or an array of 8 ints.
Naively, we might choose to create that array inside the function:
int *sha256_hash(int c, char *v) {
int h0 = 0x6a09e667;
int h1 = 0xbb67ae85;
int h2 = 0x3c6ef372;
int h3 = 0xa54ff53a;
int h4 = 0x510e527f;
int h5 = 0x9b05688c;
int h6 = 0x1f83d9ab;
int h7 = 0x5be0cd19;
// ... do a whole bunch of timey-wimey, hashy-washy stuff
// that would be too complex to include here, putting
// the final hash in h0-h7
int hash_out[8];
hash_out[0] = h0;
hash_out[1] = h1;
hash_out[2] = h2;
hash_out[3] = h3;
hash_out[4] = h4;
hash_out[5] = h5;
hash_out[6] = h6;
hash_out[7] = h7;
return hash_out;
}
A problem immediately appears when you do this, though: when you create an
array in a function's automatic storage like this, the array is
destroyed when the function exits. This is fine if, say, you only need the
array to help you do bookkeeping, but since we're using the array to output
data, the array's memory will no longer be reserved. An extremely anal-retentive
runtime might cause your program to crash when it accesses the memory again, and
a more typical runtime will end up with unpredictable results, since another
function call will cause the contents of hash_out
to be overwritten.
However, in this case, we can simply move the creation of the array out to the function's caller:
int *sha256_hash(int c, char *v, int *hash_out) {
int h0 = 0x6a09e667;
int h1 = 0xbb67ae85;
int h2 = 0x3c6ef372;
int h3 = 0xa54ff53a;
int h4 = 0x510e527f;
int h5 = 0x9b05688c;
int h6 = 0x1f83d9ab;
int h7 = 0x5be0cd19;
// ... do a whole bunch of timey-wimey, hashy-washy stuff
// that would be too complex to include here, putting
// the final hash in h0-h7
hash_out[0] = h0;
hash_out[1] = h1;
hash_out[2] = h2;
hash_out[3] = h3;
hash_out[4] = h4;
hash_out[5] = h5;
hash_out[6] = h6;
hash_out[7] = h7;
}
void usage_example() {
char *data = "This is a string that we need to hash."
int data_len = strlen(data);
int hash_out[8]; // create an array on the stack
sha256_hash (data_len, data, hash_out);
}
And with that, we're back to the first example - all of the variables that this
function deal with only exist within the lifetime of the function, and also have
a known size. We have a bunch of int
s, and copies of c
, v
, and hash_out
,
the last two of which are pointers to external data which can be freely copied
and stored in registers. And since no variables are actually being return
ed
from the function, all of the variables can simply be destroyed.
Now, let's consider a function that's a lot more, well, "hard mode" than that:
long unsigned int strlen (const char *s);
void find_replace(char *s, char *target, char *replace_with, char *out) {
int target_len = strlen(target);
int match_len = 0;
char c;
while ((c = *s) != 0) {
if (c == target[match_len]) {
match_len++;
if (match_len == target_len) {
char o;
char *r = replace_with
while ((o = *r) != 0) {
*out = o;
out++;
r++;
}
match_len = 0;
}
} else {
match_len = 0;
*out = c;
out++;
}
s++;
}
}
This function performs a non-overwriting version of the Find/Replace operation,
which replaces all instances of a string fragment in an input with a different
fragment in the output. We do know that out
needs to point to some array that
can hold the output, but we have a problem: how big is that array supposed to
be?
It's certainly not a constant size, because the length of the output depends on
how many instances of target
there are in the input string. If replace_with
is smaller than or the same size as target
, of course, we can just create an
array of the same size as s
. But if replace_with
is larger than target
,
then we need a larger array, and the only way to actually know exactly how big
that array should be is by doing the find/replace calculation itself, which
arguably defeats the point.
Therefore, we need a way for our find_replace
function to create an array that
is exactly the size that it needs to be. That array needs to outlive
find_replace
and needs to be later destroyed by the caller. We also need to be
able to resize the array in case our original guess is wrong.
This is where the heap comes in. Or more accurately, this is where manual
memory management comes in: we have a set of functions that can allocate a
block of memory, separate from our automatic storage or global variables, and
keep that block of memory alive and reserved until we explicitly indicate that
we no longer need it. In C, these facilities are exposed through the functions
malloc
, realloc
, and free
- here is the find_replace
function
implemented with those functions:
#include <stdio.h>
#include <stdlib.h>
long unsigned int strlen (const char *s);
char *find_replace(char *s, char *target, char *replace_with) {
int out_len = 0;
int out_capacity = 16;
char *out = (char *) malloc(out_capacity);
int target_len = strlen(target);
int match_len = 0;
char c;
while ((c = *s) != 0) {
if (c == target[match_len]) {
match_len++;
if (match_len == target_len) {
char o;
char *r = replace_with;
while ((o = *r) != 0) {
if (out_len == out_capacity) {
out_capacity *= 2;
out = (char *)realloc(out, out_capacity);
}
out[out_len] = o;
out_len++;
r++;
}
match_len = 0;
}
} else {
match_len = 0;
if (out_len == out_capacity) {
out_capacity *= 2;
out = (char *)realloc(out, out_capacity);
}
out[out_len] = c;
out_len++;
}
s++;
}
out = (char *)realloc(out, out_len);
return out;
}
void usage_example() {
char *replaced = find_replace("Some text", "ex", "neo");
printf("%s\n", replaced);
free(replaced);
}
In a more object oriented language, such as C++ or Java, instead of forcing you
to call those manual memory management functions directly, usage of the heap is
achieved by instead adding enhancements to automatic storage, namely by calling
extra code when variables are destroyed. For instance, for this array, C++
provides the object std::vector
, which automatically creates, manages, and
resizes a heap, and destroys the array when the object exits automatic
storage:
#include <iostream>
#include <vector>
#include <cstring>
std::vector<char> find_replace(const char *s, const char *target, const char *replace_with) {
std::vector<char> out;
int target_len = strlen(target);
int match_len = 0;
char c;
while ((c = *s) != 0) {
if (c == target[match_len]) {
match_len++;
if (match_len == target_len) {
char o;
char *r = (char *) replace_with;
while ((o = *r) != 0) {
out.push_back(o);
r++;
}
match_len = 0;
}
} else {
match_len = 0;
out.push_back(c);
}
s++;
}
out.shrink_to_fit();
return out;
}
void usage_example() {
std::vector<char> replaced = find_replace("Some text", "ex", "neo");
std::cout << replaced.data() << '\n';
}
(Of course, it would be better to use std::string
here, but using vector
more directly illustrates the fact that an array is being dynamically managed).
Hence, we have established why we would want a heap. We haven't even gone over more advanced needs, such as automatic garbage collection used in languages like Java and JavaScript, but we have already established why we might want the ability to arbitrarily create arbitrary-lifetime regions of memory. In Part 2, we will discuss what functions our OS might provide for doing this and how we might write a simple heap to practically use them. In Part 3, we will talk about some more advanced optimizations we might make in order to write a heap that practically works for your use case.