Memory management in C, C++ and Fortran
This document is intended to provide a high-level overview of the many different ways to manage memory in the C, C++ and Fortran programming languages. These languages were chosen as they are the most commonly used languages in scientific computing, but also because they were foundational to the approaches taken by later languages.
Incorrect memory management is the source of many hard to triage bugs and can severely degrade program performance, so it’s worth taking some time to familiarise yourself with how computer programs allocate memory. The specifics of this guide focus on Linux, since that’s the most commonly used OS for computing clusters.
Table of Contents
- Overview — processes and memory
- How do computers store memory? The stack and the heap
- C
- C++
- Fortran
- Detecting memory bugs
Overview — processes and memory
A process is an abstraction provided by the operating system (or OS, e.g. Windows, macOS, Linux) which contains some instructions to execute (usually obtained from an executable file stored somewhere on the machine), plus its own memory space to hold variables (and a few other things that aren’t important for the purposes of this guide). The key point is that a process maps nicely to our intuitive notion of a running computer program: the OS spins up a process when you ask it to run a program, the instructions (compiled from code) of the application run, and then the operating system cleans the process up when it’s done.
Throughout the course of its lifetime, a program needs to ask the OS to provide it with memory whenever it needs to store variables like parameters, arrays or structures. The OS will always try to fulfill requests for memory (provided there’s enough memory available on the computer), and a process “holds on” to its memory until it either releases it back to the OS or finishes its execution. Many programming languages such as Python and Matlab manage this process automatically, but C, C++ and Fortran all require some degree of manual memory management on the part of the programmer.
How do computers store memory? The stack and the heap
A process’s memory may be stored anywhere in the computer’s physical RAM, but it is logically represented to the programmer as being arranged into a contiguous range of virtual addresses: the mapping between physical and virtual addresses is automatically handled by the OS and is completely transparent to the program. A process’s memory addresses are further divided into blocks or regions that share certain properties. The two which are relevant for our purposes are:
1) The stack contains local variables and execution control data (e.g. the current function being executed and where in the program to return to after the current function finishes). The stack automatically grows and shrinks throughout the process execution without any input from the programmer. It is fast to access, but is limited in size. Variables allocated on the stack during a function are no longer accessible once the function returns/exits.
2) The data segment or heap contains dynamically allocated variables. The heap is only limited in size by the amount of memory available on the computer. Variables allocated on the heap stay “alive” until they are explicitly de-allocated, so can be accessed even after the function which allocated them has finished.
All three languages in this guide make the distinction between statically and dynamically allocated variables. Statically allocated variables have sizes which are known (or can be calculated) at compile-time: they do not depend on any value which is set when running the program, such as command-line parameters or input files. Statically allocated local variables are usually placed on the stack, although the exact location can depend on the language and compiler. Dynamically allocated variables (for which the size is only known at run-time) can be stored on either the stack or the heap, depending on the type of variable and allocation.
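To make the lifetime difference between the stack and the heap concrete, here is a minimal C++ sketch (the C version is analogous). The function names `sum_of_squares` and `make_heap_counter` are made up for this illustration, and `new` and `delete` are the C++ heap allocation operators covered in the C++ section. The first function only uses stack variables, which vanish when it returns; the second hands back a heap allocation which outlives the call.

```cpp
#include <cstddef>

// Uses only stack storage: `squares` and `total` disappear when the function returns.
int sum_of_squares()
{
    int squares[4] = {1, 4, 9, 16}; // Size known at compile time, lives on the stack
    int total = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        total += squares[i];
    }
    return total; // The value is copied out; the array itself is gone after this
}

// Returns a heap allocation: the integer stays alive after the function returns.
int *make_heap_counter(int start)
{
    int *counter = new int(start); // Heap allocation, not tied to this function's scope
    return counter;                // The caller is now responsible for calling delete
}
```

A caller would write something like `int *c = make_heap_counter(5);`, use `*c` for as long as it needs, and finish with `delete c;`. Returning the address of `squares` instead would hand back a pointer to stack memory which no longer exists.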
There are important syntactical and semantic differences in how variables are declared and allocated between languages, so let’s go through the languages one by one.
C
C is the most involved and “low-level” language commonly used for scientific programming, in that it requires programmers to do manual memory management, and has no tools to make that easier. The tradeoff is that C provides much greater control over a program’s execution and a closer coupling to the operating system and hardware. Generally, I recommend against starting new projects in C because it’s a lot more error-prone than C++: C has a lot of so-called “footguns” (programming devices which seemingly exist only to let you shoot yourself in the foot) and the kinds of memory bugs you get in C tend to be difficult to debug. But, there’s a lot of pre-existing code written in C, so if you need to use or maintain a C program, read on.
Preliminary knowledge
Before starting, we need to cover one very important concept in programming: scope. Every identifier (i.e. the name of a variable, function, type, etc.) can only be accessed within certain regions of code; outside these regions the identifier is said to be out of scope. Scope is usually defined with respect to regions of source code, and in C this usually refers to regions of code encased in curly braces (“{” and “}”). As a program executes and enters new functions, loops or blocks, different variables will become valid, and once it leaves a particular block those variables may fall out of scope and be replaced by new ones.
The rules determining scoping in C are quite complicated, but there are some general rules which should help make sense of the following sections:
1) Variable scope is usually limited to the block (region within curly braces) in which it is defined.
2) Variables from a block of code are not visible to functions called within that block.
3) Variables defined in a block of code are visible to loops within that same block.
4) Variables defined within a loop are not visible to the enclosing block.
5) Variables in different scopes may have the same name, but refer to different underlying bits of memory. The variable in the innermost scope is said to shadow the variable in the outer scope.
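The same block-scoping rules carry over to C++, so here is a short C++ sketch of rules 3 to 5 in action. The function names `shadowing_demo` and `sum_below` are invented for this example and are not part of any library.

```cpp
// Shows shadowing: the inner `x` is a separate variable from the outer `x`.
int shadowing_demo()
{
    int x = 1;           // Outer x
    int seen_inside = 0;
    {
        int x = 2;       // Rule 5: a new variable which shadows the outer x
        seen_inside = x; // Refers to the inner x, so this stores 2
    }
    // The inner x is out of scope here, so `x` is the untouched outer variable
    return x * 10 + seen_inside;
}

// Sums 0, 1, ..., n-1.
int sum_below(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
    {
        total += i; // Rule 3: the loop can see `total` from the enclosing block
    }
    // Rule 4: using `i` here would be a compile error, since it belongs to the loop
    return total;
}
```

Calling `shadowing_demo()` gives 12 (the outer `x` is still 1, and the inner `x` was 2), which is a quick way to convince yourself that the two variables never shared any memory.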
Scoping rules are intimately tied to the concept of an execution stack, and thus to stack-based memory.
Stack memory
In C, most “singleton” variables are stored on the stack. If you declare a variable inside a function like:
int x = 5;
Then it goes on the stack. Arrays are also allocated on the stack and can be either static or variable-length. Fixed-length/static arrays have a length which is known at compile time, so have to be declared with an integer constant expression. Note that, unlike in C++, a const-qualified variable does not count as a constant expression in C; use an enum constant or a #define if you want a named length, like so:
int arr1[5]; // Ok, makes an array of length 5
int arr2[5*10]; // Also ok, 5*10 is an integer constant expression
enum { n = 10+5 };
int arr3[n]; // Also ok, an enum constant is an integer constant expression
Fixed-length arrays have almost no performance overhead, since their storage requirements are determined at compile time. If you know exactly (and I do mean exactly) how much space you need for some array, use a fixed-length array. If the number of elements you need to store changes at run-time, it’s best to use one of the other array types. Some old code uses fixed-length arrays to store variable-length data by allocating space that’s “always big enough”; this is extremely bad design and should be avoided.
The C99 standard introduced variable-length arrays (VLAs), which are allocated on the stack, but whose size is dynamically determined at run-time. As an example:
int n = some_function();
int vlarr[n]; // Length cannot be determined at compile time.
Variable-length arrays only persist while they are in-scope: VLAs are automatically de-allocated once the code block which declared them exits. If a VLA is declared inside a loop, for example, it will be de-allocated after every iteration of the loop:
for(int n = 1; n < 10; n++)
{
    int arr_loop[n];
    // Do some stuff with the array
}
// arr_loop is no longer accessible and has been de-allocated
Variable-length arrays are very useful for small, temporary arrays, since they have less performance overhead than using manual memory allocation. They should not be used for large amounts of data, however, since the stack has a fixed size which is small compared to the heap. If a VLA requires more memory than the available stack space, a stack overflow will occur, usually crashing the program.
As a rule of thumb, if you’re likely to need more than 100 elements, it’s better to manually allocate the memory. Many modern codebases forbid VLAs entirely because of these potential issues, so check before you use them.
Heap memory
The main way to manually allocate memory in the heap is via the malloc(size_t size) function. Do not attempt to access memory before you allocate it, as this is undefined behaviour and typically causes segmentation faults (crashes). malloc takes one argument of type size_t (on many systems this is equivalent to an unsigned long integer), which is the size of the allocation in bytes. Rather than writing the size of the allocation directly, the convention is to use C’s sizeof operator to calculate the required size. For example, if you wanted to allocate enough space for 32 integers, the malloc call might look like:
int *p; // This will hold a pointer to the start of the allocation
p = malloc(32 * sizeof(int));
A successful malloc call returns a pointer to the start of the new allocation (the contents of which are indeterminate, not necessarily zero); on failure, it returns a NULL pointer. Dereferencing (i.e. attempting to use) a null pointer is undefined behaviour: if you’re extremely lucky the program will simply crash, if you’re unlucky it will lead to subtle memory corruption bugs which are extremely hard to track down. It is therefore important to check that each malloc call was successful before using the result, such as in the following snippet:
int *p;
p = malloc(10 * sizeof(int));
if(!p)
{
    // Print and handle the error, e.g. by exiting the program
}
For convenience, it can be useful to define your own wrapper function for malloc which automatically handles errors. By convention, this is usually called xmalloc. An example implementation (taken from Robert Love’s book Linux System Programming) is below:
#include <stdio.h>  // perror
#include <stdlib.h> // malloc, exit, EXIT_FAILURE

void *xmalloc(size_t size)
{
    void *p;
    p = malloc(size);
    if(!p)
    {
        perror("xmalloc"); // Prints "xmalloc: " followed by a description of the error
        exit(EXIT_FAILURE); // EXIT_FAILURE is defined in stdlib.h
    }
    return p;
}
Memory allocated via malloc persists until it is manually deallocated or until the program exits. If a program repeatedly allocates memory but does not free it, then its memory usage will continue to grow until it either exits or exhausts the available memory (usually resulting in a crash); this is referred to as a memory leak.
As its name suggests, the function free(void *ptr) frees the heap allocation at ptr and makes the memory available for re-use (it may or may not be handed back to the operating system straight away). free only de-allocates whole blocks of memory, and the input pointer must be the result of a previous call to malloc (passing a NULL pointer is allowed and does nothing). You cannot use free to de-allocate partial chunks of memory; attempting to pass a pointer to the middle of an allocation will result in undefined behaviour (usually a crash). free does not return anything, so it is not necessary to check for errors after calling it.
In order to avoid memory leaks, you must make sure every call to malloc is accompanied by a corresponding call to free once you are done with the memory. The best way to achieve this is to figure out the intended lifetime of an allocation and write the call to free as soon as you write a call to malloc, otherwise you risk forgetting to free the allocation if you leave it until later.
Once a block of memory is freed, the program must not attempt to access it again. Attempting to dereference a pointer after it has been freed is called a use-after-free bug, and is undefined behaviour: again, if you’re lucky the program will crash immediately. Otherwise, use-after-free bugs can cause all sorts of nasty memory bugs. Similarly, trying to free a memory region twice (referred to as a double-free) is also undefined behaviour.
Unfortunately, C places all of the responsibility for managing memory and avoiding memory bugs on the programmer, and has almost no inbuilt safety features or guardrails. It is therefore very easy to write programs which are subtly incorrect, and can be very hard to triage when they break. There are some useful tools for catching these kinds of bugs, including:
- Valgrind,
- AddressSanitizer in the GCC and Clang compilers,
- Electric Fence, a malloc debugging library which is linked into the program separately from the compiler.
But as the saying goes, an ounce of prevention is worth a pound of cure. Avoid using manual memory management as much as possible, and, if you can, avoid using C unless you absolutely need to.
C++
Many sources claim that C++ is just C with classes (which was in fact its original name), but this is an outdated way of thinking about the language. The C++11 standard introduced sweeping changes to the structure of the language, to the point that it’s almost a different language to earlier C++ standards. C++11 (and later standards) have excellent tools and constructs for dynamic memory management, which can all but eliminate the pain-points in C’s memory model.
Not all scientific codebases have migrated to modern C++ practices, so I have included discussion of the old ways of managing memory in case you run into them. If you’re writing your own code, then I strongly recommend using the modern approach as it significantly reduces the chance of introducing memory bugs with almost no performance cost.
Stack-allocated memory
C++ arrays are stored on the stack and are almost identical to arrays in C. The big difference is that variable-length arrays are not permitted in C++, so arrays must be declared with a length that is known at compile time. For example:
int arr[10];
int arr2[5*10];
are both legal array declarations, but
int n;
int arr3[n];
is not (unless n is an integer constant expression). GCC allows C-style VLA declarations as an extension, but this is not standard behaviour and you shouldn’t expect it to work on all compilers (or even future versions of GCC).
For C++11 and later, the standard library defines a std::array container, which behaves like a C-style fixed-size array, but includes nice helper-functions, similar to std::vector. The bounds of a std::array must be known at compile time, and are specified along with the type of data it contains, using angle-brackets. For example:
std::array<int, 3> arr1; // Array of 3 integers
std::array<some_complex_type, 20> arr2; // Array of 20 elements of a custom type
std::array is generally easier to use than raw C-style arrays, but otherwise has the same semantics and memory characteristics.
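As a rough illustration of those helper functions, the sketch below uses a std::array with standard algorithms and its size() member. The function names `sum3` and `scaled_unit_vector` are hypothetical and only exist for this example.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Adds up a fixed-size array; the length (3) is part of the type.
double sum3(const std::array<double, 3> &values)
{
    // begin()/end() let std::array work with the standard algorithms
    return std::accumulate(values.begin(), values.end(), 0.0);
}

// Builds the vector (scale, 0, 0), looping with size() rather than a hard-coded length.
std::array<double, 3> scaled_unit_vector(double scale)
{
    std::array<double, 3> v = {1.0, 0.0, 0.0};
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        v[i] *= scale;
    }
    return v; // A std::array can be returned by value like any other object
}
```

One practical convenience is that the array always knows its own length, so there is no need to pass a separate size argument around as you would with a raw C-style array.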
Manual heap-allocated memory
While C++ includes C-style malloc and free, their use is generally discouraged in favour of the C++ specific operators new and delete.
new functions similarly to malloc in that it allocates heap memory and returns a pointer to the beginning of the allocated block, but has syntax which differentiates between allocating for a single value vs allocating for an array. For example, we can allocate space on the heap for a single integer by:
int *p;
p = new int;
Similarly, we can allocate space for an array of N integers (where N can be determined at either compile time or run-time) by:
int *p;
int N = ...;
p = new int[N];
Unlike malloc, new generally does not return a null pointer if the allocation fails. Instead, it throws the std::bad_alloc exception, which causes the program to exit (possibly with a stack-trace) unless it is explicitly handled. Consequently, you don’t need to write any error-handling code unless you need to do something unusual. It is possible to disable exceptions, in which case new can return NULL on failure, but there are only a few, narrow domains in which this is common practice (scientific computing is not one of them).
Memory allocated by new lasts until it is manually de-allocated or until the program finishes execution. C++ provides the delete operator to free heap memory, which also differentiates between freeing a single element vs freeing an array of memory. For a single allocation, the syntax is:
delete ptr;
whereas for an array, the syntax is:
delete[] ptr;
ptr must point to the start of an allocation: passing a pointer to the middle of an array results in undefined behaviour, so you can’t only free part of an allocation.
Once a block of memory is freed by delete, the program must not attempt to access it again. Attempting to dereference a pointer after it has been freed is called a use-after-free bug, and is undefined behaviour: if you’re lucky, the program will crash immediately. Otherwise, use-after-free bugs can cause all sorts of nasty memory bugs.
As with C, every new must have an accompanying delete later in the code. The easiest way to achieve this is to write the new and delete calls at the same time to ensure you don’t forget about them.
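The pairing rules above (new with delete, new[] with delete[]) can be sketched in a few lines. The function `mean_of_range` is a made-up example rather than anything from a real codebase, and in new code a std::vector would usually be a better choice than a raw heap array.

```cpp
#include <cstddef>

// Computes the mean of 0, 1, ..., count-1 using a manually managed heap array.
double mean_of_range(std::size_t count)
{
    if (count == 0)
    {
        return 0.0; // Nothing to average, and nothing has been allocated yet
    }

    double *values = new double[count]; // Array form of new
    for (std::size_t i = 0; i < count; ++i)
    {
        values[i] = static_cast<double>(i);
    }

    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
    {
        total += values[i];
    }

    delete[] values; // Array form of delete, matching the new[] above
    return total / static_cast<double>(count);
}
```

Note that the early return happens before the allocation. If a return statement were added between the new[] and the delete[], the array would leak, which is exactly the kind of mistake that is easy to make with manual management.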
Modern C++ memory management: RAII
RAII (short for Resource Acquisition is Initialisation) is a powerful idiom which underlies modern C++ memory management. In RAII, allocation and de-allocation are handled by the compiler automatically and are tied to the object’s lifetime. Broadly speaking, the C++ compiler automatically inserts code for memory allocation when an RAII managed object is initialised (i.e. given its initial value after being declared), and inserts de-allocation code when the object is no longer in scope. The programmer therefore does not need to manually acquire and free resources, but, unlike languages such as Python which use garbage collection, this allocation and de-allocation happens deterministically and predictably so its impact on performance is negligible.
RAII significantly reduces the chances of memory bugs, as it completely eliminates the need to keep track of malloc()s and free()s for RAII managed objects. It is also much more robust to error conditions than manual memory management: RAII was originally conceived of to ensure that resources would be cleanly and automatically de-allocated when a program encounters an exception. In contrast, it is necessary to write exception-handling code from scratch when using manual memory allocations, which is much more error prone.
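To show the mechanism, here is a minimal sketch of a hand-written RAII class. Everything in it (the `Buffer` class, the `live_buffers` counter and the two functions) is hypothetical and exists only to make the clean-up visible; in practice you would reach for a standard container instead of writing this yourself.

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative counter of Buffers which currently hold memory.
static int live_buffers = 0;

// A tiny RAII wrapper: the constructor allocates, the destructor frees.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : data_(new double[size]()), size_(size)
    {
        ++live_buffers;
    }

    ~Buffer()
    {
        delete[] data_; // Runs automatically when the Buffer goes out of scope
        --live_buffers;
    }

    // Copying is disabled so that two Buffers never try to free the same memory.
    Buffer(const Buffer &) = delete;
    Buffer &operator=(const Buffer &) = delete;

    double &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    double *data_;
    std::size_t size_;
};

// Throws part-way through its work; there is no explicit clean-up code.
void work_that_fails()
{
    Buffer scratch(1000);
    scratch[0] = 1.0;
    throw std::runtime_error("something went wrong");
}

// Returns the number of live Buffers once the failure has been handled.
int live_buffers_after_failure()
{
    try
    {
        work_that_fails();
    }
    catch (const std::runtime_error &)
    {
        // The Buffer's destructor already ran while the exception propagated
    }
    return live_buffers;
}
```

With manual new[] and delete[], `work_that_fails` would need its own try/catch block just to free the array before re-throwing. Here `live_buffers_after_failure()` returns 0 without any such code.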
In modern C++, only certain objects are managed via RAII. The categories which are important for scientific programming are:
- Standard library containers, such as std::vector or std::map,
- Most large third-party containers, such as those in Boost,
- Smart pointers, such as std::shared_ptr and std::unique_ptr.
Fortunately, these categories are sufficient for almost all use-cases in scientific computing. There is a stubbornly persistent misconception that RAII and standard containers are slower than manual memory allocation, but this is not true — it is possible (and desirable) to build a large, high-performance scientific code base with almost all memory management handled by smart pointers and RAII containers (see, for example GROMACS2020 or AMBiT). The efficiency of RAII is dependent on the compiler making sensible decisions about when to allocate/de-allocate memory, but major C++ compilers are very smart these days so it almost always Just Works without needing manual intervention.
Standard (and not-so-standard) containers
C++ comes with a rich standard library, including implementations of several commonly-used data structures. These implementations are referred to as containers, and can be used to store any data type, including custom classes (with one or two exceptions). As the name suggests, standard containers have a consistent interface and syntax, and provide similar utility functions for manipulating the underlying data. This makes changing between different data structures relatively painless, even in large, complex codebases.
Standard containers all use RAII for memory management, so there’s no need to use new and delete. A full run-down of the available containers can be found in any C++ standard library reference, but there are a few which are especially useful in scientific programs:
- std::array: already covered in the section on stack-allocated memory.
- std::vector: an array-like structure which automatically and dynamically grows (or shrinks) in size as new elements are added (or removed), meaning that it’s not necessary to know how many elements the vector will contain before declaring it. Elements in a vector are stored contiguously on the heap, so are very fast to iterate (loop) over.
- std::map: an associative array which allows for elements to be accessed by a key of (almost) arbitrary type (as opposed to an array or vector, which are accessed by an integer index based on position in the array). Elements in a std::map are stored in key order, and are fast to access but slow to iterate over.
- std::set: a collection of unique elements (i.e. a piece of data is either in a set or it is not, but cannot occur more than once), stored in sorted order.
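The containers listed above can be sketched in use as follows. The three function names are invented for this example; the container operations themselves (push_back, operator[], and construction from an iterator range) are standard. Notice that there is no new or delete anywhere.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Collects the even numbers below `limit`; the vector grows as needed.
std::vector<int> evens_below(int limit)
{
    std::vector<int> result; // Starts empty, no size needed up front
    for (int i = 0; i < limit; i += 2)
    {
        result.push_back(i);
    }
    return result;
}

// Counts how often each word occurs, using the word itself as the key.
std::map<std::string, int> count_words(const std::vector<std::string> &words)
{
    std::map<std::string, int> counts;
    for (const std::string &word : words)
    {
        ++counts[word]; // operator[] creates a missing entry with value 0 first
    }
    return counts;
}

// Counts the distinct values in a list by loading them into a set.
std::size_t count_distinct(const std::vector<int> &values)
{
    std::set<int> unique_values(values.begin(), values.end()); // Duplicates are dropped
    return unique_values.size();
}
```

Each container releases its memory when it goes out of scope, so these functions cannot leak even if something inside them throws.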
Proper choice of data structure is extremely important for ensuring program performance, and strongly depends on the characteristics of the program. If you’re unsure, feel free to ask me (Emily) and I’ll be happy to help.
Smart pointers
Smart pointers (properly introduced in C++11) allow for any type of data or class to be managed using RAII. They act like regular pointers, but keep track of the object’s lifetime and automatically de-allocate it once it is no longer needed. The two most commonly used types of smart pointers are std::shared_ptr and std::unique_ptr, which have similar syntax but different use-cases.
A shared_ptr behaves just like a regular pointer, except it includes an extra bit of data called a reference counter which keeps track of how many shared_ptr instances point to a particular underlying object at a given time during program execution. Every time you copy a shared_ptr, the reference counter for the underlying object is incremented by 1; when a copy falls out of scope the counter is decremented by 1, to indicate that this particular pointer is no longer around. When the reference counter decreases to zero, there is nothing in the code which requires the underlying object anymore, so its associated memory (and other resources) is de-allocated. In this way, it is possible to pass around as many pointers to the same object as required, without having to worry about when to free the underlying memory: the shared_ptr keeps track of it and does it for you.
A unique_ptr keeps the automatic memory management of the shared_ptr, but adds the constraint that only a single unique_ptr may own the underlying object at any one time. It is not possible to make a copy of a unique_ptr, you can only move it around. This is less flexible than the shared_ptr but provides extra safety guarantees when dealing with objects which should have exactly one owner. An example of this would be a pointer to the underlying grid of a simulation: there is logically only one grid, so if a program makes multiple copies it risks them becoming out of sync with each other.
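The difference between the two pointer types is easiest to see side by side. In the sketch below, `Grid` is a hypothetical stand-in for a simulation grid and the two function names are invented; `std::make_shared`, `use_count` and `std::move` are standard library facilities.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the grid of a simulation.
struct Grid
{
    std::vector<double> cells;
    explicit Grid(std::size_t n) : cells(n, 0.0) {}
};

// Reports the reference count while two shared_ptrs refer to the same Grid.
long shared_count_with_copy()
{
    std::shared_ptr<Grid> first = std::make_shared<Grid>(8);
    std::shared_ptr<Grid> second = first; // Copying increments the reference counter
    return first.use_count();
} // Both pointers go out of scope here, the count hits zero and the Grid is freed

// Transfers ownership of a Grid from one unique_ptr to another.
bool source_is_empty_after_move()
{
    std::unique_ptr<Grid> owner(new Grid(8)); // C++11-compatible construction
    // std::unique_ptr<Grid> copy = owner;    // Would not compile: copying is forbidden
    std::unique_ptr<Grid> new_owner = std::move(owner); // Ownership moves across
    return owner == nullptr && new_owner != nullptr;
}
```

`shared_count_with_copy()` returns 2, since both pointers are alive when the count is read, and `source_is_empty_after_move()` returns true because a moved-from unique_ptr is left holding nothing.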
Smart pointers are a really useful addition to the language and significantly reduce the chance of memory allocation bugs. They should be used whenever possible, except in instances where strict compatibility with old C++ standards is needed.
Fortran
Fortran is a lot older than C, and so has its origins in a time before computer engineers settled on a common “standard” for memory layout. As such, the exact details of a Fortran program’s memory model depend on both the choice of Fortran standard, as well as which compiler is used to build it. There are a few common features and guidelines which are worth knowing about regardless of compiler, but it’s worth clearing up some terminology relating to Fortran standards first.
Fortran standards - fixed-form vs free-form
Fortran source code was originally designed to be fed to a computer via punched cards, which had a fixed number of columns per line (80 column limitations were very common). Space on the punched-cards was tight, so it made sense to try to encode as much information about a given line into the card as possible. This led to the convention of fixed-form input: the layout of characters in a line is significant and the presence or absence of a character in a specific column could change the meaning of a line. A full specification of common fixed-form formats can be found in the Oracle Fortran compiler documentation.
Fixed-form was the only supported syntax in Fortran until the Fortran 90 standard, which introduced so-called free-form input, where the format and spacing of input lines was no longer significant (almost all modern programming languages use free-form input, so this form is probably the most familiar to new programmers). Since Fortran 77 was the last standard in which fixed-form input was mandatory, the term “Fortran 77” is often used synonymously with “fixed-form Fortran”, but this is not strictly correct. Modern Fortran standards still support fixed-form input, and all major Fortran compilers include some flag to signify that a source-code file is in fixed-format (e.g. gfortran has the -ffixed-form compiler flag).
The upshot of this is that fixed-form codebases are not restricted to Fortran 77 features — it is possible to use modern Fortran features, including memory management, without needing to convert fixed-form files to free form.
Static arrays
Fortran makes a distinction between static and dynamic arrays, with static arrays being any type of array whose size is known at compile time. This was the only type of array provided by Fortran 77 and earlier (although many implementations defined non-standard extensions allowing for other kinds of memory allocation). Static arrays can be defined using either of two possible forms:
integer arr(10) ! Array of 10 integers
or
integer, dimension(10,10) :: arr2D ! 2D array of 10x10 integers
The first syntax is the standard in Fortran 77 and earlier dialects, but is still supported by newer standards, while the second form is used in newer standards (either fixed- or free-form).
The storage location of static arrays, whether they are stored on the stack or the heap, is an implementation detail which varies between compilers: gfortran allocates static arrays on the heap, whereas the Intel Fortran compiler allocates them on the stack (although this can be changed using compiler flags). Depending on the compiler, all static arrays may be allocated and initialised at the start of a program’s execution, even if they are never used.
Generally, it is best to avoid using large static arrays: there are many compiler-dependent problems which occur when the total amount of statically-allocated memory (from all static arrays and COMMON blocks in the program) becomes too large. By default, both gfortran and the Intel compiler use low-level optimisations which assume that the total amount of statically-allocated memory is less than 2GB, so they will not compile programs with more memory than this threshold without special compilation flags (e.g. -mcmodel=medium in ifort). These compilation flags result in reduced program performance, so are a stop-gap solution at best. Furthermore, it is impossible to use OpenMP-driven parallelism with large static arrays, as almost all OpenMP runtimes allocate a separate copy of every static array for each thread: if the arrays are too large then this will blow up the memory usage and cause the program to immediately crash.
Even though statically-allocated arrays are very common in older codebases, they cause enough problems that it’s probably worth the programming effort to change to dynamically-allocated (heap) arrays for storing large amounts of data. As previously mentioned, it is possible to use memory management techniques from newer Fortran standards while still maintaining fixed-form style source code, meaning that the required changes to program structure are relatively small.
Dynamic allocation — the ALLOCATABLE attribute
Fortran 90 introduced the concept of allocatable arrays, which are dynamically allocated arrays stored on the heap. Allocatable arrays must be declared with the ALLOCATABLE attribute, and are declared with special placeholder dimensions:
integer, allocatable, dimension(:) :: arr1d ! 1D allocatable array of integers
real, allocatable, dimension(:,:) :: arr2d ! 2D allocatable array of reals
The “:” indicates to the compiler that the array will be dynamically allocated at some point in the future, but is not itself an allocation. Before allocatable arrays can be used, memory must be manually allocated via the ALLOCATE statement (this is analogous to malloc in C and new in C++). ALLOCATE takes as its argument an allocatable array which is not yet associated with any heap memory, together with the size of each dimension to allocate. For example, to allocate memory for the arrays arr1d and arr2d in the example above, we would do:
allocate(arr1d(10))
allocate(arr2d(100, 1000))
The number of dimensions given to ALLOCATE must match the number of dimensions the target array was declared with, but the sizes themselves are only limited by the amount of available system memory.
Fortran 90 also supplies the DEALLOCATE statement to free the memory associated with an array and release it back to the operating system. It takes an array (which must be associated with a memory allocation resulting from a call to ALLOCATE), and does not require dimensions to be specified. For example, to de-allocate arr1d and arr2d, we would use:
deallocate(arr1d)
deallocate(arr2d)
It is an error to call DEALLOCATE on an array that has already been freed, or on an array which has not been allocated.
Modern Fortran has some protection against memory leaks, as the compiler will automatically insert calls to DEALLOCATE into the executable code once it detects an allocatable array is no longer in scope. It is still sometimes necessary to de-allocate memory manually, but it is not as critical as proper use of free in C.
Detecting memory bugs
What about memory bugs? What do you do if a program crashes or produces junk output due to memory issues? Debugging memory bugs can be very difficult, but there are some useful Unix tools which can greatly simplify the process. These tools roughly fall into two categories: compiler tools, which require you to recompile the code with particular flags or libraries, and runtime tools which do not require the code to be recompiled.
Compile-time instrumentation tools
All of the tools in this section require the code to be recompiled, which may not always be possible if using pre-built binaries (particularly for proprietary software). They tend to give better performance and more specific diagnostics than run-time tools, however, as they can leverage information from the compiler to mark-up the resulting code (especially if the binary is compiled with the -g flag to insert debugging symbols). This section will focus on extensions to gcc and clang called Sanitizers [sic], which instrument the compiled code to catch and report various error conditions. The most useful sanitizers for scientific programming are as follows:
- AddressSanitizer: also known as ASan, modifies memory management functions to print a warning or crash when encountering an invalid memory request (including operations which would not normally raise a segmentation fault but are errors nonetheless). ASan has been implemented in clang and gcc and is enabled through the -fsanitize=address compiler flag (although the libasan library may not be installed by default on all systems). ASan has some cost in the form of higher memory consumption at run-time, so should only be used when debugging.
- UndefinedBehaviorSanitizer: also known as UBSan, instruments the target code to catch undefined behaviour such as NULL pointer dereferencing, signed integer overflow and integer division by zero. Although most features in UBSan are not strictly related to memory safety, it’s still an extremely useful debugging tool. UBSan is enabled through the -fsanitize=undefined flag, and has very little overhead (so is usually fine for general use).
- LeakSanitizer: also known as LSan, detects memory leaks. It is similar to ASan (technically a part of ASan which can be run as a standalone tool). It can be combined with ASan by compiling with the -fsanitize=address flag and setting the environment variable ASAN_OPTIONS=detect_leaks=1 before running the target executable. LSan can also be used as a standalone tool (with less overhead than the full AddressSanitizer) by compiling with -fsanitize=leak.
- Static analysis: static analysers are automated tools which analyse the source code of a program without running it to find programming bugs (including some memory bugs like double-free errors). Static analysers work on the same principles as regular compiler diagnostics, but are much more thorough as they are “allowed” to run for a much longer time than a compiler (where designers typically want to limit compilation times to improve usability).
One important tool for static analysis is the Clang Static Analyzer, which is developed as part of the clang compiler project and leverages clang’s architecture to search for programming bugs. Its usage is somewhat intricate, so it’s a good idea to read the documentation before trying it out.
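To make the sanitizer descriptions above more concrete, here is a minimal sketch of the kind of bugs ASan and LSan are meant to catch. The function name sum_of_squares and its parameters are hypothetical, invented purely for illustration. The bugs are deliberate and only triggered by particular arguments, so the function behaves correctly when called with writes <= n and release != 0.

```c
#include <stdlib.h>

/* Hypothetical helper (not from any library): allocate n zeroed ints,
 * store i*i in the first `writes` slots, and return the sum of the
 * first n slots, or -1 if the allocation fails.
 *
 * Deliberate bugs, for demonstration only:
 *   - writes > n   writes past the end of the heap buffer (ASan target)
 *   - release == 0 never frees the buffer                 (LSan target)
 */
long sum_of_squares(size_t n, size_t writes, int release)
{
    int *buf = calloc(n, sizeof *buf);
    if (buf == NULL) {
        return -1;
    }

    for (size_t i = 0; i < writes; ++i) {
        buf[i] = (int)(i * i);   /* out of bounds once i >= n */
    }

    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += buf[i];
    }

    if (release) {
        free(buf);               /* skipped when demonstrating a leak */
    }
    return total;
}
```

To try it, one could call sum_of_squares(4, 5, 1) from a small main function, compile with gcc -g -fsanitize=address and run the result; ASan should then report a heap-buffer-overflow at the marked line. Calling sum_of_squares(4, 4, 0) instead should lead to a leak report when the program exits, provided leak detection is enabled on your platform.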
Run-time tools
Dynamic instrumentation has the advantage of not requiring special compilation steps, but often comes with a much larger performance and memory usage overhead than comparable compile-time tools. The two most important run-time tools are debuggers and Valgrind.
First, debuggers. Running code under a debugger like GDB is a good way to catch and inspect the state of the program in the lead up to bugs, especially for obvious bugs like segmentation faults which halt the program execution. Subtler bugs like silent memory corruption are trickier to pin down, as it’s often not obvious where the problem is located in the code. Without an obvious starting point, it can take a long time to step through the code execution. This makes it all the more important to brush up on basic and advanced GDB usage so you’ll be prepared for the kinds of gnarly bugs that come with manual memory management.
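As an illustration of what “silent memory corruption” can look like, the sketch below uses a hypothetical struct account and set_name function (both invented for this example). When the unchecked branch is given a name that is too long, the copy can spill out of name and into the neighbouring is_admin field without necessarily crashing. That is undefined behaviour, so the exact outcome depends on the compiler and platform.

```c
#include <string.h>

/* Hypothetical record: `name` sits directly before `is_admin`, so an
 * overrun of `name` can spill into `is_admin` without a crash. */
struct account {
    char name[8];
    int is_admin;
};

/* Store `name` in the account.
 * checked != 0: the copy is truncated to fit and always terminated.
 * checked == 0: the copy is unbounded, a deliberate bug (undefined
 *               behaviour) whenever strlen(name) >= sizeof acct->name. */
void set_name(struct account *acct, const char *name, int checked)
{
    if (checked) {
        strncpy(acct->name, name, sizeof acct->name - 1);
        acct->name[sizeof acct->name - 1] = '\0';
    } else {
        strcpy(acct->name, name);   /* no bounds check */
    }
}
```

One way to track this kind of bug down in GDB is to set a watchpoint on the field that changes unexpectedly (for a variable called acct, something like watch acct.is_admin), so that execution stops at the statement that modifies it rather than at the point where the damage is eventually noticed.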
The other major run-time memory analysis tool worth knowing about is Valgrind. Valgrind is
a framework for dynamic instrumentation and analysis of code (running on Linux systems), which
virtualises the instructions making up the original program and runs them on a “synthetic CPU”, where
the instructions can be instrumented by specialised tools before they are executed. The upshot of this
(somewhat technical) description is that Valgrind lets you observe and profile 100% of your code’s
execution path (including any libraries you may have linked to) without needing to recompile the
executable (although if you have access to the source code you can get more detailed statistics by
compiling with the -g flag). Valgrind imposes a very large performance penalty, but will automatically
catch whole classes of memory bugs, so the tradeoff is well worth it when debugging code.
Valgrind includes many tools for memory and performance analysis, but the most commonly useful ones are [1]:
- Memcheck: checks for invalid memory accesses (e.g. out-of-bounds or use-after-free errors) and memory leaks.
- Callgrind: generates call-graphs (tree-like graphical representations of a program’s execution flow) and profiling information. Can be combined with the kcachegrind GUI tool for an easier-to-interpret overview of call-graph data.
- Massif: profiles heap memory usage over a program’s lifetime. Can be combined with the GUI frontend massif-visualizer for an easier-to-read profile.
- DHAT (Dynamic Heap Analysis Tool): analyses how a program uses its heap memory, including frequency of allocations, under-utilised allocations and inefficient access patterns. More of a niche tool than the other three in this list, but these kinds of bugs are extremely difficult to track down without DHAT.
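To give a feel for the allocation-frequency problems that heap profilers help to expose, here is a rough sketch comparing two ways of growing an array. The function build_array and its doubling parameter are hypothetical names chosen for this example. The function returns how many times it called realloc, so the difference between the two strategies can be seen even without a profiler.

```c
#include <stdlib.h>

/* Hypothetical helper: append the values 0..n-1 to a growable array,
 * then free it. Returns the number of realloc calls made, or -1 if an
 * allocation fails.
 * doubling != 0: capacity doubles each time the array is full.
 * doubling == 0: capacity grows by a single element each time. */
long build_array(size_t n, int doubling)
{
    int *data = NULL;
    size_t capacity = 0;
    long reallocs = 0;

    for (size_t i = 0; i < n; ++i) {
        if (i == capacity) {
            size_t new_capacity =
                doubling ? (capacity ? capacity * 2 : 1) : capacity + 1;
            int *grown = realloc(data, new_capacity * sizeof *grown);
            if (grown == NULL) {
                free(data);
                return -1;
            }
            data = grown;
            capacity = new_capacity;
            ++reallocs;
        }
        data[i] = (int)i;
    }

    free(data);
    return reallocs;
}
```

For 1000 elements, the grow-by-one strategy calls realloc 1000 times, while the doubling strategy calls it 11 times. Running each variant under a heap profiler such as Massif or DHAT is one way to see how that difference shows up in practice, although the exact figures reported will depend on the tool and the allocator.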
Basic usage of Valgrind takes the following form:
valgrind --tool=<tool> ./your_program
where <tool> is the name of the tool you want to run (which must be all lower case). So to run a
program with the DHAT tool, you’d do something like:
valgrind --tool=dhat ./your_program
Valgrind produces a lot of output, so it’s useful to either redirect the output to a file, or to use a
dedicated logging file through the command-line argument --log-file=<filename>.
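As a small test case for trying out Memcheck, the following sketch contains a deliberate use-after-free. The function first_char and its use_after_free parameter are hypothetical, made up for this example, and the buggy branch is only taken when the caller asks for it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `text` to the heap and return its first
 * character (0 for an empty string), or -1 if the allocation fails.
 * use_after_free != 0 deliberately reads the copy after freeing it,
 * which is the kind of invalid read Memcheck is designed to flag. */
int first_char(const char *text, int use_after_free)
{
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, len);

    int result;
    if (use_after_free) {
        free(copy);
        result = (unsigned char)copy[0];   /* BUG (deliberate): read after free */
    } else {
        result = (unsigned char)copy[0];
        free(copy);
    }
    return result;
}
```

If a program that calls first_char("heap", 1) is compiled with -g and run with valgrind --tool=memcheck --log-file=memcheck.log ./your_program, the log should contain an invalid read report pointing at the marked line, along with where the block was allocated and freed. Without Valgrind the same program may well appear to work, which is exactly what makes this class of bug hard to spot.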
As with all types of debugging, there is no “silver bullet” for catching memory bugs. However, judicious use of these tools will help speed up the process of debugging and harden your code against memory corruption errors.
[1] Valgrind also has a thread-safety analyser called Helgrind, but it does not understand OpenMP constructs, so is not terribly useful for much scientific software in common use.