Thread Fuzzing

When recording a multi-threaded program, LiveRecorder allows only one thread to run at a time, both to protect the integrity of its data structures and to simplify capturing the behaviour of the program. LiveRecorder switches between program threads when a thread makes a blocking system call, or when enough basic blocks (BBs) have been recorded.

One consequence of this is that data races and other concurrency bugs occur with much lower frequency when a program is being recorded, and some data races may be impossible to reproduce with default settings. In particular, because LiveRecorder normally only switches threads at the end of a BB, data races that occur in the middle of BBs will not be reproduced.

Thread Fuzzing is a configuration of LiveRecorder which varies the scheduling of threads, in order to increase the frequency of concurrency bugs, at the cost of a reduction in the speed of recording.

Advice when using Thread Fuzzing

Although Thread Fuzzing increases the frequency of concurrency bugs, they typically remain substantially rarer when the program is being recorded than when it is run natively. A bug that occurs every time the program is run natively may be reproduced in only one run in ten, or fewer, under Thread Fuzzing, and bugs that are rarer natively are proportionally rarer still under Thread Fuzzing.

This means that some degree of automation is advisable when attempting to capture a recording of a bug using Thread Fuzzing. One approach, if the program is suitable, is to run it repeatedly under live-record, using the --retry-for option to specify the duration to keep retrying, and the --save-on option to specify the circumstances under which a recording will be captured. For example, to re-run a program until it exits on a signal and a recording has been saved, or until 30 minutes have passed:

live-record \
    --recording-file recording.undo \
    --retry-for 30min \
    --save-on exit-signal \
    --thread-fuzzing \
    program-under-test

This example relies on the bug manifesting as an unhandled signal. If it manifests in some other way, the --save-on circumstances would need to be modified accordingly.

Configuring Thread Fuzzing

In the live-record tool

To enable Thread Fuzzing in live-record with default fuzzing modes, use the --thread-fuzzing option. This turns on the starve, random, and in-bb fuzzing modes.

To select specific fuzzing modes, do not use the --thread-fuzzing option. Instead, set the UNDO_tf environment variable to a comma-separated list of fuzzing modes, for example UNDO_tf=starve,in-bb. The modes are described below.

In the LiveRecorder API

In a program using the LiveRecorder API, include the undolr_thread_fuzzing.h header and call undolr_thread_mode_set(), passing a bitmask of fuzzing modes. See the header file for more detail.

Fuzzing modes

Thread starvation (starve)

One type of concurrency bug is related to the order in which threads execute. For example, when there is a fast generator thread and a slow consumer thread, under normal circumstances the consumer never runs out of data. If the generator is delayed for some reason, for example by slow I/O, the consumer may overtake it.

#include <stdio.h>   /* puts */
#include <string.h>  /* strdup */

char *array[100] = { 0 };

void
generator_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        array[i] = strdup("Hello world\n");
    }
}

void
consumer_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        /* Bug: array[i] could be NULL if the consumer overtook the generator. */
        puts(array[i]);
    }
}

Thread starvation fuzzing mode attempts to provoke these thread-ordering bugs by randomly selecting threads and “starving” them, that is, avoiding scheduling them for short periods of time.

This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option. To select it explicitly, include starve in the UNDO_tf environment variable, or pass undolr_thread_mode_STARVE to undolr_thread_mode_set().
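The effect of starving the generator can be replayed deterministically, without threads. The following is a minimal sketch, not LiveRecorder code: count_empty_reads() and its head_start parameter are hypothetical names introduced for illustration. The generator fills head_start slots and is then assumed to be starved while the consumer reads the whole array.

```c
#include <assert.h>
#include <stddef.h>

#define ITEMS 100

/* Hypothetical helper: let the generator fill `head_start` slots, then run
 * the consumer over all ITEMS slots while the generator is "starved".
 * Returns the number of slots the consumer found still empty (NULL). */
int
count_empty_reads(int head_start)
{
    const char *array[ITEMS] = { 0 };
    int empty = 0;

    /* Generator runs until it is descheduled. */
    for (int i = 0; i < head_start && i < ITEMS; i++)
    {
        array[i] = "Hello world";
    }

    /* Consumer runs to completion while the generator makes no progress. */
    for (int i = 0; i < ITEMS; i++)
    {
        if (array[i] == NULL)
        {
            empty++;
        }
    }
    return empty;
}
```

With a full head start the consumer never sees an empty slot. With a head start of 10, the remaining 90 reads would each have passed NULL to puts() in the original example.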

Randomising thread slices (random)

Normally LiveRecorder records each thread for a fixed number of BBs before switching to another thread.

Random fuzzing mode records each thread for a smaller, randomly chosen, number of BBs before switching, increasing the frequency of thread switches.

This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option. To select it explicitly, include random in the UNDO_tf environment variable, or pass undolr_thread_mode_RANDOM to undolr_thread_mode_set().
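A toy model shows why shorter, randomly chosen slices lead to more thread switches. This sketch rests on assumptions: the slice lengths, the uniform distribution, and the function names are illustrative only, and LiveRecorder's actual values and scheduling algorithm are not described here.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of slices needed to record `total_bbs` basic blocks when every
 * slice lasts exactly `slice` basic blocks. Each slice ends in a switch. */
int
slices_fixed(int total_bbs, int slice)
{
    int slices = 0;
    for (int done = 0; done < total_bbs; done += slice)
    {
        slices++;
    }
    return slices;
}

/* As above, but each slice length is drawn from 1..max_slice (assumed
 * uniform distribution), so no slice is longer than the fixed one. */
int
slices_random(int total_bbs, int max_slice, unsigned seed)
{
    int slices = 0;
    srand(seed);
    for (int done = 0; done < total_bbs; done += 1 + rand() % max_slice)
    {
        slices++;
    }
    return slices;
}
```

Because no random slice exceeds the fixed length, slices_random() never returns fewer slices than slices_fixed() for the same workload, and in practice returns considerably more.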

Switching inside BBs (in-bb)

Normally LiveRecorder executes the whole of a BB before considering switching threads. This means that data races that occur inside BBs cannot normally be reproduced. For example:

#include <assert.h>

volatile int value1 = 0;
volatile int value2 = 0;

void
setter_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        value1 = i;
        /* Bug: value1 != value2 here. */
        value2 = i;
    }
}

void
checker_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        assert(value1 == value2);
    }
}

This code does not fail under LiveRecorder with default settings, because the assignments to value1 and value2 in the setter thread belong to the same BB, and so LiveRecorder never switches to the checker thread at the indicated point where the two variables have different values.

The in-BB fuzzing mode allows thread switches to happen after any instruction.

This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option. To select it explicitly, include in-bb in the UNDO_tf environment variable, or pass undolr_thread_mode_IN_BB to undolr_thread_mode_set().
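The difference between switching only at BB boundaries and switching after any instruction can be illustrated with a single-threaded toy model of the setter and checker. This is a sketch under simplifying assumptions: count_mismatches() is a hypothetical helper, the two assignments are treated as one BB, and the checker is assumed to run at every available switch point.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the setter loop. The two assignments form one basic block.
 * If `switch_inside_bb` is false, the checker can only run after the whole
 * block; if true, it can also run between the two assignments.
 * Returns how many times the checker observed value1 != value2. */
int
count_mismatches(bool switch_inside_bb)
{
    int value1 = 0;
    int value2 = 0;
    int mismatches = 0;

    for (int i = 0; i < 100; i++)
    {
        value1 = i;
        if (switch_inside_bb && value1 != value2)
        {
            mismatches++;   /* checker ran in the middle of the block */
        }
        value2 = i;
        if (value1 != value2)
        {
            mismatches++;   /* checker ran at the end of the block */
        }
    }
    return mismatches;
}
```

With switches only at the end of the block the checker never sees a mismatch. With switches inside the block it sees one on every iteration except the first, where both values are still 0.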

Switches around locking/syncing instructions (sync-instr)

Synchronization primitives, including locks, mutexes, semaphores, and atomic built-in functions, are usually implemented using CPU instructions that are specialized for this purpose, for example, the LOCK CMPXCHG (atomic compare-and-exchange) instruction on x86 CPUs, or the LDADDAL (atomic load-add-and-store) instruction on ARM64 CPUs.

The sync-instruction fuzzing mode increases the frequency of thread switches immediately before and after these instructions. This increases the frequency of concurrency bugs related to the incorrect use of synchronization primitives.

This mode is not included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option. You must include sync-instr in the UNDO_tf environment variable, or pass undolr_thread_mode_SYNC_INSTR to undolr_thread_mode_set().
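As an illustration of code that relies on such instructions, the following sketch increments a shared counter with a C11 compare-and-exchange retry loop. On x86 CPUs, compilers typically implement this with a LOCK CMPXCHG instruction, which is the kind of point this mode targets. The function names and constants are hypothetical, and the example assumes POSIX threads are available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define THREADS 4
#define INCREMENTS 10000

atomic_int counter;

/* Increment the shared counter INCREMENTS times using compare-and-exchange. */
void *
increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
    {
        int seen = atomic_load(&counter);
        /* On failure, `seen` is updated to the current value; retry. */
        while (!atomic_compare_exchange_weak(&counter, &seen, seen + 1))
        {
        }
    }
    return NULL;
}

/* Run the workers and return the final counter value, or -1 on error. */
int
run_counter_demo(void)
{
    pthread_t threads[THREADS];

    atomic_store(&counter, 0);
    for (int t = 0; t < THREADS; t++)
    {
        if (pthread_create(&threads[t], NULL, increment_worker, NULL) != 0)
        {
            return -1;
        }
    }
    for (int t = 0; t < THREADS; t++)
    {
        pthread_join(threads[t], NULL);
    }
    return atomic_load(&counter);
}
```

This version is correct, so the final count always equals THREADS * INCREMENTS. A variant that reads the counter and writes it back in two separate steps loses updates when a thread switch lands between them, and that is the sort of misuse that more frequent switches around synchronization instructions are intended to expose.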