Thread Fuzzing

Thread Fuzzing allows Live Recorder to interfere with the regular scheduling of threads, in order to expose concurrency bugs more easily.

Because Thread Fuzzing changes how threads are scheduled, concurrency bugs that are very rare under normal conditions become statistically more likely. To get the most from Thread Fuzzing, run tests that may expose concurrency bugs repeatedly until one fails, then analyse the recording of the failing run to find the root cause.

Thread Fuzzing has several components: thread starvation, random thread slices, switching inside basic blocks, and switching at locking/syncing instructions. Each component is described in more detail below.

Thread Fuzzing requires an x86-64 CPU.

Configuring Thread Fuzzing in the live-record tool

To enable Thread Fuzzing with the default options, use live-record --thread-fuzzing.

To enable only selected components, set the UNDO_tf environment variable instead of passing --thread-fuzzing. The value of UNDO_tf is a comma-separated list of components. The recommended setting for most use cases, which is equivalent to the command-line switch, is UNDO_tf=starve,random,in-bb. To enable all the components, use UNDO_tf=starve,random,in-bb,sync-instr.

Configuring Thread Fuzzing in the Live Recorder API

To enable Thread Fuzzing, include undolr_thread_fuzzing.h and call undolr_thread_mode_set() with a bitmask of the desired components to enable. See the header file for more detail.

Thread starvation (starve)

A common type of concurrency bug is caused by ordering problems, for instance when a fast generator thread feeds a slow consumer thread. Because the consumer is slower, it almost always has data waiting to be consumed, so the bug rarely shows. If the generator is delayed, for instance by slow I/O, the consumer may overtake it.

#include <stdio.h>   // puts
#include <string.h>  // strdup

char* array[100] = {0};

void generator_thread() {
    for (int i = 0; i < 100; i++) {
        array[i] = strdup("Hello world\n");
    }
}

void consumer_thread() {
    for (int i = 0; i < 100; i++) {
        // Error: the consumer can overtake the generator
        // and call puts() on NULL!
        puts(array[i]);
    }
}

Thread Fuzzing’s starve mode attempts to encourage race conditions by randomly picking some threads and preventing them from making progress for a short period of time.
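For contrast with the generator/consumer example above, the following is a minimal sketch of one way to remove the ordering bug, assuming POSIX threads are available. The consumer waits on a condition variable until the slot it needs has been filled, so starving the generator only delays the consumer instead of letting it read NULL. The names run_pipeline, produced and item_ready are illustrative and are not part of Live Recorder.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITEM_COUNT 100

static const char *items[ITEM_COUNT];
static int produced = 0; /* number of slots filled so far; guarded by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t item_ready = PTHREAD_COND_INITIALIZER;

static void *generator_thread(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEM_COUNT; i++) {
        pthread_mutex_lock(&lock);
        items[i] = "Hello world";
        produced = i + 1;
        pthread_cond_signal(&item_ready); /* wake the consumer if it is waiting */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer_thread(void *arg) {
    int *consumed = arg;
    for (int i = 0; i < ITEM_COUNT; i++) {
        pthread_mutex_lock(&lock);
        while (produced <= i) {
            /* Slot i is not filled yet: sleep until the generator signals. */
            pthread_cond_wait(&item_ready, &lock);
        }
        const char *item = items[i];
        pthread_mutex_unlock(&lock);
        assert(item != NULL); /* holds even if the generator is starved */
        (*consumed)++;
    }
    return NULL;
}

/* Illustrative helper: runs both threads once and returns the number of
 * items the consumer read. */
int run_pipeline(void) {
    pthread_t generator, consumer;
    int consumed = 0;

    produced = 0;
    for (int i = 0; i < ITEM_COUNT; i++) {
        items[i] = NULL;
    }
    /* Start the consumer first so that it is the one likely to get ahead. */
    int rc = pthread_create(&consumer, NULL, consumer_thread, &consumed);
    assert(rc == 0);
    rc = pthread_create(&generator, NULL, generator_thread, NULL);
    assert(rc == 0);
    pthread_join(consumer, NULL);
    pthread_join(generator, NULL);
    return consumed;
}
```

Calling run_pipeline() from a test that is recorded with starve enabled should complete with all 100 items consumed, whichever thread is held back.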

Randomising thread slices (random)

Normally, Live Recorder lets a thread run for a fixed number of basic blocks before letting the kernel switch to another thread.

Random mode randomises the length of these runs and often makes them much shorter, which increases the number of thread switches.

Switching inside basic blocks (in-bb)

By default, Live Recorder doesn’t allow basic blocks to be interrupted. Thread switches can only happen at basic block boundaries, for instance when the recorded code has a jump instruction.

This can hide bugs caused by state that is inconsistent for only a very short time, for instance:

#include <assert.h>

volatile int value1 = 0;
volatile int value2 = 0;

void setter_thread() {
    for (int i = 0; i < 100; i++) {
        value1 = i;
        value2 = i;
    }
}

void checker_thread() {
    for (int i = 0; i < 100; i++) {
        assert(value1 == value2);
    }
}

The above code never fails in Live Recorder with default settings, because the setter thread cannot be interrupted between the two assignments.

The in-bb setting allows thread switches to happen anywhere.
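Once in-bb has exposed a failure like the one in the setter/checker example, one possible fix is to make the two updates indivisible. The sketch below is an illustration under the assumption that C11 atomics are available and that both values fit in 32 bits: it packs them into a single 64-bit atomic so that a thread switch can never separate them. The names packed_values and run_checker are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define ITERATIONS 100

/* value1 lives in the high 32 bits, value2 in the low 32 bits. */
static _Atomic uint64_t packed_values = 0;

static void *setter_thread(void *arg) {
    (void)arg;
    for (uint32_t i = 0; i < ITERATIONS; i++) {
        uint64_t both = ((uint64_t)i << 32) | i;
        /* One atomic store publishes both values at once. */
        atomic_store(&packed_values, both);
    }
    return NULL;
}

static void *checker_thread(void *arg) {
    int *checks = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* One atomic load reads a consistent pair. */
        uint64_t both = atomic_load(&packed_values);
        uint32_t value1 = (uint32_t)(both >> 32);
        uint32_t value2 = (uint32_t)both;
        assert(value1 == value2);
        (*checks)++;
    }
    return NULL;
}

/* Illustrative helper: runs both threads once and returns the number of
 * checks that passed. */
int run_checker(void) {
    pthread_t setter, checker;
    int checks = 0;

    atomic_store(&packed_values, 0);
    int rc = pthread_create(&setter, NULL, setter_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&checker, NULL, checker_thread, &checks);
    assert(rc == 0);
    pthread_join(setter, NULL);
    pthread_join(checker, NULL);
    return checks;
}
```

A mutex around both the pair of assignments and the comparison would work equally well; the packed form is used here only because it keeps the example short.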

Switches around locking/syncing instructions (sync-instr)

Basic locking primitives and atomic operations, for instance GCC’s __sync_* builtins or pthread mutexes, are implemented using machine instructions that are mainly used for this purpose (for instance the x86 cmpxchg instruction).

Allowing extra thread switches around these instructions makes it more likely that another thread, one in which locking is not done correctly, runs at that point, exposing a concurrency bug.
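As an illustration of the kind of code this targets, the sketch below contrasts a correct compare-and-swap increment with an incorrect one, assuming C11 atomics. On x86-64, compilers typically implement the compare-exchange with a lock cmpxchg instruction. In the racy variant the load and the store are each atomic, but another thread can update the counter between them and that update is then lost. All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS 10000

static atomic_int counter;

/* Correct: retry the compare-and-swap until no other thread raced with us. */
static void increment_with_cas(void) {
    int seen = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &seen, seen + 1)) {
        /* On failure, seen is refreshed with the current value; retry. */
    }
}

/* Incorrect: a thread switch between the load and the store loses updates. */
static void increment_racy(void) {
    int seen = atomic_load(&counter);
    atomic_store(&counter, seen + 1);
}

static void *worker(void *arg) {
    const int *use_cas = arg;
    for (int i = 0; i < INCREMENTS; i++) {
        if (*use_cas) {
            increment_with_cas();
        } else {
            increment_racy();
        }
    }
    return NULL;
}

/* Illustrative helper: runs two workers and returns the final counter. */
int run_counter(int use_cas) {
    pthread_t first, second;

    atomic_store(&counter, 0);
    int rc = pthread_create(&first, NULL, worker, &use_cas);
    assert(rc == 0);
    rc = pthread_create(&second, NULL, worker, &use_cas);
    assert(rc == 0);
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    return atomic_load(&counter);
}
```

run_counter(1) always returns 2 * INCREMENTS. run_counter(0) may return less, and extra thread switches around the atomic instructions would be expected to make that shortfall more frequent.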