Thread Fuzzing¶
When recording a multi-threaded program, LiveRecorder allows only one thread to run at a time. This protects the integrity of its data structures and simplifies capturing the behavior of the program. LiveRecorder switches between program threads when a thread makes a blocking system call, or when enough basic blocks (BBs) have been recorded.
One consequence of this is that data races and other concurrency bugs occur with much lower frequency when a program is being recorded, and some data races may be impossible to reproduce with default settings. In particular, because LiveRecorder normally only switches threads at the end of a BB, data races that occur in the middle of BBs will not be reproduced.
Thread Fuzzing is a configuration of LiveRecorder which varies the scheduling of threads, in order to increase the frequency of concurrency bugs, at the cost of a reduction in the speed of recording.
Advice when using Thread Fuzzing¶
Although Thread Fuzzing increases the frequency of concurrency bugs, they typically remain substantially rarer when the program is being recorded, compared to when it is run natively. A bug that occurs every time the program is run natively may be reproduced one time in ten or fewer under Thread Fuzzing, and rarer bugs may be reproduced proportionally more rarely under Thread Fuzzing.
This means that some degree of automation is advisable when attempting to capture a recording of a bug using Thread Fuzzing. One approach, if the program is suitable, is to run it repeatedly under live-record, using the --retry-for option to specify the duration to keep retrying, and the --save-on option to specify the circumstances under which a recording will be captured. For example, to re-run a program until it exits on a signal and a recording has been saved, or until 30 minutes have passed:
live-record \
--recording-file recording.undo \
--retry-for 30min \
--save-on exit-signal \
--thread-fuzzing \
program-under-test
This example relies on the bug manifesting as an unhandled signal. If it manifests in some other way, the --save-on circumstances would need to be modified accordingly.
Configuring Thread Fuzzing¶
In the live-record tool¶
To enable Thread Fuzzing in live-record with default fuzzing modes, use the --thread-fuzzing option. This turns on the starve, random, and in-bb fuzzing modes.
To select specific fuzzing modes, don’t use the --thread-fuzzing option, but set the UNDO_tf environment variable to a comma-separated list of fuzzing modes as described below.
In the LiveRecorder API¶
In a program using the LiveRecorder API, include the undolr_thread_fuzzing.h header and call undolr_thread_mode_set(), passing a bitmask of fuzzing modes. See the header file for more detail.
Fuzzing modes¶
Thread starvation (starve)¶
One type of concurrency bug is related to the order in which threads execute. For example, with a fast generator thread and a slow consumer thread, the consumer normally never runs out of data. But if the generator is delayed for some reason, such as slow I/O, the consumer may overtake it.
#include <stdio.h>   /* puts */
#include <string.h>  /* strdup */

char *array[100] = { 0 };

void
generator_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        array[i] = strdup("Hello world\n");
    }
}

void
consumer_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        /* Bug: array[i] could be NULL if the consumer overtook the generator. */
        puts(array[i]);
    }
}
Thread starvation fuzzing mode attempts to provoke these thread-ordering bugs by randomly selecting threads and “starving” them, that is, avoiding scheduling them for short periods of time.
This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option, or you can include starve in the UNDO_tf environment variable, or pass undolr_thread_mode_STARVE to undolr_thread_mode_set().
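The generator and consumer example above can be turned into a small harness that runs to completion and reports whether the consumer overtook the generator. The following is a minimal sketch, not part of LiveRecorder: it assumes POSIX threads and C11 atomics, and the names slots, SLOT_COUNT and run_overtake_check() are hypothetical. The slots are atomic so that the unsynchronized access is well defined in C, and the consumer counts empty slots rather than passing NULL to puts().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_COUNT 100

/* Atomic slots, so the unsynchronized accesses are well defined in C11. */
static _Atomic(const char *) slots[SLOT_COUNT];

static void *
generator_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < SLOT_COUNT; i++)
    {
        atomic_store(&slots[i], "Hello world\n");
    }
    return NULL;
}

/* Counts the slots that are still empty when the consumer reaches them. */
static void *
consumer_thread(void *arg)
{
    int *empty_seen = arg;
    for (int i = 0; i < SLOT_COUNT; i++)
    {
        if (atomic_load(&slots[i]) == NULL)
        {
            (*empty_seen)++;
        }
    }
    return NULL;
}

/* Runs both threads once; returns how many slots the consumer found empty. */
int
run_overtake_check(void)
{
    pthread_t generator, consumer;
    int empty_seen = 0;

    for (int i = 0; i < SLOT_COUNT; i++)
    {
        atomic_store(&slots[i], NULL);
    }
    int rc = pthread_create(&generator, NULL, generator_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&consumer, NULL, consumer_thread, &empty_seen);
    assert(rc == 0);
    pthread_join(generator, NULL);
    pthread_join(consumer, NULL);
    return empty_seen;
}
```

A main() function could call run_overtake_check() and call abort() when the result is non-zero, so that the failure manifests as a signal. The result depends on scheduling and may well be zero on most runs.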
Randomising thread slices (random)¶
Normally LiveRecorder records each thread for a fixed number of BBs before switching to another thread.
Random fuzzing mode records each thread for a smaller, randomly chosen, number of BBs before switching, increasing the frequency of thread switches.
This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option, or you can include random in the UNDO_tf environment variable, or pass undolr_thread_mode_RANDOM to undolr_thread_mode_set().
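The effect of shorter, randomly chosen slices on the number of thread switches can be illustrated with a toy model. This is only a sketch and does not reflect how LiveRecorder is implemented: the slice length MAX_SLICE_BBS, the random number generator, and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed slice length, in BBs; not LiveRecorder's real value. */
#define MAX_SLICE_BBS 1000u

/* Small deterministic pseudo-random generator (xorshift32), so runs repeat. */
static uint32_t
next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Number of slices needed to record total_bbs with the fixed slice length. */
unsigned
count_fixed_slices(unsigned total_bbs)
{
    return (total_bbs + MAX_SLICE_BBS - 1) / MAX_SLICE_BBS;
}

/* Number of slices needed when each length is drawn from 1..MAX_SLICE_BBS. */
unsigned
count_random_slices(unsigned total_bbs, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* xorshift state must be non-zero */
    unsigned slices = 0;
    unsigned recorded = 0;

    while (recorded < total_bbs)
    {
        unsigned slice = 1u + next_random(&state) % MAX_SLICE_BBS;
        recorded += slice;
        slices++;
    }
    return slices;
}
```

Because every random slice is at most MAX_SLICE_BBS long, the random schedule never needs fewer slices than the fixed one, which in this model means at least as many opportunities to switch threads.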
Switching inside BBs (in-bb)¶
Normally LiveRecorder executes the whole of a BB before considering switching threads. This means that data races that occur inside BBs cannot normally be reproduced. For example:
#include <assert.h>

volatile int value1 = 0;
volatile int value2 = 0;

void
setter_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        value1 = i;
        /* Bug: value1 != value2 here. */
        value2 = i;
    }
}

void
checker_thread(void)
{
    for (int i = 0; i < 100; i++)
    {
        assert(value1 == value2);
    }
}
This code does not fail in LiveRecorder with default settings, as the assignments to value1 and value2 in the setter thread belong to the same BB, and so LiveRecorder never switches to the checker thread at the indicated point where the two variables have different values.
The in-BB fuzzing mode allows thread switches to happen after any instruction.
This mode is included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option, or you can include in-bb in the UNDO_tf environment variable, or pass undolr_thread_mode_IN_BB to undolr_thread_mode_set().
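The difference between switching only at BB boundaries and switching after any instruction can be shown with a single-threaded toy model of the setter and checker example. This is a sketch under simplifying assumptions, not a description of LiveRecorder's scheduler: the function count_mismatches() is hypothetical, and the checker is modelled as a comparison performed at each point where a thread switch is allowed.

```c
#include <assert.h>
#include <stdbool.h>

#define ITERATIONS 100

/* Returns how many times the modelled checker observes value1 != value2. */
int
count_mismatches(bool switch_inside_block)
{
    int value1 = 0;
    int value2 = 0;
    int mismatches = 0;

    for (int i = 0; i < ITERATIONS; i++)
    {
        value1 = i;                 /* first assignment in the block */
        if (switch_inside_block && value1 != value2)
        {
            mismatches++;           /* checker runs in the middle of the block */
        }
        value2 = i;                 /* second assignment in the block */
        if (value1 != value2)
        {
            mismatches++;           /* checker runs at the block boundary */
        }
    }
    return mismatches;
}
```

In this model, count_mismatches(false) returns 0 because the two variables always agree at the end of the block, while count_mismatches(true) returns 99: every iteration except the first, where both values are still 0, exposes the inconsistent state.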
Switches around locking/syncing instructions (sync-instr)¶
Synchronization primitives, including locks, mutexes, semaphores, and atomic built-in functions, are usually implemented using CPU instructions that are specialized for this purpose, for example, the LOCK CMPXCHG (atomic compare-and-exchange) instruction on x86 CPUs, or the LDADDAL (atomic load-add-and-store) instruction on ARM64 CPUs.
The sync-instruction fuzzing mode increases the frequency of thread switches immediately before and after these instructions. This increases the frequency of concurrency bugs related to the incorrect use of synchronization primitives.
This mode is not included in the default set of fuzzing modes selected by the live-record --thread-fuzzing option. You must include sync-instr in the UNDO_tf environment variable, or pass undolr_thread_mode_SYNC_INSTR to undolr_thread_mode_set().
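As an illustration of the kind of code this mode targets, the sketch below increments a shared counter from several threads using a C11 compare-and-exchange retry loop. It assumes POSIX threads and C11 atomics, and all names in it are hypothetical. On x86, compilers typically implement atomic_compare_exchange_weak() with a LOCK CMPXCHG instruction, although the exact instruction depends on the compiler and target.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define THREAD_COUNT 4
#define INCREMENTS_PER_THREAD 10000

static atomic_int counter;

/* Increments the counter using a compare-and-exchange retry loop. */
static void
increment_with_cas(void)
{
    int seen = atomic_load(&counter);
    /* On failure, seen is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(&counter, &seen, seen + 1))
    {
    }
}

static void *
worker_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
    {
        increment_with_cas();
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
int
run_cas_counter(void)
{
    pthread_t workers[THREAD_COUNT];

    atomic_store(&counter, 0);
    for (int t = 0; t < THREAD_COUNT; t++)
    {
        int rc = pthread_create(&workers[t], NULL, worker_thread, NULL);
        assert(rc == 0);
    }
    for (int t = 0; t < THREAD_COUNT; t++)
    {
        pthread_join(workers[t], NULL);
    }
    return atomic_load(&counter);
}
```

This version is correct, so the final value is always THREAD_COUNT * INCREMENTS_PER_THREAD. A buggy variant that replaced the retry loop with a separate atomic load followed by an atomic store would lose updates whenever a thread switch landed between the two operations, which is the sort of misuse that more frequent switches around these instructions are intended to expose.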
Feedback-directed Thread Fuzzing¶
The Thread Fuzzing modes described above may be unable to reproduce data races that are too rare to be found by random thread switches, and are neither caused by thread starvation nor associated with synchronization primitives. In these cases, feedback-directed Thread Fuzzing may be effective. This takes a recording of the program and analyzes it to identify instructions that access the same data structures from different threads. The results of this analysis can then be used when re-recording the program to switch threads more often around these instructions.
To use feedback-directed Thread Fuzzing, first make a reference recording of the program, with address-space-layout randomization (ASLR) disabled, for example, using the live-record --disable-aslr option:
live-record \
--recording-file reference.undo \
--disable-aslr \
program-under-test
The reference recording needs to cover the code that is suspected of causing the data race, but does not need to reproduce the race.
Second, re-run live-record, passing the reference recording to the --thread-fuzzing-analyze option, for example:
live-record \
--thread-fuzzing-analyze reference.undo \
--recording-file fuzzed.undo \
program-under-test
This analyzes the reference recording as described above, and uses the results of the analysis to guide the selection of thread switches when re-recording the program. The results of the analysis are saved in a file named after the reference recording, with .analysis appended. If this file already exists, it is reused and the analysis is not repeated.
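The idea behind the analysis, identifying instructions that access the same data from different threads, can be sketched with a toy model. The code below is purely illustrative and is not LiveRecorder's analysis or file format: the struct access trace record and the find_switch_candidates() function are hypothetical, and a real recording contains far more information than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One memory access in a hypothetical trace; not LiveRecorder's format. */
struct access
{
    int thread_id;
    uintptr_t instruction;  /* address of the accessing instruction */
    uintptr_t data;         /* address of the data accessed */
};

/* True if a different thread also accessed the same data address. */
static bool
is_shared(const struct access *trace, size_t count, size_t index)
{
    for (size_t j = 0; j < count; j++)
    {
        if (trace[j].data == trace[index].data
            && trace[j].thread_id != trace[index].thread_id)
        {
            return true;
        }
    }
    return false;
}

/*
 * Writes the distinct instruction addresses that touch shared data into out
 * (capacity out_size) and returns how many were stored.
 */
size_t
find_switch_candidates(const struct access *trace, size_t count,
                       uintptr_t *out, size_t out_size)
{
    size_t found = 0;

    for (size_t i = 0; i < count; i++)
    {
        if (!is_shared(trace, count, i))
        {
            continue;
        }
        bool already_listed = false;
        for (size_t k = 0; k < found; k++)
        {
            if (out[k] == trace[i].instruction)
            {
                already_listed = true;
                break;
            }
        }
        if (!already_listed && found < out_size)
        {
            out[found++] = trace[i].instruction;
        }
    }
    return found;
}
```

In this model, the returned instruction addresses are the places where a re-recording would switch threads more often. The quadratic search keeps the sketch short; it is not meant to suggest how such an analysis would be made efficient.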
Warning
Feedback-directed Thread Fuzzing is an experimental feature and may be changed or removed without notice in future releases. It does not work with 32-bit programs.
New in version 8.0.