The Undo Engine records only non-deterministic data, which is sufficient to recreate the program’s entire memory and register state on demand for any point in its execution.
To do this, it performs JIT (just-in-time) binary translation of the machine code as it executes, so that all sources of non-determinism can be captured.
The result of each non-deterministic operation is recorded in the event log, which is stored in the memory of the debugged application process. In most programs, non-deterministic operations represent a tiny fraction of the instructions executed, so the event log can be very compact. Snapshots of the program are also stored, using copy-on-write so that they are efficient too. This makes it possible to replay a session precisely by restoring the program’s starting state and running it forwards, re-executing only the deterministic operations; all non-deterministic operations are synthesised from what is stored in the event log.
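The record/replay principle can be illustrated with a toy interpreter. This is a sketch under simplifying assumptions, not how the Undo Engine works internally: it interprets a made-up instruction list rather than translating machine code, and the instruction names and the functions `run` and `nondet_source` are hypothetical. Recording calls the real non-deterministic source and logs each result; replay never touches the outside world and substitutes logged values instead.

```python
import random
import time

# Hypothetical toy program: deterministic ops ("add", "mul") are pure functions
# of the accumulator; non-deterministic ops ("rand", "clock") read the outside world.
PROGRAM = [
    ("add", 5),
    ("rand", None),
    ("mul", 3),
    ("clock", None),
    ("add", 1),
]
NONDETERMINISTIC = {"rand", "clock"}


def nondet_source(op):
    """Query the outside world; only ever called while recording."""
    if op == "rand":
        return random.randint(0, 1000)
    return time.monotonic_ns() % 1000


def run(program, mode, event_log):
    """Execute `program` in "record" or "replay" mode.

    record: call the real source and append each result to event_log.
    replay: take non-deterministic results from event_log, in order.
    Deterministic ops are simply re-executed in both modes.
    """
    acc = 0
    events = iter(event_log)
    for op, arg in program:
        if op in NONDETERMINISTIC:
            if mode == "record":
                value = nondet_source(op)
                event_log.append(value)
            else:
                value = next(events)
            acc += value
        elif op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
    return acc


log = []
recorded = run(PROGRAM, "record", log)
replayed = run(PROGRAM, "replay", log)
assert recorded == replayed
print(len(log), "events logged for", len(PROGRAM), "operations")  # 2 events logged for 5 operations
```

Only two of the five operations produce log entries, which mirrors the point above: the log grows with the amount of non-determinism, not with the number of instructions executed.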
Recording non-deterministic events
Asynchronous signals are intercepted and recorded in userspace using a combination of an interceptor signal handler and the standard kernel ptrace mechanism. Thread switches are handled using a patented design and implementation based on our instrumentation technology and standard kernel calls. Non-deterministic instructions are translated specially by our JIT engine, which records any non-deterministic side-effects.
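Asynchronous events differ from other non-deterministic inputs because their *timing* is part of what must be recorded. The toy sketch below shows why: it logs the position at which each signal was delivered and re-delivers it at exactly that position during replay. A plain step counter stands in for the precise execution position a real engine would need, and the function `execute` and its modes are hypothetical; the sketch does not model the interceptor handler or ptrace.

```python
def execute(n_steps, mode, signal_log, incoming=None):
    """Run a toy program for n_steps, delivering asynchronous signals.

    record: `incoming` maps step -> signal name (signals arriving from outside
            at moments the program cannot predict); each delivery is appended
            to signal_log as (step, signal).
    replay: the outside world is ignored; signals are re-delivered at exactly
            the steps stored in signal_log.
    Returns the (step, signal) deliveries the program observed.
    """
    if mode == "record":
        pending = dict(incoming or {})
    else:
        pending = dict(signal_log)
    observed = []
    for step in range(n_steps):
        sig = pending.pop(step, None)
        if sig is not None:
            observed.append((step, sig))  # the program's handler would run here
            if mode == "record":
                signal_log.append((step, sig))
        # deterministic work for this step would follow (omitted)
    return observed


log = []
live = execute(10, "record", log, incoming={3: "SIGUSR1", 7: "SIGALRM"})
again = execute(10, "replay", log)
assert live == again == [(3, "SIGUSR1"), (7, "SIGALRM")]
```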
RAM and disk usage
The Undo Engine has been designed to avoid storing an excessive amount of data in order to reconstruct program execution. It uses a variety of techniques to keep the required state to a minimum, for example distributing process snapshots intelligently through the history of the recording, and storing only the non-deterministic events that cannot be reconstructed. Replaying the recording then simply requires re-running from appropriate snapshots, substituting stored events on the fly.
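Reconstructing the state at an arbitrary point then amounts to "restore the nearest earlier snapshot and re-run forward with logged events". The sketch below assumes a fixed snapshot interval (`SNAPSHOT_INTERVAL`, hypothetical), which is a simplification of the intelligent distribution described above, and its snapshots are plain copies of an integer rather than copy-on-write process images. The functions `step`, `record` and `state_at` are illustrative only.

```python
import bisect
import random

SNAPSHOT_INTERVAL = 4  # hypothetical: snapshot every 4 steps (real placement is smarter)


def step(state, t, event):
    """One step of a toy program; `event` is the non-deterministic input, if any."""
    state = state * 31 + t
    if event is not None:
        state += event
    return state % 1_000_003


def record(n_steps, rng):
    """Run the program, logging non-deterministic inputs and taking snapshots."""
    state = 0
    snapshots = {0: state}  # time -> state at the start of that step
    events = {}             # time -> recorded non-deterministic value
    for t in range(n_steps):
        if t % 3 == 0:      # pretend every third step reads the outside world
            events[t] = rng.randint(0, 99)
        state = step(state, t, events.get(t))
        if (t + 1) % SNAPSHOT_INTERVAL == 0:
            snapshots[t + 1] = state
    return state, snapshots, events


def state_at(target, snapshots, events):
    """Restore the latest snapshot at or before `target`, then replay forward."""
    times = sorted(snapshots)
    start = times[bisect.bisect_right(times, target) - 1]
    state = snapshots[start]
    for t in range(start, target):
        state = step(state, t, events.get(t))  # substitute logged events
    return state


final, snaps, evs = record(20, random.Random(1))
assert state_at(20, snaps, evs) == final
# Starting from a nearby snapshot gives the same answer as replaying from time 0.
assert state_at(10, snaps, evs) == state_at(10, {0: 0}, evs)
```

Snapshots bound how far any replay has to run, while the event log guarantees that the replayed steps produce the same state as the original run.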
When the Undo Engine replays the execution history of a recorded process, it chooses an appropriate process snapshot and executes it, replacing any non-deterministic events with recorded data from the event log. The implications of this are as follows:
It doesn’t modify any system state outside of the memory of the debugged process.
It doesn’t replay at the original speed, since there is a slight overhead in substituting the events from the event log.
Currently, replayed snapshots may only read from the event log; they are prohibited from generating their own events and creating an “alternative version of history”. If this were allowed, the internal program state would become inconsistent with the external state (for example, in a TCP networking scenario, new packets would need to be sent that the TCP peer would not expect).
Aside from these limitations, the behaviour of the debugged process is “as if” the system state has been rolled back and re-executed from that point.
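The "read-only history" rule can be pictured as an event log that hands out recorded results in order and refuses anything else. The class `ReplayLog` and the event kinds used below are hypothetical; this is a sketch of the constraint, not of the engine's actual data structures. Asking for an event the recording does not contain is treated as divergence rather than an opportunity to create new history.

```python
class ReplayLog:
    """Read-only view of a recorded event log (toy model)."""

    def __init__(self, events):
        self._events = list(events)  # (kind, value) pairs captured at record time
        self._pos = 0

    def next_event(self, kind):
        """Return the next recorded result; never invent a new one."""
        if self._pos >= len(self._events):
            raise RuntimeError("replay ran past the end of the recording")
        recorded_kind, value = self._events[self._pos]
        if recorded_kind != kind:
            # The program is asking for something that did not happen originally.
            raise RuntimeError(
                f"divergence: recording has {recorded_kind!r}, program asked for {kind!r}"
            )
        self._pos += 1
        return value

    def append(self, *_):
        raise PermissionError("replay may not generate new events")


log = ReplayLog([("recv", b"hello"), ("time", 1700)])
assert log.next_event("recv") == b"hello"
assert log.next_event("time") == 1700
```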
The Undo Engine provides backwards compatibility. That is, recordings made by an older release of the Undo Engine on a Linux distribution supported by that older release are replayable by the same or a later release of the Undo Engine on a Linux distribution supported by that later release.
The Undo Engine does not guarantee forwards compatibility.
Recordings made on a supported Linux distribution are replayable on any other supported Linux distribution.
Containers and Virtual Machines
Recordings made within a container or VM are replayable on a physical machine, and vice versa.
Recordings made by release 6.0 or later of the Undo Engine on one CPU are replayable on any other CPU that has the same instruction set extensions as the CPU on which the recording was made, or a superset of them.
Recordings made by release 6.0 or later of the Undo Engine on one CPU are also replayable on any other CPU that has only a subset of the instruction set extensions present on the CPU on which the recording was made, provided that:
The recording is replayed on a CPU with a “Haswell” or later microarchitecture.
The application does not contain hand-written code, or emit dynamically generated code, that uses instruction set extensions not present on the CPU on which the recording is replayed.
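The CPU rules above can be condensed into a single predicate. This is an illustrative sketch only: the `Cpu` type, the function `cpu_compatible` and the extension names are hypothetical, and the caller must already know whether the application's hand-written or generated code relies on extensions missing from the replay CPU, which the engine cannot be asked about here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    extensions: frozenset   # e.g. frozenset({"avx", "avx2"}); names are illustrative
    haswell_or_later: bool  # microarchitecture is Haswell or newer


def cpu_compatible(recorded_with, record_cpu, replay_cpu, app_uses_missing_exts):
    """Apply the documented CPU rules for recordings made by release 6.0 or later.

    recorded_with: (major, minor) engine version that made the recording.
    app_uses_missing_exts: True if hand-written or dynamically generated code in
        the application uses an extension absent on the replay CPU.
    Returns None for pre-6.0 recordings (the rules do not cover them), else a bool.
    """
    if recorded_with < (6, 0):
        return None
    if record_cpu.extensions <= replay_cpu.extensions:
        return True  # same extensions, or a superset, on the replay CPU
    # Replay CPU has only a subset: both conditions must hold.
    return replay_cpu.haswell_or_later and not app_uses_missing_exts


rec = Cpu(frozenset({"sse4_2", "avx", "avx2"}), True)
older = Cpu(frozenset({"sse4_2", "avx"}), True)
assert cpu_compatible((6, 0), rec, older, app_uses_missing_exts=False) is True
```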
Threads are tasks that execute concurrently within a shared address space. The interaction of threads is often non-deterministic and this is a common source of bugs.
The Undo Engine supports concurrent threads, and therefore programs can use all normal threading capabilities made available by the system. However, in order to achieve deterministic record and replay, the Undo Engine serialises the execution of threads, as if they were running on a uniprocessor CPU.
The Undo Engine allows each thread to run independently, but imposes a global mutex so that only a single thread can execute at a time. Thread preemption is handled by the kernel as normal, with the proviso that thread switches are permitted only after certain intervals. In this way, it remains possible to use the Undo Engine to solve many types of race condition.
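The global-mutex model can be sketched with Python's `threading` module. Everything here is a hypothetical illustration: `GlobalMutexScheduler`, `worker` and the `QUANTUM` size are not Undo Engine names, and a "step" is just a Python call rather than a machine instruction. Threads are real OS threads, but only the holder of the single lock runs program code, and a switch can only happen between quanta. The order recorded in `schedule` is chosen by the OS; a record/replay engine would also need to log that order so replay can reproduce it, which this sketch does not do.

```python
import threading

QUANTUM = 3  # hypothetical: steps a thread runs before a switch is permitted


class GlobalMutexScheduler:
    """Toy model: one global mutex serialises all threads, and a thread can
    only be switched out at quantum boundaries (between run_quantum calls)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.schedule = []    # which thread ran each quantum, in order
        self._running = 0     # threads currently inside a quantum
        self.max_running = 0  # highest concurrency observed (should stay 1)

    def run_quantum(self, name, n_steps, step):
        with self._lock:      # only one thread at a time gets past here
            self._running += 1
            self.max_running = max(self.max_running, self._running)
            self.schedule.append(name)
            for _ in range(n_steps):
                step()
            self._running -= 1


def worker(sched, name, total_steps, step):
    """Run total_steps of work, at most QUANTUM steps per turn."""
    done = 0
    while done < total_steps:
        n = min(QUANTUM, total_steps - done)
        sched.run_quantum(name, n, step)
        done += n


sched = GlobalMutexScheduler()
counter = [0]


def increment():
    counter[0] += 1  # no lock of its own: the global mutex already serialises it


threads = [threading.Thread(target=worker, args=(sched, f"T{i}", 10, increment))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0], sched.max_running)  # 40 1
```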
If there are synchronization problems in the original process being recorded, these will also be present when replaying the recording of that process. Likewise, if there are no synchronization problems, they will not be present when replaying the recording. In other words, the Undo Engine doesn’t introduce any synchronization problems, but it may help to expose existing synchronization problems in your application.
Source code and debug symbols
The Undo Engine does not require source code or DWARF debugging information to be available when recording a program. Source code and debugging information can be used when replaying, if it is available, even if it was not available during recording.
When you execute a reverse-execution command in UDB, the Undo Engine carries out a search of execution history to find the point where the command ends up, taking into account breakpoints, watchpoints, and signals.
The work of searching execution history is normally parallelized across multiple CPU cores. Up to four cores are used when searching the history of a 64-bit process, or up to two cores for a 32-bit process.
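One way to picture a parallel reverse search: split the history before the current point into one chunk per worker, find the latest hit in each chunk concurrently, and keep the latest hit overall. In this sketch the worker caps (4 for 64-bit, 2 for 32-bit) come from the text above, but the chunking strategy, the function names and the use of a thread pool are assumptions. The real engine uses separate processes that replay from snapshots, whereas here `hit(t)` is simply a predicate standing in for "a breakpoint, watchpoint or signal would stop execution at time t".

```python
import os
from concurrent.futures import ThreadPoolExecutor


def max_search_workers(is_64bit):
    """Documented caps: up to 4 cores for a 64-bit process, 2 for a 32-bit one."""
    cap = 4 if is_64bit else 2
    return max(1, min(cap, os.cpu_count() or 1))


def last_hit_in(lo, hi, hit):
    """Scan [lo, hi) backwards; return the latest t where hit(t) holds, or None."""
    for t in range(hi - 1, lo - 1, -1):
        if hit(t):
            return t
    return None


def reverse_search(now, hit, is_64bit=True):
    """Find the most recent time before `now` at which `hit` is true."""
    workers = max_search_workers(is_64bit)
    bounds = [now * i // workers for i in range(workers + 1)]  # chunk edges
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda i: last_hit_in(bounds[i], bounds[i + 1], hit),
                           range(workers))
        hits = [r for r in results if r is not None]
    return max(hits) if hits else None


# Latest multiple of 97 before time 1000.
assert reverse_search(1000, lambda t: t % 97 == 0) == 970
```

Searching every chunk at once and taking the maximum keeps the sketch simple; a smarter search could stop early once the newest chunk reports a hit.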
Configuring Parallel Search
Parallel Search is enabled by default. To disable it, set the environment variable UNDO_parallel_search=false before starting UDB. This restricts searches to a single core.
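If UDB is launched from a script, the variable can be set only in the child's environment. In the sketch below, the helper names are hypothetical and `./myprog` is a placeholder program path; the only documented piece is the UNDO_parallel_search=false setting itself.

```python
import os
import subprocess


def udb_env_without_parallel_search(base=None):
    """Return a copy of the environment with Parallel Search disabled for UDB."""
    env = dict(os.environ if base is None else base)  # copy; never mutate the parent's env
    env["UNDO_parallel_search"] = "false"
    return env


def launch_udb(args):
    """Start UDB (e.g. args=["./myprog"]) with Parallel Search disabled."""
    return subprocess.run(["udb", *args], env=udb_env_without_parallel_search())
```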
Parallel Search uses multiple processes to replay parts of execution history, with runtime behaviour similar to a parallel build, so it consumes additional memory and CPU time, which may affect other processes on the machine. If this causes difficulties, Parallel Search can be disabled as described above.
Parallel Search requires memory overcommit to be enabled; that is, /proc/sys/vm/overcommit_memory must be set to 0 (the default on most Linux distributions) or 1. When memory overcommit is not available, search is restricted to a single core.
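To check in advance whether a machine meets the two documented conditions, a small script can read the overcommit setting and the environment variable. This is a hypothetical helper that only mirrors the rules stated above; it does not query UDB, and it conservatively treats an unreadable overcommit setting as "not available".

```python
import os
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")


def read_overcommit_mode(path=OVERCOMMIT_PATH):
    """Return the overcommit mode (0, 1 or 2) as an int, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def parallel_search_expected(env, overcommit_mode):
    """Mirror the documented rules: on by default, turned off by
    UNDO_parallel_search=false, and requires overcommit mode 0 or 1."""
    if env.get("UNDO_parallel_search") == "false":
        return False
    return overcommit_mode in (0, 1)  # None (unknown) or 2 -> single core


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print("overcommit_memory =", mode,
          "| parallel search expected:", parallel_search_expected(os.environ, mode))
```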