Technical details¶
The Undo Engine records only non-deterministic data, which is sufficient for it to be able to recreate the program’s entire memory and registers on demand for any point in its execution.
To do this, it performs a JIT (just-in-time) binary translation of the machine code as it executes, in order that all sources of non-determinism can be captured.
For each non-deterministic operation, its results are recorded in the event log, which is stored in the memory of the debugged application process. In most programs these non-deterministic operations represent a tiny fraction of the instructions executed, so the event log can be very efficient. Snapshots of the program are also stored (using copy-on-write, so this is also efficient). In this way, it is possible to replay a session precisely by restoring the program's starting state and running it forwards, re-executing only the deterministic operations; all non-deterministic operations are synthesized from what is stored in the event log.
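The record-and-replay idea described above can be illustrated with a toy sketch. This is not how the Undo Engine is implemented (it works on machine code via binary translation); the `EventLog` class and the `run` function are hypothetical names used only to show deterministic work being re-executed while non-deterministic results are taken from a log.

```python
import random
import time


class EventLog:
    """Toy event log holding the results of non-deterministic operations."""

    def __init__(self):
        self.events = []
        self.cursor = 0

    def record(self, value):
        # Record mode: remember the result and pass it through.
        self.events.append(value)
        return value

    def replay(self):
        # Replay mode: hand back the next recorded result.
        value = self.events[self.cursor]
        self.cursor += 1
        return value


def run(log, recording):
    """Run the same toy program in record mode or in replay mode."""

    def nondet(source):
        # When recording, perform the real operation and log its result.
        # When replaying, skip the operation and synthesize its result.
        return log.record(source()) if recording else log.replay()

    total = 0
    for i in range(5):
        total += i * i                                   # deterministic: re-executed
        total += nondet(lambda: random.randint(0, 100))  # non-deterministic: logged
    stamp = nondet(time.time)                            # non-deterministic: logged
    return total, stamp


log = EventLog()
original = run(log, recording=True)
replayed = run(log, recording=False)
assert original == replayed
print(len(log.events), "events recorded")
```

Only six values are logged here, while the arithmetic is simply run again during replay, which is why the log stays small relative to the work done.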
Recording non-deterministic events¶
Asynchronous signals are intercepted and recorded in userspace using a combination of an interceptor signal handler and the standard kernel ptrace mechanism. Thread switches are handled using a patented design and implementation based on our instrumentation technology and standard kernel calls. Our JIT engine translates non-deterministic instructions specially and records any non-deterministic side effects.
RAM and disk usage¶
The Undo Engine has been designed to avoid storing an excessive amount of data in order to reconstruct program execution. It uses various techniques, for example intelligent distribution of process snapshots through the history of the recording and storage of only the non-deterministic events that cannot be reconstructed, to keep the required state to a minimum. Replaying the recording then simply requires re-running from appropriate snapshots, substituting stored events on the fly.
Replay¶
When the Undo Engine replays the execution history of a recorded process, it chooses an appropriate process snapshot and executes it, replacing any non-deterministic events with recorded data from the event log. The implications of this are as follows:
It doesn’t modify any system state outside of the memory of the debugged process.
It doesn’t replay at the original speed, since there is a slight overhead in substituting the events from the event log.
Currently it only allows replayed snapshots to read from the event log, and it prohibits them from generating their own events and creating an “alternative version of history”. If it allowed this, the internal program state would become inconsistent with the external state (for example, in a TCP networking scenario, new packets would need to be sent which the TCP peer would not expect).
Aside from these limitations, the behavior of the debugged process is “as if” the system state has been rolled back and re-executed from that point.
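As a rough sketch of replaying from a snapshot, the toy model below picks the latest snapshot at or before a target point and runs forwards from it, reading inputs from a recorded list instead of producing new ones. The integer "time" index, the `step` function, and the snapshot interval are all assumptions made for illustration; they do not describe the Undo Engine's actual data structures.

```python
import bisect


def step(state, event):
    """One deterministic step of a toy program, fed by one recorded event."""
    return (state * 31 + event) % 1_000_003


def record(inputs, snapshot_every):
    """Run forwards, keeping a snapshot every `snapshot_every` steps."""
    state = 1
    history = [state]          # full history, kept here only to check replay
    snapshots = {0: state}     # time -> saved state
    for t, event in enumerate(inputs):
        state = step(state, event)
        history.append(state)
        if (t + 1) % snapshot_every == 0:
            snapshots[t + 1] = state
    return snapshots, history


def replay_to(snapshots, inputs, target):
    """Reconstruct the state at `target` from the nearest earlier snapshot."""
    times = sorted(snapshots)
    start = times[bisect.bisect_right(times, target) - 1]
    state = snapshots[start]
    for t in range(start, target):
        # Events are only read from the log, never generated afresh.
        state = step(state, inputs[t])
    return state


inputs = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]
snapshots, history = record(inputs, snapshot_every=4)
assert all(
    replay_to(snapshots, inputs, t) == history[t] for t in range(len(history))
)
```

With snapshots at steps 0, 4, and 8, reaching any point needs at most three steps of re-execution, which hints at the trade-off between snapshot density and replay cost.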
Recording portability¶
Undo recordings generated on one system can be replayed on other systems subject to the conditions detailed below.
See also the replay service documentation for a solution to replaying incompatible recordings; for instance, if developers use machines with Intel or AMD x86-64 CPUs but need to load recordings produced on ARM64 machines.
Version compatibility¶
The Undo Engine provides backwards compatibility. That is, recordings made by an older release of the Undo Engine on a Linux distribution supported by that older release are replayable by the same or a later release of the Undo Engine on a Linux distribution supported by that later release.
The Undo Engine does not guarantee forwards compatibility.
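The version rule can be written as a simple comparison. The sketch below assumes release numbers are dotted integers such as "6.12" and uses hypothetical helper names; it only encodes the guarantee stated above and says nothing about distribution support.

```python
def parse_version(text):
    """Turn a dotted release number such as '6.12' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def replay_is_guaranteed(recorded_with, replaying_with):
    """True if the replaying release is the same as or later than the recording one."""
    return parse_version(replaying_with) >= parse_version(recorded_with)


assert replay_is_guaranteed("6.9", "6.12")       # older recording, newer engine
assert replay_is_guaranteed("6.12", "6.12")      # same release
assert not replay_is_guaranteed("7.0", "6.12")   # forwards compatibility not guaranteed
```

Comparing tuples rather than strings matters here: as strings, "6.9" would sort after "6.12".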
Cross-distribution compatibility¶
Recordings made on a supported Linux distribution are replayable on any other supported Linux distribution (but see the note in Virtual addressing compatibility about page sizes on ARMv8).
Containers and virtual machines¶
Recordings made within a container or VM are replayable on a physical machine, and vice versa.
Microarchitecture compatibility¶
Recordings made on one CPU are replayable on any other CPU that has the same or has a superset of the instruction set extensions that are present on the CPU on which the recording was made.
Recordings made on one CPU are also replayable on any other CPU that has a subset of the instruction set extensions present on the CPU on which the recording was made providing that:
The application is built using GCC or Clang compiler flags suitable for the microarchitecture of the CPU on which the recording is replayed.
The application does not use hand-written code, or emit dynamically generated code, that uses instruction set extensions not present on the CPU on which the recording is replayed.
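One informal way to compare two machines is to diff the CPU feature flags that Linux reports in /proc/cpuinfo. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, the sample flag lists are made up, and the kernel's flag list also contains entries that are not instruction set extensions.

```python
def cpu_flags(cpuinfo_text):
    """Extract the feature flags from /proc/cpuinfo text.

    x86 kernels label the line 'flags'; ARM64 kernels label it 'Features'.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def missing_extensions(recording_flags, replay_flags):
    """Flags present on the recording machine but absent on the replay machine."""
    return recording_flags - replay_flags


# Made-up sample data standing in for two machines' /proc/cpuinfo contents.
recording_machine = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f\n"
replay_machine = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2\n"

missing = missing_extensions(cpu_flags(recording_machine), cpu_flags(replay_machine))
print(sorted(missing))
```

An empty result corresponds to the "same or superset" case. A non-empty result means the replay machine has only a subset, so the conditions in the list above would need to hold.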
For x86 recordings, an additional requirement applies:
If the recording was made on a machine which both supports the AVX-512 family of instructions, and uses an operating system which provides GNU C Library (glibc) version 2.34 or later, then such recordings are only replayable if at least one of the following conditions is met:
the recording is being replayed on a machine which also supports AVX-512;
the recording was made using version 6.12 or later of the Undo Engine, and AVX-512 is only used inside the functions declared in the <string.h> header; or
AVX-512 instructions were disabled in the recorded program.
See the AVX-512 recording portability page for more information about AVX-512 support.
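The AVX-512 conditions above can be read as a small decision function. In the sketch below the `Recording` fields and the function name are hypothetical and exist only to restate the rule; they are not part of any Undo API, and the other microarchitecture conditions still apply separately.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical summary of the facts the AVX-512 rule depends on."""
    recorded_on_avx512_cpu: bool
    glibc_version: tuple           # for example (2, 34)
    engine_version: tuple          # for example (6, 12)
    avx512_only_in_string_h: bool  # AVX-512 used only in <string.h> functions
    avx512_disabled: bool          # AVX-512 was disabled in the recorded program


def avx512_rule_allows_replay(rec, replay_cpu_has_avx512):
    # The extra requirement only applies to this combination.
    restricted = rec.recorded_on_avx512_cpu and rec.glibc_version >= (2, 34)
    if not restricted:
        return True
    # At least one of the three listed conditions must be met.
    return (
        replay_cpu_has_avx512
        or (rec.engine_version >= (6, 12) and rec.avx512_only_in_string_h)
        or rec.avx512_disabled
    )


rec = Recording(True, (2, 35), (6, 12), True, False)
assert avx512_rule_allows_replay(rec, replay_cpu_has_avx512=False)
```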
Virtual addressing compatibility¶
Intel 5-level paging and ARMv8.2 Large Virtual Addressing allow the use of a larger virtual address space in applications by increasing the number of bits used to address virtual memory.
The implication of these technologies for the portability of Undo recordings is that recordings must be replayed on a system using virtual addresses at least as large as those that were used when the recording was made.
So, using Intel 5-level paging as an example:
Recordings made on a system which supports 4-level paging are replayable on any other supported system.
Recordings of applications that exclusively use 48-bit virtual addresses on a system which supports 5-level paging are replayable on any other supported system.
Recordings of applications that map memory regions into the extended virtual address space (bits 48 to 56) on a system using 5-level paging are only replayable on other systems using 5-level paging.
On ARMv8, in addition, recordings are only replayable on systems where the memory page size matches that of the system where the recording was made.
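These two rules can be restated as a pair of checks. This is a minimal sketch with hypothetical function names; it assumes the number of virtual address bits in use is already known (48 for 4-level paging and 57 for Intel 5-level paging) and uses Python's `mmap.PAGESIZE` for the local page size.

```python
import mmap


def address_size_compatible(bits_used_in_recording, bits_on_replay_system):
    """Replay needs virtual addresses at least as large as those recorded."""
    return bits_on_replay_system >= bits_used_in_recording


def page_size_compatible(arch, recorded_page_size, replay_page_size=mmap.PAGESIZE):
    """On ARMv8 the page sizes must match; other architectures are unaffected."""
    if arch != "aarch64":
        return True
    return recorded_page_size == replay_page_size


# 48-bit addresses only: replayable on 4-level or 5-level systems.
assert address_size_compatible(48, 48) and address_size_compatible(48, 57)
# Extended address space in use: needs 5-level paging on replay.
assert not address_size_compatible(57, 48)
# ARMv8 recording made with 64 KiB pages, replayed on a 4 KiB page system.
assert not page_size_compatible("aarch64", 65536, replay_page_size=4096)
```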
Multi-threaded applications¶
Threads are tasks that execute concurrently within a shared address space. The interaction of threads is often non-deterministic and this is a common source of bugs.
The Undo Engine supports concurrent threads, so programs can use all normal threading capabilities made available by the system. However, to achieve deterministic record and replay, the Undo Engine serializes the execution of threads, as if they were running on a uniprocessor CPU.
The Undo Engine allows each thread to run independently, but imposes a global mutex such that only a single thread at a time can execute. Thread preemption is handled by the kernel as normal, with the proviso that thread switches are permitted only after certain intervals. In this way, it remains possible for the Undo Engine to solve many types of race condition.
If there are synchronization problems in the original process being recorded, these will also be present when replaying the recording of that process. Likewise, if there are no synchronization problems, they will not be present when replaying the recording. In other words, the Undo Engine doesn’t introduce any synchronization problems, but it may help to expose existing synchronization problems in your application.
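The effect of serializing threads can be shown with a toy model. In the sketch below, a `threading.Lock` stands in for the global mutex and a list records which thread ran each slice; these names and the slice granularity are assumptions for illustration, not the Undo Engine's mechanism.

```python
import threading

run_lock = threading.Lock()   # stand-in for the global mutex
schedule = []                 # recorded order in which threads ran
shared = {"value": 1}

# Two operations that do not commute, so the result depends on interleaving.
ops = {0: lambda v: v * 2, 1: lambda v: v + 3}


def worker(thread_id, slices):
    for _ in range(slices):
        with run_lock:                      # only one thread executes at a time
            schedule.append(thread_id)      # record which thread ran this slice
            shared["value"] = ops[thread_id](shared["value"])


threads = [threading.Thread(target=worker, args=(tid, 50)) for tid in ops]
for t in threads:
    t.start()
for t in threads:
    t.join()
recorded_result = shared["value"]

# Replay: run the same slices in the recorded order on a single thread.
value = 1
for thread_id in schedule:
    value = ops[thread_id](value)

assert value == recorded_result
```

The final value differs from run to run because the interleaving differs, but replaying the recorded schedule always reproduces the value seen in that particular run, including any outcome caused by an unfortunate ordering.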
Source code and debug symbols¶
The Undo Engine does not require source code or DWARF debugging information to be available when recording a program. Source code and debugging information can be used when replaying, if it is available, even if it was not available during recording.
Parallel Search¶
When you execute a reverse-execution command in UDB, the Undo Engine carries out a search of execution history to find the point where the command ends up, taking into account breakpoints, watchpoints, and signals.
The work of searching execution history is normally parallelized across multiple logical cores. Up to four logical cores are used when searching the history of a 64-bit process, or up to two logical cores for a 32-bit process.
Note
Parallel Search is a CPU-intensive activity, so it performs better on a machine with four physical cores available than on a machine with four hyperthreaded logical cores.
Configuring Parallel Search
Parallel Search is enabled by default. To disable it, set the environment variable UNDO_parallel_search=false before starting UDB. This restricts searches to a single core.
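When UDB is started from a script, the variable can be placed in the child process's environment. The helper below is a hypothetical sketch; the commented-out launch line assumes UDB is started through a `udb` executable on the PATH and that `./my_program` is the program to debug.

```python
import os


def udb_environment(parallel_search, base=None):
    """Return a copy of the environment with Parallel Search on or off."""
    env = dict(os.environ if base is None else base)
    if parallel_search:
        # Enabled by default, so just make sure the override is absent.
        env.pop("UNDO_parallel_search", None)
    else:
        env["UNDO_parallel_search"] = "false"
    return env


env = udb_environment(parallel_search=False)
assert env["UNDO_parallel_search"] == "false"

# Assumed launch command, shown for illustration only:
# import subprocess
# subprocess.run(["udb", "./my_program"], env=env)
```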
Performance
Parallel Search uses multiple processes to replay parts of execution history, with runtime behavior similar to a parallel build, and this requires additional memory and CPU consumption that may impact other processes on the machine. If this causes difficulties, Parallel Search can be disabled as described above.
System requirements
Parallel Search requires memory overcommit to be enabled, that is, /proc/sys/vm/overcommit_memory must be set to 0 (default on most Linux distributions) or 1. When memory overcommit is not available, search is restricted to a single core.
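The limits described in this section can be summarized in a short sketch. The function names are hypothetical; the logic simply restates the documented upper bounds of four logical cores for a 64-bit process and two for a 32-bit process, and the fallback to a single core when Parallel Search is disabled or overcommit is unavailable.

```python
def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the kernel's overcommit mode (0, 1 or 2) on a Linux system."""
    with open(path) as f:
        return int(f.read().strip())


def search_core_limit(is_64_bit, overcommit_mode, parallel_search_enabled=True):
    """Upper bound on the logical cores used to search execution history."""
    if not parallel_search_enabled or overcommit_mode not in (0, 1):
        return 1                      # search restricted to a single core
    return 4 if is_64_bit else 2


assert search_core_limit(is_64_bit=True, overcommit_mode=0) == 4
assert search_core_limit(is_64_bit=False, overcommit_mode=1) == 2
assert search_core_limit(is_64_bit=True, overcommit_mode=2) == 1
```

On a Linux machine, `search_core_limit(True, read_overcommit_mode())` would give the bound for a 64-bit process under the current kernel setting.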
Network traffic¶
The LiveRecorder library and the live-record tool make no network requests.
UDB makes the following network requests:
When UDB starts, it makes an HTTPS request to download.undo.io to determine if a newer release of the product is available. If this fails, the debugging session proceeds as normal. This request can be disabled: see Checking for updates.
In accordance with Undo’s privacy policy, when UDB starts or exits it attempts to send usage information via HTTPS to api.undo.io. If this fails, the debugging session proceeds as normal. Sharing of usage statistics may be enabled or disabled at any time using the set share-usage-statistics licensing-only|anonymized|on command.
When using a license that is configured to use a Keyserver, UDB contacts the Keyserver regularly. If this fails, the debugging session is terminated. The Keyserver is hosted within the customer’s network and does not make any outgoing network requests.