Limitations

Debugging another user’s process

Sometimes it is necessary to debug a process belonging to another user. Usually, debuggers running with sufficient privileges can attach to another user’s processes. Under UDB this is not the case by default. With default settings, UDB may attach only to processes belonging to the same user as the debugger. It will refuse to attach to processes belonging to another user, even when the debugger is running with sufficient privileges.

Note

Usually, a user with the CAP_SYS_PTRACE capability can attach to any other process in the system (so long as it is not already being debugged). In most configurations, this equates to the root user. These privileges are still required to attach UDB to another user’s process but are not, by themselves, sufficient.

This behaviour can be overridden by activating permissive communications mode. This mode is disabled by default but can be enabled by setting an environment variable.

Warning

Permissive Communications Mode makes cross-user debugging possible but comes with security implications: it opens communications channels that any process on the system may connect to. It therefore carries a risk of local cross-user exploits.

It is therefore recommended that users avoid Permissive Communications Mode where possible, particularly on shared systems. If permissive communications are specifically required (for instance, to fit within an existing debugging workflow on a dedicated system), they can be enabled via an environment variable:

UNDO_permissive_comms=true udb <args>

When starting with permissive comms mode, UDB will display a warning to remind the user of the setting currently in effect:

CAUTION: attaching with permissive comms mode
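Before reaching for permissive comms mode, it can help to confirm that the target really belongs to a different user. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name owned_by_current_user is hypothetical and not part of UDB.

```shell
# owned_by_current_user PID
# Succeeds if process PID belongs to the user running this shell.
# Compares numeric user IDs, reading the owner of /proc/PID (Linux only).
owned_by_current_user() {
    [ "$(stat -c %u "/proc/$1")" = "$(id -u)" ]
}

# Usage: the current shell ($$) stands in for a real target PID.
if owned_by_current_user "$$"; then
    echo "same user: permissive comms mode is not needed"
else
    echo "different user: UDB refuses to attach by default"
fi
```

A same-user result only rules out the cross-user restriction described above; other settings on the system may still prevent attaching.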

Program limitations

Programs using these features are not supported by the Undo Engine.

CPU features (x86)

The x86-64 CPU features 3dnow, 3dnowext, 3dnowprefetch, amx_bf16, amx_int8, amx_tile, clwb, hle, mpx, rdseed, and rtm are not supported.
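As a first-pass check, the features listed above can be compared against the flags that the Linux kernel reports for the CPU. This is a sketch only: it assumes the feature names match the entries on the "flags" line of /proc/cpuinfo, and the function name find_unsupported_flags is hypothetical. A match means the CPU offers the feature, not that the program uses it.

```shell
# Feature names copied from the list above.
UNSUPPORTED="3dnow 3dnowext 3dnowprefetch amx_bf16 amx_int8 amx_tile clwb hle mpx rdseed rtm"

# find_unsupported_flags FLAG...
# Prints, space-separated, each argument that is in the unsupported list.
find_unsupported_flags() {
    found=""
    for flag in "$@"; do
        for bad in $UNSUPPORTED; do
            if [ "$flag" = "$bad" ]; then
                found="${found:+$found }$flag"
            fi
        done
    done
    printf '%s\n' "$found"
}

# Usage: inspect the first CPU listed on this machine (x86 Linux only).
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    # Unquoted on purpose: each flag becomes a separate argument.
    find_unsupported_flags $cpu_flags
fi
```

An empty line of output means none of the listed features were reported for that CPU.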

x86 inter-segment (“far”) jumps/calls

“Far” jumps and calls are not supported.

CPU features (ARM)

The ARMv8.2 architecture is supported. Scalable Vector Extensions (SVE) and pointer authentication are not supported.

ARM-specific limitations

Hugetlbfs and Infiniband are not supported on ARM.

Direct Memory Access

Direct hardware access to program memory may cause the program to behave differently or crash when run under the Undo Engine and recorded history may fail to replay correctly. For example, programs making use of GPU acceleration via the /dev/kfd driver are not supported.

Exec system call

The execve system call is not supported in record mode, and the program is stopped when it issues this system call.

Obsolete system calls

The modify_ldt, pivot_root, ssetmask, unshare, and vm86 system calls are not supported. (These are esoteric or obsolete, and are kept in the kernel only for backwards compatibility with binaries written for early 2.x-series kernels.)

Restartable Sequences

glibc, from version 2.35 onwards, attempts to register a restartable sequence with the kernel. This glibc behaviour can be disabled by adding glibc.pthread.rseq=0 to the GLIBC_TUNABLES environment variable of the target application. The rseq (restartable sequence) system call as used by glibc is supported on 64-bit platforms. This means:

  • The Undo Engine can attach to running programs that have registered restartable sequence areas with the kernel provided:

    • The program is not making use of the critical section functionality restartable sequences provide.

    • The restartable sequence area is not in shared memory.

  • During recording, calls to rseq return the expected native return code (e.g. 0 for success, -EINVAL for a bad parameter), but a restartable sequence area is not actually registered with the kernel. If the call would have succeeded natively, the engine tracks the specified restartable sequence area in the same way as those present at attach time.

  • During recording, any restartable sequence area has its cpu_id and cpu_id_start set to -1 and 0 respectively.

  • At detach, restartable sequence state is restored, respecting any registrations and unregistrations of restartable sequence areas that were made during recording.

Applications which use restartable sequences other than implicitly via glibc are not supported.
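GLIBC_TUNABLES is a colon-separated list, so the glibc.pthread.rseq=0 setting mentioned above should be appended rather than overwriting any tunables already set. A minimal sketch, in which the function name add_rseq_tunable and the program name my_program are hypothetical:

```shell
# add_rseq_tunable EXISTING
# Prints EXISTING with glibc.pthread.rseq=0 appended, keeping any
# tunables that were already present.
add_rseq_tunable() {
    printf '%s\n' "${1:+$1:}glibc.pthread.rseq=0"
}

# Usage (my_program is a hypothetical target application):
#   GLIBC_TUNABLES=$(add_rseq_tunable "${GLIBC_TUNABLES:-}") ./my_program
```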

Other unsupported system calls

The pivot_root and reboot system calls are not supported.

Self-modifying code

Modifying instructions and executing them immediately (without an intervening branch or system call) is not supported.

Code in shared memory

Execution of code in shared memory is not supported.

Cross-memory attach

Programs whose memory is updated via the process_vm_writev system call from another process are not supported.

setrlimit

If the program uses setrlimit() to reduce the amount of memory, processes, or other resources that it may consume, then the Undo Engine may not be able to operate properly due to lack of resources.

Use of libunwind

libunwind maps the executable and shared libraries multiple times. Under LiveRecorder, this consumes large amounts of event log, which may cause events of interest to be discarded from the recording.

If possible, use of libunwind should be disabled when recording: if needed, stack traces can be reproduced by replaying the recording. If disabling libunwind is not possible, please contact Undo Support.

Address Sanitizer

Address Sanitizer (ASan) is a memory error detector for C and C++ whose implementation relies on the libasan.so shared library. Programs using Address Sanitizer must load libasan.so before all other shared libraries. When the LD_PRELOAD environment variable is used to load libraries (such as the Asynchronous I/O support library), libasan.so must be specified as the first library to be loaded, for example, LD_PRELOAD=libasan.so:libundodb_aio_preload_x64.so.

The Leak Sanitizer (LSan) component of ASan does not work when recorded by the Undo Engine: it fails with the error “LeakSanitizer does not work under ptrace”. Although described as a “fatal error”, the only practical consequence is that Leak Sanitizer does not run and so memory leaks are not detected. The error message can be disabled by setting the environment variable ASAN_OPTIONS=detect_leaks=0.
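The ordering requirement for LD_PRELOAD can be enforced with a small helper. The sketch below assumes a colon-separated LD_PRELOAD value and the bare library name libasan.so used above; the function name preload_with_asan_first and the program name my_program are hypothetical.

```shell
# preload_with_asan_first EXISTING
# Prints a colon-separated LD_PRELOAD value with libasan.so first,
# followed by the entries of EXISTING. An existing libasan.so entry
# is dropped so that it is not listed twice.
preload_with_asan_first() {
    result="libasan.so"
    old_ifs=$IFS
    IFS=:
    for lib in $1; do
        if [ -n "$lib" ] && [ "$lib" != "libasan.so" ]; then
            result="$result:$lib"
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# Usage (my_program is a hypothetical target), also silencing the
# Leak Sanitizer message:
#   ASAN_OPTIONS=detect_leaks=0 \
#   LD_PRELOAD=$(preload_with_asan_first "${LD_PRELOAD:-}") ./my_program
```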

Timing Requirements

Programs run more slowly during recording, so they may not meet their timing requirements. In particular, if there is a timing-sensitive mechanism for detecting failures, such as a watchdog timer or a heartbeat signal, then that mechanism might falsely conclude that the program has failed. In this case, the timer must be disabled, or the timeout extended.

Differences in behaviour

Programs using these features may behave differently when run under the Undo Engine.

Shared memory accesses straddling valid and invalid pages

When run natively, an instruction gets a SIGBUS if its operand straddles a page boundary such that the first part of the operand is in accessible shared memory, but the second part is in mapped shared memory that is not backed by a valid shared object (for example, because the mapped file has been truncated). Under the Undo Engine, such an instruction does not get a SIGBUS: instead, it reads zeros for the part of the operand in unbacked memory.

System call output buffers

When run natively, a system call with an operand that straddles a page boundary, such that the first part of the operand is in accessible memory, but the second part is in inaccessible memory, succeeds if the system call only needs to access the first part of the operand. For example, if a read() system call is passed an 8K buffer of which the first 4K is in writable memory, and the second 4K is in non-writable memory, then the system call succeeds if less than 4K is read. Under the Undo Engine, such a system call fails with EFAULT. The whole buffer must be accessible in order for the system call to succeed.

Adjust Flag

According to the Intel manuals, the state of the Adjust Flag (AF) after some instructions is “undefined”. On some processor models, different executions of the same code can produce different states of AF. If the behaviour of a program depends on the state of AF when it is undefined, the program may not replay correctly.

Debugger limitations

Multiple processes

udb cannot record multiple processes simultaneously. However, it can replay multiple recordings using Multi-Process Correlation for Shared Memory.

Event log size

The Undo Engine records to an in-memory event log, and so its size is limited by the available memory.

User-defined command hooks

udb uses “user-defined command hooks” to hook many of GDB’s commands, so these hooks are not available to the user.

Forked programs

When a recorded program executes the clone system call to create a new process (for example, using the fork() or vfork() C library functions), the Undo Engine keeps recording the parent process, but the child process runs unimpeded without being recorded.

This means that, when using the LiveRecorder API, the child (forked) process must call undolr_start() to be recorded. live-record and UDB cannot record the child process automatically.

SIGKILL

The Undo Engine cannot continue if the program is terminated by the signal SIGKILL. The debugging session is terminated immediately in this case.

SIGCHLD while attaching

If a SIGCHLD arrives for a process while the Undo Engine is in the middle of attaching to the process, the SIGCHLD may be silently lost. Once the process has been attached, SIGCHLD is handled normally.

Finding memory leaks

The Undo Engine cannot perform memory leak detection automatically, but it can assist in the process of searching for memory leaks.

For example, you could set a watchpoint in the malloc_chunk structure describing an allocated block, and run backwards to find the point where that data structure was modified, and so discover which code allocated the block.

The Undo Engine also supports debugging code compiled with Clang’s sanitizers such as AddressSanitizer, subject to the Leak Sanitizer limitation described under Address Sanitizer.

System Limitations

Memory

The system may run out of memory while recording, as indicated by the error ENOMEM appearing in LiveRecorder’s output or log files. The memory requirements for LiveRecorder can be reduced by setting the maximum event log size, or by setting the kernel overcommit mode to zero or one, for example using sudo sysctl vm.overcommit_memory=0. See overcommit accounting in the Linux kernel documentation.
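The current overcommit mode can be read from /proc before deciding whether to change it. This is a sketch, with a hypothetical function name describe_overcommit; the mode descriptions follow the Linux kernel's overcommit accounting documentation.

```shell
# describe_overcommit MODE
# Prints a short description of a vm.overcommit_memory value.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting (no overcommit)" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: report the mode currently in effect on this machine.
if [ -r /proc/sys/vm/overcommit_memory ]; then
    describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
fi
```

Mode 2 is the one that falls outside the "zero or one" advice above.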

Disk space exhaustion

The system may run out of disk space while LiveRecorder is recording or UDB is replaying, as indicated by the error ENOSPC appearing in the output or log files.

The directory used for temporary files can be configured using the --tmpdir-root command-line option to live-record, the --tmpdir-root command-line option to udb, or the UNDO_tmpdir_root environment variable.
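Free space in the temporary directory can be checked before recording or replaying. The sketch below is not part of LiveRecorder or UDB: the function names free_kib and has_free_space are hypothetical, and the fallback to /tmp when UNDO_tmpdir_root is unset is an assumption rather than a documented default.

```shell
# free_kib DIR
# Prints the free space, in KiB, on the filesystem holding DIR.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_space DIR MIN_KIB
# Succeeds if at least MIN_KIB KiB are free on DIR's filesystem.
has_free_space() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Usage: the /tmp fallback is an assumption, see the note above.
tmp_root="${UNDO_tmpdir_root:-/tmp}"
if [ -d "$tmp_root" ]; then
    echo "$tmp_root: $(free_kib "$tmp_root") KiB free"
fi
```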

Linux security modules

Some Linux security modules may prevent LiveRecorder from recording a program, or UDB from replaying a recording. In particular:

  • Yama can prevent LiveRecorder or UDB from attaching to a program, so it must be disabled using sudo sysctl kernel.yama.ptrace_scope=0.

  • AppArmor can prevent LiveRecorder or UDB from executing a program, if the program has a security profile. This is indicated by the error EACCES appearing in the output or the log files. This can be worked around by deleting the security profile, or by copying the program to a new location where the security profile does not apply.

  • SELinux can prevent LiveRecorder from recording a program. This is indicated by the error EACCES appearing in the output or the log files. This needs to be addressed on a case-by-case basis: please contact Undo Support.
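The Yama setting can be inspected before attempting to attach. This is a minimal sketch with a hypothetical function name, describe_ptrace_scope; the value meanings follow the Linux kernel's Yama documentation.

```shell
# describe_ptrace_scope VALUE
# Prints a short description of a kernel.yama.ptrace_scope value.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: same-user attach allowed" ;;
        1) echo "restricted: only descendants may be attached to" ;;
        2) echo "admin-only: CAP_SYS_PTRACE required" ;;
        3) echo "no attach: ptrace attach disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: the file is absent when Yama is not enabled in the kernel.
if [ -r /proc/sys/kernel/yama/ptrace_scope ]; then
    describe_ptrace_scope "$(cat /proc/sys/kernel/yama/ptrace_scope)"
else
    echo "Yama ptrace_scope not present"
fi
```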

Virtual address space

Recordings can only be replayed on machines that support virtual addresses at least as wide, in bits, as the addresses that are mapped in the recording.
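On x86 Linux, the virtual address width of the replay machine is reported on the "address sizes" line of /proc/cpuinfo. The sketch below extracts it; the function name virtual_address_bits is hypothetical, other architectures may not print this line, and the width needed by a given recording has to be established separately.

```shell
# virtual_address_bits
# Reads cpuinfo-style text on standard input and prints the virtual
# address width, in bits, from the first "address sizes" line.
virtual_address_bits() {
    sed -n 's/^address sizes.*, \([0-9][0-9]*\) bits virtual.*/\1/p' | head -n 1
}

# Usage: prints nothing if the line is absent on this machine.
if [ -r /proc/cpuinfo ]; then
    virtual_address_bits < /proc/cpuinfo
fi
```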