Limitations
Debugging another user’s process
Sometimes it is necessary to debug a process belonging to another user. Usually, debuggers running with sufficient privileges can attach to another user’s processes. Under UDB this is not the case by default. With default settings, UDB may attach only to processes belonging to the same user as the debugger. It will refuse to attach to processes belonging to another user, even when the debugger is running with sufficient privileges.
Note
Usually, a user with the CAP_SYS_PTRACE capability can attach to any other process in the system (so long as it is not already being debugged). In most configurations, this equates to the root user. These privileges are still required to attach UDB to another user’s process but are not, by themselves, sufficient.
This behavior can be overridden by activating permissive communications mode. This mode is disabled by default but can be enabled by setting an environment variable.
Warning
Permissive Communications Mode makes cross-user debugging possible but comes with security implications: it opens communications channels that any process on the system may connect to. It therefore carries a risk of local cross-user exploits.
Users should avoid Permissive Communications Mode where possible, particularly on shared systems. If permissive communications are specifically required (for instance, to fit within an existing debugging workflow on a dedicated system), they can be enabled via an environment variable:
UNDO_permissive_comms=true udb <args>
When starting with permissive comms mode, UDB will display a warning to remind the user of the setting currently in effect:
CAUTION: attaching with permissive comms mode
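Because permissive comms mode is only relevant when the target process belongs to a different user, it can help to check ownership before enabling it. The following is a minimal sketch, not part of UDB: the function name `needs_permissive_comms` is hypothetical, and the check assumes a Linux /proc filesystem and a `stat` that supports `-c %u` (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper, not part of UDB.
# Exit status: 0 if the process is owned by another user,
#              1 if it is owned by the caller,
#              2 if the process cannot be inspected.
needs_permissive_comms() {
    # The owner of /proc/<pid> is normally the user the process runs as.
    target_uid=$(stat -c %u "/proc/$1" 2>/dev/null) || return 2
    [ "$target_uid" != "$(id -u)" ]
}

# Example: inspect the current shell, which belongs to the caller.
status=0
needs_permissive_comms "$$" || status=$?
case "$status" in
    0) echo "PID $$ belongs to another user: default settings refuse to attach" ;;
    1) echo "PID $$ belongs to the current user: default settings suffice" ;;
    *) echo "PID $$ could not be inspected" ;;
esac
```

A check like this only indicates whether the default same-user rule would apply; it does not replace the privilege requirements described in the note above.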
Program limitations
Programs using these features are not supported by the Undo Engine.
- CPU features (x86)
The x86-64 CPU features 3dnow, 3dnowext, 3dnowprefetch, amx_bf16, amx_int8, amx_tile, clwb, hle, mpx, rdseed, and rtm are not supported.
- x86 inter-segment (“far”) jumps/calls
“Far” jumps and calls are not supported.
- CPU features (ARM)
Mandatory features from 64-bit ARM architectures up to ARMv8.2 are supported. Features that are optional or from newer ARM architectures may work, but are not guaranteed. The optional Scalable Vector Extension (SVE) from ARMv8.2 is supported. The optional Pointer Authentication extension from ARMv8.3 is not supported.
- ARM-specific limitations
Hugetlbfs and Infiniband are not supported on ARM.
- Direct Memory Access
Direct hardware access to program memory may cause the program to behave differently or crash when run under the Undo Engine, and recorded history may fail to replay correctly. For example, programs making use of GPU acceleration via the /dev/kfd driver are not supported.
- Exec system call
The execve system call is not supported in record mode, and the program is stopped when it issues this system call.
- Obsolete system calls
The modify_ldt, pivot_root, ssetmask, unshare, and vm86 system calls are not supported. (These are esoteric or obsolete, and are retained in the kernel only for backwards compatibility with binaries written for early 2.x-series kernels.)
- Restartable sequences
glibc, from version 2.35 onwards, attempts to register a restartable sequence with the kernel. This glibc behavior can be disabled by adding glibc.pthread.rseq=0 to the GLIBC_TUNABLES environment variable of the target application. The rseq (restartable sequence) system call as used by glibc is supported on 64-bit platforms. This means:
  - The Undo Engine can attach to running programs that have registered restartable sequence areas with the kernel, provided:
    - The program is not making use of the critical section functionality restartable sequences provide.
    - The restartable sequence area is not in shared memory.
  - While recording, the rseq system call returns the expected code (for example, zero for success, or -EINVAL for a bad parameter), but a restartable sequence area is not actually registered with the kernel. If the call would have succeeded natively, the Undo Engine tracks the restartable sequence area in the same way as those present at attach time.
  - While recording, any restartable sequence area has its cpu_id and cpu_id_start set to −1 and 0 respectively.
  - At detach, the restartable sequence state is restored, respecting any registrations and unregistrations of restartable sequence areas which were made while recording.
Applications which use restartable sequences other than implicitly via glibc are not supported.
- Other unsupported system calls
The pivot_root and reboot system calls are not supported.
- Self-modifying code
Modifying instructions and executing them immediately (without an intervening branch or system call) is not supported.
- Code in shared memory
Execution of code in shared memory is not supported.
- Cross-memory attach
Programs whose memory is updated via the process_vm_writev system call from another process are not supported.
- setrlimit
If the program uses setrlimit() to reduce the amount of memory, processes, or other resources that it may consume, then the Undo Engine may not be able to operate properly due to lack of resources.
- Use of libunwind
libunwind maps the executable and shared libraries multiple times. Under the Undo Engine, this consumes large amounts of event log, which may cause events of interest to be discarded from the recording.
If possible, use of libunwind should be disabled when recording: if needed, stack traces can be reproduced by replaying the recording. If disabling libunwind is not possible, please contact Undo Support.
- Address Sanitizer
Address Sanitizer (ASan) is a memory error detector for C and C++ whose implementation relies on the libasan.so shared library. Programs using Address Sanitizer must load libasan.so before all other shared libraries. When the LD_PRELOAD environment variable is used to load libraries (such as the Asynchronous I/O support library), libasan.so must be specified as the first library to be loaded, for example, LD_PRELOAD=libasan.so:libundodb_aio_preload_x64.so.
The Leak Sanitizer (LSan) component of ASan does not work when recorded by the Undo Engine: it fails with the error “LeakSanitizer does not work under ptrace”. Although described as a “fatal error”, the only practical consequence is that Leak Sanitizer does not run and so memory leaks are not detected. The error message can be disabled by adding detect_leaks=0 to the ASAN_OPTIONS environment variable.
- Timing requirements
Programs run more slowly during recording, so they may not meet their timing requirements. In particular, if there is a timing-sensitive mechanism for detecting failures, such as a watchdog timer or a heartbeat signal, that mechanism might falsely conclude that the program has failed. In this case, the timer must be disabled or the timeout extended.
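The environment settings mentioned in the Restartable sequences and Address Sanitizer entries above can be combined with values that are already set, rather than overwriting them. The following is a minimal sketch under stated assumptions: the helper name `join_colon` and the three shell variable names are hypothetical, and it relies on GLIBC_TUNABLES, LD_PRELOAD, and ASAN_OPTIONS each accepting a colon-separated list.

```shell
#!/bin/sh
# Hypothetical helper: join two colon-separated lists,
# skipping whichever side is empty.
join_colon() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    elif [ -z "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s:%s\n' "$1" "$2"
    fi
}

# Disable glibc's rseq registration, keeping any tunables already set.
rseq_tunables=$(join_colon "${GLIBC_TUNABLES:-}" "glibc.pthread.rseq=0")

# libasan.so goes first; previously requested preloads follow it.
asan_preload=$(join_colon "libasan.so" "${LD_PRELOAD:-}")

# Suppress the LeakSanitizer ptrace error, keeping other ASan options.
asan_options=$(join_colon "${ASAN_OPTIONS:-}" "detect_leaks=0")

# Show the values that would be passed to the recorded program.
printf 'GLIBC_TUNABLES=%s\n' "$rseq_tunables"
printf 'LD_PRELOAD=%s\n' "$asan_preload"
printf 'ASAN_OPTIONS=%s\n' "$asan_options"
```

The computed values are printed rather than exported so that the preload does not affect the script itself; they could then be supplied for a single run, for example with `env GLIBC_TUNABLES="$rseq_tunables" udb <args>`.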
Differences in behavior
Programs using these features may behave differently when run under the Undo Engine.
- Shared memory accesses straddling valid and invalid pages
When run natively, an instruction gets a SIGBUS if its operand straddles a page boundary such that the first part of the operand is in accessible shared memory, but the second part is in mapped shared memory which is not backed by a valid shared object (for example, because the file which is mapped has been truncated). Under the Undo Engine, such an instruction does not get a SIGBUS: instead, it reads zeros for the part of the operand in unbacked memory.
- System call output buffers
When run natively, a system call with an operand that straddles a page boundary, such that the first part of the operand is in accessible memory but the second part is in inaccessible memory, succeeds if the system call only needs to access the first part of the operand. For example, if a read() system call is passed an 8K buffer of which the first 4K is in writable memory and the second 4K is in non-writable memory, then the system call succeeds if less than 4K is read. Under the Undo Engine, such a system call fails with EFAULT. The whole buffer must be accessible in order for the system call to succeed.
- Adjust Flag
According to the Intel manuals, the state of the Adjust Flag (AF) after some instructions is “undefined”. On some processor models, different executions of the same code can produce different states of AF. If the behavior of a program depends on the state of AF when it is undefined, the program may not replay correctly.
Debugger limitations
- Multiple processes
udb cannot record multiple processes simultaneously. However, it can replay multiple recordings using Multi-Process Correlation for Shared Memory.
- Event log size
The Undo Engine records to an in-memory event log, and so its size is limited by the available memory.
- User-defined command hooks
udb uses “user-defined command hooks” to hook many of GDB’s commands, so these hooks are not available to the user.
- Forked programs
When a recorded program executes the clone system call to create a new process (for example, using the fork() or vfork() C library functions), the Undo Engine keeps recording the parent process, but the child process runs unimpeded without being recorded.
When using the LiveRecorder API, the child process must call undolr_start() to be recorded. When using the LiveRecorder tool, the --record-on program:PATTERN option causes any descendant process whose name matches PATTERN to be recorded. UDB cannot record the child process automatically.
- SIGKILL
The Undo Engine cannot continue if the program is terminated by the signal SIGKILL. The debugging session is terminated immediately in this case.
- SIGCHLD while attaching
If a SIGCHLD arrives for a process while the Undo Engine is in the middle of attaching to the process, the SIGCHLD may be silently lost. Once the process has been attached, SIGCHLD is handled normally.
Finding memory leaks
The Undo Engine cannot perform memory leak detection automatically, but it can assist in the process of searching for memory leaks.
In particular, the last command in UDB jumps to the last time (or next time, if the -forward option is specified) when the value of an expression was modified, or to the point where the memory for the expression was allocated (or freed, if -forward is specified). This can be used to find the code responsible for a leaked allocation or a double free.
The Undo Engine also supports debugging code compiled with code sanitizers including the Address Sanitizer and Leak Sanitizer in Clang and GCC.
System limitations
- Memory
The system may run out of memory while the Undo Engine is recording, as indicated by the error ENOMEM appearing in the output or log files. The memory requirements can be reduced by setting the maximum event log size, or setting the kernel overcommit mode to zero or one, for example using sudo sysctl vm.overcommit_memory=0. See overcommit accounting in the Linux kernel documentation.
- Disk space exhaustion
The system may run out of disk space while the Undo Engine is recording or replaying, as indicated by the error ENOSPC appearing in the output or log files.
The directory used for temporary files can be configured using the --tmpdir-root command-line option to live-record, the --tmpdir-root command-line option to udb, or the UNDO_tmpdir_root environment variable.
- Linux security modules
Some Linux security modules may prevent the Undo Engine from recording a program, or replaying a recording. In particular:
  - Yama can prevent the Undo Engine from attaching to a program, so it must be disabled using sudo sysctl kernel.yama.ptrace_scope=0.
  - AppArmor can prevent the Undo Engine from executing a program, if the program has a security profile. This is indicated by the error EACCES appearing in the output or the log files. This can be worked around by deleting the security profile, or by copying the program to a new location where the security profile does not apply.
  - SELinux can prevent the Undo Engine from recording a program. This is indicated by the error EACCES appearing in the output or the log files. This needs to be addressed on a case-by-case basis: please contact Undo Support.
- Virtual address space
Recordings can only be replayed on machines supporting virtual addresses of the same bit width or greater than the addresses which are mapped in the recording.
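The kernel settings mentioned under Memory and Linux security modules above can be inspected before recording. The following is a minimal sketch rather than an Undo-provided tool: the function names are hypothetical, and it assumes the standard Linux locations /proc/sys/vm/overcommit_memory and /proc/sys/kernel/yama/ptrace_scope.

```shell
#!/bin/sh
# Print the contents of a /proc/sys file, or "absent" if it cannot be read
# (for example, when the Yama module is not enabled in the kernel).
read_setting() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "absent"
    fi
}

# Classify an overcommit mode: 0 and 1 are the modes suggested above.
describe_overcommit() {
    case "$1" in
        0|1) echo "overcommit mode $1: ok" ;;
        2)   echo "overcommit mode 2: strict accounting, may cause ENOMEM" ;;
        *)   echo "overcommit mode unknown" ;;
    esac
}

# Classify a Yama ptrace scope: 0 (or no Yama) leaves attaching unrestricted.
describe_ptrace_scope() {
    case "$1" in
        0|absent) echo "ptrace scope $1: attach not restricted by Yama" ;;
        *)        echo "ptrace scope $1: Yama may block attaching" ;;
    esac
}

describe_overcommit "$(read_setting /proc/sys/vm/overcommit_memory)"
describe_ptrace_scope "$(read_setting /proc/sys/kernel/yama/ptrace_scope)"
```

The script only reads the current values; changing them still requires the sysctl commands shown above, run with appropriate privileges.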