How it works¶
This page describes what Undo gives to an AI agent and how that translates into a debugging session. It is meant as background reading; you do not need to know any of this to use AI with Undo.
High-level architecture¶
The architecture below describes the case where Undo is used from your own AI agent:
The AI agent runs locally on your machine and talks to a model through whichever LLM provider it is already configured to use.
The agent starts undo mcp as a subprocess. That process is UDB running as an MCP server, with its tools exposed to the AI agent over the Model Context Protocol.
When the AI agent decides to investigate an Undo recording, it calls Undo’s tools. The MCP server drives the recording with the Undo Engine and returns results designed to be easy for the AI agent to consume.
When you use Undo in UDB via the ai command, the direction of control is reversed: UDB itself starts an AI agent and connects it to the same MCP server (over HTTP on a local port). The agent then drives the investigation in the same way as above, while UDB displays its progress and output.
In both cases, the model never sees raw recording data: it sees the responses of Undo’s tools, which are designed to summarize what the program did in a form an LLM can reason about.
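To make the first flow more concrete, here is a minimal sketch of the kind of JSON-RPC 2.0 messages an MCP client sends to a server it has started as a subprocess. The `tools/list` and `tools/call` method names come from the Model Context Protocol. The tool name `example_triage`, its `recording` argument and the file name are hypothetical placeholders, not Undo's actual tool surface. The sketch assumes the stdio transport and only builds the messages; it does not launch anything.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def make_request(method, params=None):
    """Build one JSON-RPC 2.0 request as used by the Model Context Protocol."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Command the agent launches as a subprocess, as described above.
SERVER_COMMAND = ["undo", "mcp"]

# Ask the server which tools it offers.
list_tools = make_request("tools/list")

# Call one tool. "example_triage" and "recording" are HYPOTHETICAL names.
call_tool = make_request(
    "tools/call",
    {"name": "example_triage", "arguments": {"recording": "crash.undo"}},
)

# With the stdio transport, each message travels as a single line of JSON.
for message in (list_tools, call_tool):
    print(json.dumps(message))
```

A real client would first perform the protocol's `initialize` handshake, which is omitted here. In practice your AI agent's MCP client handles all of this for you.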
Tools and skills: the kitchen and the recipes¶
To borrow Anthropic’s analogy: an MCP server provides the kitchen, with its tools and equipment, while skills provide the recipes that explain how to combine those tools to achieve specific outcomes.
Undo ships both. The kitchen is the set of tools listed under What Undo gives the AI agent below. The recipes are skill documents that the MCP server makes available to the AI agent, covering things like “how to capture an Undo recording of a flaky test” or “what to do when the user mentions a .undo file”. The agent picks the relevant recipe when it sees a matching situation.
Because there is not yet a standard way for an AI agent to discover and install skills offered by an MCP server, Undo currently exposes each skill as a tool that the agent calls to retrieve the relevant guidance. This is an implementation detail and may change as the ecosystem matures.
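The idea of exposing a skill as a tool can be sketched as follows. This is an illustration of the pattern only, not Undo's implementation: the skill names and the guidance text are hypothetical placeholders.

```python
# HYPOTHETICAL skill names and placeholder guidance, for illustration only.
SKILLS = {
    "skill_record_flaky_test": {
        "description": "How to capture an Undo recording of a flaky test",
        "guidance": "Placeholder text standing in for the real skill document.",
    },
    "skill_open_recording": {
        "description": "What to do when the user mentions a .undo file",
        "guidance": "Placeholder text standing in for the real skill document.",
    },
}


def list_tools():
    """Advertise every skill as an ordinary tool that takes no arguments."""
    return [
        {"name": name, "description": skill["description"]}
        for name, skill in SKILLS.items()
    ]


def call_tool(name):
    """Calling a skill 'tool' returns its guidance for the agent to read."""
    if name not in SKILLS:
        raise KeyError(f"unknown tool: {name}")
    return SKILLS[name]["guidance"]


# The agent sees a matching situation, then fetches the recipe on demand.
for tool in list_tools():
    print(tool["name"], "-", tool["description"])
print(call_tool("skill_open_recording"))
```

The agent decides which skill to fetch from the tool description alone, so the full guidance only enters its context when it is actually needed.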
What Undo gives the AI agent¶
The Undo MCP server exposes a focused set of capabilities, grouped below by the kind of investigation they support. These descriptions are deliberately brief; you do not normally call these tools yourself, and the exact tool names may change over time as we refine the surface.
Triage and orientation¶
Every investigation starts with a triage step that loads the recording and gives the AI agent an initial overview: which program ran, with which arguments, how long the execution history is, and which functions were called, with a call count for each. This summary is also what allows the AI agent to navigate the recording without relying on breakpoints (see What is intentionally not exposed below).
Capturing data with logging¶
LLMs are good at reading large logs. Undo gives the AI agent tools that retrospectively generate a detailed, log-like trace of what a chosen function did during a chosen call, or of every call made to a given function, including arguments, return values, branches taken and assignments to local variables.
Each entry in the trace carries the time in execution history at which it happened, so the AI agent can jump straight to an interesting point (for example a specific iteration of a loop, or the return from a specific call) without having to step there.
Unlike adding printf calls and recompiling, this kind of logging:
needs no source changes and no rebuild, because the data is reconstructed from the existing recording;
does not perturb timing or scheduling, so any race condition or timing-sensitive behavior captured in the recording is preserved;
can be added or removed retroactively, including in places you only became interested in after the program failed.
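The way time-stamped trace entries let the agent jump directly to a point of interest can be sketched like this. The entry format, the integer times and the traced function `sum_items` are all hypothetical simplifications chosen for illustration; they are not the format Undo's tools return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    time: int    # position in execution history (simplified to an integer)
    event: str   # "call", "assign" or "return" in this sketch
    detail: str


# HYPOTHETICAL trace of a single call to a function containing a loop.
trace = [
    TraceEntry(1000, "call", "sum_items(n=3)"),
    TraceEntry(1010, "assign", "i = 0"),
    TraceEntry(1020, "assign", "i = 1"),
    TraceEntry(1030, "assign", "i = 2"),
    TraceEntry(1040, "return", "sum_items -> -1"),
]


def time_of(entries, event, detail):
    """Return the time of the first entry matching event and detail."""
    for entry in entries:
        if entry.event == event and entry.detail == detail:
            return entry.time
    return None


# Pick the third loop iteration as the jump target, with no stepping needed.
target = time_of(trace, "assign", "i = 2")
print(target)  # 1030
```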
Time travel and execution¶
The agent can move through execution history: jump to a specific time, jump to the end of the recording, step backwards over a source line, step out of a function in reverse (similar to reverse-finish), or reverse-step into a function to see what it returned. These map to standard UDB time-travel commands, but their tool signatures and responses are designed for an LLM consumer rather than for a human at a prompt. Some tools take an extra argument describing where the AI agent expects to end up, so the response can call out a mismatch directly. Others return a focused summary instead of raw debugger output.
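The "expected destination" argument mentioned above can be illustrated with a small simulation. The function name `reverse_step_out`, its parameters and the response fields are hypothetical; the sketch only shows how a response can flag a mismatch between where the agent thought it would land and where it actually landed.

```python
def reverse_step_out(call_stack, expected_function):
    """Simulate stepping out of the current function in reverse.

    call_stack lists function names with the innermost frame last.
    """
    if len(call_stack) < 2:
        return {"moved": False, "note": "already in the outermost frame"}
    landed_in = call_stack[-2]  # the caller of the current function
    response = {"moved": True, "function": landed_in}
    if landed_in != expected_function:
        # Call out the mismatch so the agent can correct its mental model.
        response["note"] = (
            f"expected to arrive in {expected_function!r}, "
            f"but arrived in {landed_in!r}"
        )
    return response


stack = ["main", "parse", "read_token"]
print(reverse_step_out(stack, expected_function="parse"))
print(reverse_step_out(stack, expected_function="main"))
```

The first call returns no note because the expectation was right. The second includes a note, which gives the model an explicit signal instead of leaving it to notice the discrepancy in raw debugger output.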
Tracing where a value came from¶
A dedicated tool answers questions of the form “where did this value come from?” by finding the last time an expression’s value changed before the current point in execution history. The underlying mechanism is the same as the last command in UDB.
The agent can chain this: trace a wrong pointer back to the function that wrote it, then trace the input to that write, and so on, walking the chain of causality directly through the recording instead of guessing from the code.
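The chaining described above can be sketched with a toy model in which the recording is replaced by a list of writes. Everything here is hypothetical: the `Write` record, the expressions and the function names are invented for illustration, and the real mechanism works on the recording itself.

```python
from typing import NamedTuple, Optional


class Write(NamedTuple):
    time: int               # when the write happened in execution history
    target: str             # expression whose value changed
    source: Optional[str]   # expression the new value was taken from, if any
    where: str              # function that performed the write


# HYPOTHETICAL write history standing in for what a recording contains.
WRITES = [
    Write(100, "config.size", None, "load_config"),
    Write(250, "buf_len", "config.size", "init_buffer"),
    Write(400, "node->len", "buf_len", "make_node"),
    Write(500, "buf_len", None, "reset"),  # later write, must be ignored
]


def last_change(writes, expression, before):
    """Find the most recent write to expression strictly before a time."""
    candidates = [w for w in writes if w.target == expression and w.time < before]
    return max(candidates, key=lambda w: w.time) if candidates else None


def trace_back(writes, expression, start_time):
    """Follow a value backwards from write to write until the trail ends."""
    chain = []
    time = start_time
    while expression is not None:
        write = last_change(writes, expression, time)
        if write is None:
            break
        chain.append(write)
        expression, time = write.source, write.time
    return chain


for step in trace_back(WRITES, "node->len", start_time=600):
    print(step.time, step.where, step.target)
```

Each hop restarts the search from the time of the previous write, which is why the later write to `buf_len` at time 500 does not mislead the trace.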
Inspecting state¶
The agent can evaluate arbitrary expressions in the program at the current
time: read variables, follow pointers, format structures and inspect types. The
result is a combined value-and-type view, formatted so the AI agent can use it directly without
juggling separate print, ptype and whatis calls.
Memory¶
Two memory-oriented tools are exposed:
A memory map report summarizing where memory is mapped in the recorded process, including stacks and shared libraries.
A memory report that uses lifetime tracking to determine whether a pointer is valid at the current time, and, if applicable, when its underlying object was allocated and when it was freed. Lifetime tracking watches the program’s memory allocator throughout execution history so that the lifetime of any object can be reconstructed on demand. This is particularly useful for diagnosing use-after-free and double-free issues.
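The reasoning behind the pointer report can be sketched with a simplified model of allocator events. This is an assumption-laden illustration, not Undo's implementation: the event tuples, the integer times and the report fields are hypothetical, and only exact base addresses are matched (no interior pointers).

```python
def pointer_report(events, address, now):
    """Reconstruct the lifetime of the object at address, as seen from now.

    events holds (time, kind, address) tuples where kind is "alloc" or "free".
    """
    allocated_at = freed_at = None
    for time, kind, addr in sorted(events):
        if addr != address:
            continue
        if kind == "alloc" and time <= now:
            # Most recent allocation at or before the current time.
            allocated_at, freed_at = time, None
        elif kind == "free" and allocated_at is not None and freed_at is None:
            # First free of that allocation, possibly later than now.
            freed_at = time
    valid = allocated_at is not None and (freed_at is None or now < freed_at)
    return {"valid": valid, "allocated_at": allocated_at, "freed_at": freed_at}


# HYPOTHETICAL allocator history: the same address is reused after a free.
EVENTS = [
    (10, "alloc", 0x1000),
    (50, "free", 0x1000),
    (70, "alloc", 0x1000),
]

print(pointer_report(EVENTS, 0x1000, now=30))  # valid, freed later at 50
print(pointer_report(EVENTS, 0x1000, now=60))  # dangling: use-after-free risk
print(pointer_report(EVENTS, 0x1000, now=80))  # valid again, new allocation
```

Because the whole execution history is available, the report at time 30 can already state when the object will be freed, which is what makes a later use-after-free straightforward to explain.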
Bookmarks¶
The agent can set, list and jump to bookmarks. We expose bookmarks because they encourage the AI agent to explicitly name the points in execution history that matter to its reasoning. An agent that has carefully bookmarked the key moments of an investigation can recover smoothly after a misstep, and the bookmarks remain in the recording afterwards as evidence of the chain of reasoning that led to the agent’s conclusion. See also the pair-programming notes.
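How bookmarks help an agent recover after a misstep can be sketched with a toy session object. The class, its method names, the bookmark names and the integer times are hypothetical and exist only to illustrate the workflow.

```python
class Investigation:
    """Tracks the current time and named bookmarks (all values hypothetical)."""

    def __init__(self, start_time):
        self.current_time = start_time
        self.bookmarks = {}

    def set_bookmark(self, name):
        self.bookmarks[name] = self.current_time

    def jump_to_time(self, time):
        self.current_time = time

    def jump_to_bookmark(self, name):
        self.current_time = self.bookmarks[name]

    def list_bookmarks(self):
        """Return (name, time) pairs in execution-history order."""
        return sorted(self.bookmarks.items(), key=lambda item: item[1])


session = Investigation(start_time=9000)
session.set_bookmark("crash site")
session.jump_to_time(4200)
session.set_bookmark("bad pointer written")
session.jump_to_time(150)  # a misstep: nothing relevant here
session.jump_to_bookmark("bad pointer written")  # recover in one call
print(session.current_time)  # 4200
print(session.list_bookmarks())
```

Read in order, the bookmark list doubles as a short narrative of the investigation: the earliest named point is the cause and the latest is the symptom.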
Self-review¶
Agents often stop investigating as soon as they find a plausible explanation, even when that explanation identifies only a proximate cause rather than the root cause. Undo offers a self-review tool that spawns a fresh sub-agent to challenge the conclusions of the main investigation before they are reported back to the user. We have found that this materially improves the quality of the final answer.
Test framework integration¶
For recordings of programs that use Google Test, Undo can list the tests that ran and jump to the start of a specific test, so the AI agent can scope its investigation to one failing test instead of the whole binary.
Generating recordings (skills)¶
If a program has not yet been recorded, Undo ships skills that teach the AI agent how to use live-record to capture a recording, including how to handle wrapper scripts, how to save a recording only when something fails, and how to combine --retry-for with Thread Fuzzing to catch flaky failures. Once a recording exists, the AI agent can investigate it like any other.
What is intentionally not exposed¶
A few capabilities that UDB users might expect to find are deliberately not part of the MCP surface.
Breakpoints¶
The MCP server does not give the AI agent a way to set breakpoints. LLMs tend to set many breakpoints, lose track of them, and then get confused when one fires unexpectedly, without being able to recover the thread of their investigation. The triage summary, combined with the logging tools described above, gives the AI agent a more reliable way to navigate execution history without that failure mode.
Forward execution¶
The MCP server does not expose forward stepping, continue or similar forward-only
operations. Most public training data describes traditional debuggers, where forward
execution is the only option, so an LLM offered both forward and reverse navigation
tends to overwhelmingly prefer forward navigation, which leads to worse analyses on a
time travel debugger such as UDB. Restricting the surface to backwards navigation,
bookmarks, logging and direct jumps in execution history
nudges the AI agent into using the time travel capabilities of Undo where they actually
help.
The agent is not stuck moving only backwards: it can jump to a bookmark, to a time it saw in a log entry, or to the end of the recording, and continue its investigation from there.
Limitations¶
Some features of the Undo Suite are not available when AI is used. See the AI support section of the Limitations page for details.