Snapshots

A snapshot captures the state of a program at a particular time in its execution history.

Reverse execution is implemented by replaying a program’s execution from previously captured snapshots. Since replay is guaranteed to be deterministic (due to the event log), when a snapshot is played forwards it will eventually reach the next snapshot.

To minimize overhead, snapshots are created by forking the original process. This means they benefit from Linux’s copy-on-write memory semantics, so that the memory cost of a snapshot is proportional to the amount of memory that the program subsequently modifies.
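The fork-based approach can be illustrated with a minimal Python sketch. It is not the Undo Engine's implementation; the function name take_snapshot_demo and the state dictionary are hypothetical. It only shows that a forked child keeps seeing the state as it was at the moment of the fork, even after the parent modifies it.

```python
import os

def take_snapshot_demo():
    """Fork a child as a 'snapshot' and show it keeps the pre-fork state."""
    state = {"counter": 1}
    go_r, go_w = os.pipe()    # parent -> child: "report your view now"
    out_r, out_w = os.pipe()  # child -> parent: the snapshot's view

    pid = os.fork()
    if pid == 0:
        # Child: acts as the snapshot. It shares pages with the parent
        # copy-on-write, so it never observes the parent's later writes.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)  # block until the parent has modified its state
        os.write(out_w, str(state["counter"]).encode())
        os._exit(0)

    # Parent: continues running and modifies its own copy of the state.
    os.close(go_r)
    os.close(out_w)
    state["counter"] = 2
    os.write(go_w, b"x")
    snapshot_view = int(os.read(out_r, 16).decode())
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return state["counter"], snapshot_view

if __name__ == "__main__":
    print(take_snapshot_demo())  # parent's value, then the snapshot's value
```

Only the pages that the parent writes to after the fork are physically copied, which is why the cost of keeping the child around grows with the amount of memory modified rather than with the total size of the process.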

[Figure: snapshots.svg]

Viewing snapshot information

List the snapshots in UDB using the info snapshots command.

info snapshots

Displays a table of snapshots, ordered by recording time.

The columns are as follows:

  • number: The snapshot number.

  • recording time: The time in execution history represented by the snapshot.

  • pid: The process ID of the snapshot.

  • memory: The memory used by the snapshot. This is the “proportional set size” (PSS), which takes the copy-on-write memory into account by allocating it proportionally among the processes that share it. (Note that tools such as ps and top typically show the “resident set size” (RSS) which is much larger for snapshot processes as it does not take into account the sharing of copy-on-write memory.)

  • created: The wall-clock time at which the snapshot was created.
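To compare PSS and RSS for a snapshot process outside UDB, one option on Linux is to read /proc/<pid>/smaps_rollup, which reports both figures in kB. The sketch below is illustrative only: the helper names are hypothetical, and it is not stated here that UDB obtains its memory column this way.

```python
def parse_rss_pss(smaps_rollup_text):
    """Extract the Rss and Pss values (in kB) from smaps_rollup content."""
    sizes = {}
    for line in smaps_rollup_text.splitlines():
        parts = line.split()
        # Lines of interest look like: "Pss:                 456 kB"
        if len(parts) >= 3 and parts[0] in ("Rss:", "Pss:") and parts[2] == "kB":
            sizes[parts[0].rstrip(":")] = int(parts[1])
    return sizes

def read_rss_pss(pid):
    """Read Rss and Pss for a live process (Linux 4.14 or later)."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        return parse_rss_pss(f.read())

if __name__ == "__main__":
    import os
    print(read_rss_pss(os.getpid()))
```

For a snapshot process that shares most of its pages with other snapshots, the Pss value would be expected to be much smaller than the Rss value.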

An arrow (=>) marks the snapshot that is currently being used for recording or replay. For example:

24% 24,596> info snapshots
 number                        recording time      pid   memory   created
      0                  1:0x00007ffff7fe4320  3747615  802.00K  14:38:19
      1              8,004:0x00007ffff7fd6b58  3747621  894.00K  14:38:20
=>    2             24,596:0x0000555555555374  3747619    1.54M  14:38:20
      3             65,537:0x00005555555552a5  3747617    1.15M  14:38:20
      4             99,000:0x00005555555552a5  3747608    1.60M  14:38:19
Nanny: pid=3747611; memory used=3.75M
Total memory used: 9.70M
Snapshot creation times: mean=2ms; max=3ms; previous=2ms
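As a reading aid, the figures in this example output can be cross-checked with a short Python sketch. It assumes that K and M are binary units (1024 and 1024² bytes) and that the total is the sum of the snapshot rows plus the Nanny process; neither assumption is stated above, but both are consistent with the numbers shown.

```python
# Assumption: sizes use binary units.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(size):
    """Convert a size such as '802.00K' or '1.54M' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]

# Memory column of the five snapshots in the example, plus the Nanny line.
snapshot_memory = ["802.00K", "894.00K", "1.54M", "1.15M", "1.60M"]
nanny_memory = "3.75M"

total = sum(to_bytes(s) for s in snapshot_memory) + to_bytes(nanny_memory)
total_text = f"{total / UNITS['M']:.2f}M"
print(total_text)  # prints 9.70M, matching "Total memory used" above
```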

Configuring snapshots

It is possible to configure the maximum number of snapshots to keep. This is a trade-off between memory and performance:

  • Reducing the number of snapshots reduces memory usage, but makes reverse-execution and time-travel commands slower.

  • Increasing the number of snapshots increases memory usage, but makes reverse-execution and time-travel commands faster.

Note

This is because reverse-execution and time-travel commands are implemented by jumping to the last snapshot before the target time and replaying forward from it. Fewer snapshots mean that the distance between snapshots is greater, and hence more replay is required to reach a given time in execution history.
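The effect of snapshot count on replay distance can be sketched in Python. This is an illustration, not the Undo Engine's algorithm: it assumes that the snapshot used is the latest one at or before the target time, and it treats the leading number of the recording time as a simple count. The function name replay_plan is hypothetical.

```python
from bisect import bisect_right

def replay_plan(snapshot_times, target):
    """Return (snapshot time to start from, amount of replay needed).

    snapshot_times must be sorted in ascending order.
    """
    index = bisect_right(snapshot_times, target) - 1
    if index < 0:
        raise ValueError("no snapshot at or before the target time")
    start = snapshot_times[index]
    return start, target - start

if __name__ == "__main__":
    # Recording times taken from the info snapshots example.
    many = [1, 8004, 24596, 65537, 99000]
    few = [1, 99000]
    print(replay_plan(many, 50000))  # (24596, 25404)
    print(replay_plan(few, 50000))   # (1, 49999)
```

With five snapshots, reaching time 50,000 needs about 25,000 units of replay; with only the first and last snapshots retained, it needs almost twice as much.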

By default, the maximum number of snapshots is 35. This can be configured using the udb --max-snapshots command-line option, or the UNDO_snapshots environment variable. For example:

$ udb --max-snapshots 20 --args examples/hashtable

Note

The Undo Engine makes a best effort to retain no more than the configured number of snapshots, but this is subject to its own minimum requirements.

Snapshots adaptation

Snapshots are automatically pruned as the program runs, to avoid exhausting system resources. If the available memory in the system falls below a threshold, the Undo Engine prunes snapshots until available memory is back above the threshold (subject to the minimum requirements). The default threshold is 10%. To change it, set the UNDO_memory_pressure_threshold environment variable to the threshold percentage, up to a maximum value of 75. Set it to 0 to disable this feature.
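The threshold rule can be expressed as a small Python sketch. How the Undo Engine measures available memory is not specified here; this example assumes the MemAvailable and MemTotal fields of /proc/meminfo, and the function names are hypothetical.

```python
def available_percent(meminfo_text):
    """Available memory as a percentage of total, from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])  # value in kB (or a plain count)
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]

def should_prune(available_pct, threshold_pct=10):
    """Decide whether snapshots would be pruned under the described rule."""
    if not 0 <= threshold_pct <= 75:
        raise ValueError("threshold must be between 0 and 75")
    if threshold_pct == 0:
        return False  # a threshold of 0 disables the feature
    return available_pct < threshold_pct

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        pct = available_percent(f.read())
    print(f"available: {pct:.1f}%, prune: {should_prune(pct)}")
```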

Because of this pruning, the longer the program has been running, the greater the distance between snapshots. The first reverse-execution command after a long run is therefore likely to be slow. Subsequent reverse-execution commands should be faster.