Log Jump

The Log Jump feature bridges the gap between log files and Undo recordings by matching a log entry with a time in the execution history of a program. This matching can be done in two different ways:

For example, suppose that a program was recorded by live-record and its log contains a failure:

[14:38:45.111058] pass x=5.6919e-07 sinf=0.000001 sine=0.000001
[14:38:45.114785] pass x=2.0994 sinf=0.863510 sine=0.863510
[14:38:45.114849] pass x=0.456745 sinf=0.441029 sine=0.441029
[14:38:45.114874] pass x=0.767667 sinf=0.694458 sine=0.694458
[14:38:45.114877] pass x=1.60536e-06 sinf=0.000002 sine=0.000002
[14:38:45.114881] pass x=4.20805e-08 sinf=0.000000 sine=0.000000
[14:38:45.114887] pass x=3.8617e-10 sinf=0.000000 sine=0.000000
[14:38:45.114978] fail x=2.08741e+09 sinf=-0.247028 sine=-0.247028
[14:38:45.115192] pass x=834758 sinf=-0.898864 sine=-0.898864
[14:38:45.115244] pass x=422311 sinf=-0.865702 sine=-0.865702

Using wall-clock times

The ugo wallclock command in UDB jumps to the time in execution history corresponding to a wall-clock time.

The info wallclock command displays the wall-clock time corresponding to the current time in execution history.

The info wallclock-extent command displays the range of wall-clock times covered by execution history.

ugo wallclock wall-clock-time

Jump to a time in execution history near wall-clock-time.

This command is affected by the Log Jump limitations.

To investigate the failure, load the recording (see Loading and saving recordings) and jump to the wall-clock time for the log entry:

$ udb
not running> uload recording.undo
0x00005630f32b8130 in _start ()

The debugged program is at the beginning of recorded history. Start debugging
from here or, to proceed towards the end, use:
    continue - to replay from the beginning
    ugo end  - to jump straight to the end of history
start 1> ugo wallclock 14:38:45.114978
Going to time 5,658.
0x00007f5b27b657d0 in gettimeofday ()
75% 5,658> backtrace
#0  0x00007f5b27b657d0 in gettimeofday ()
#1  0x00005630f32b8286 in log_message (format=0x5630f32b9075 "%s x=%g sinf=%f sine=%f")
    at sine.c:21
#2  0x00005630f32b8694 in check (x=2.08740941e+09) at sine.c:80
#3  0x00005630f32b884e in main (argc=3, argv=0x7fffd873f258) at sine.c:112

The ugo wallclock command jumps to the time in execution history where the gettimeofday() function was called. The backtrace shows that this time is, as expected, inside the call to the logging function: to go back to the code which called log_message(), use the reverse-finish command:

75% 5,658> reverse-finish
0x00005630f32b8281 in log_message (format=0x5630f32b9075 "%s x=%g sinf=%f sine=%f") at sine.c:21
21          ssize_t e = gettimeofday(&tv, NULL);
75% 5,656> reverse-finish
0x00005630f32b868f in check (x=2.08740941e+09) at sine.c:80
80          log_message("%s x=%g sinf=%f sine=%f", s1 == s2 ? "pass" : "fail", x, s1, s2);
75% 5,655> print x
$1 = 2.08740941e+09

The ugo wallclock command supports many commonly used date and time formats with and without timezones, including the standard ISO 8601 or RFC 3339 format (2021-03-14T21:08:58Z).

If the timezone is not specified, the default timezone is used. See the set timezone command.

When wall-clock-time is ambiguous, UDB attempts to find a matching wall-clock time in recorded history. You’ll see that in the examples above, the log entries only contain the time of day, leaving the date unspecified. But because the extent of recorded history was only a few milliseconds, UDB was able to deduce the date and so go to the correct time in recorded history.

When multiple interpretations of wall-clock-time can be found in recorded history, UDB goes to the one that’s closest to the current interpolated wall-clock time, as returned by the info wallclock [bbcount*|*bookmark] command.

info wallclock [bbcount*|*bookmark]

The interpolated wall-clock time corresponding to a time in execution history. If no argument is given, the current bbcount is used.

For example:

75% 5,655> info wallclock
2024-11-13T14:38:45.114977Z +/- 90 microseconds
75% 5,655> ugo wallclock 14:38:45.114978
Going to time 5,658.
0x00007f5b27b657d0 in gettimeofday ()
75% 5,658> info wallclock
2024-11-13T14:38:45.114978Z +/- 0 microseconds

This command is affected by the Log Jump limitations, which are expressed in the form of a confidence interval after the wall-clock time. In the above example, the program is at a call to gettimeofday() and so the wall-clock time is known exactly and the confidence interval is +/- 0 microseconds.

But after moving to a nearby point in execution history, for example by using the finish command, the wall-clock time is only known approximately, and so is reported with a wider confidence interval:

75% 5,658> finish
Run till exit from #0  0x00007f5b27b657d0 in gettimeofday ()
0x00005630f32b8286 in log_message (format=0x5630f32b9075 "%s x=%g sinf=%f sine=%f") at sine.c:21
21          ssize_t e = gettimeofday(&tv, NULL);
75% 5,660> info wallclock
2024-11-13T14:38:45.114979Z +/- 213 microseconds

The interpolated time is printed in the default timezone. See the set timezone command.

info wallclock-extent

The wall-clock times at the start and end of execution history.

The times are printed in the default timezone. See the set timezone command. For example:

75% 5,660> info wallclock-extent
Start time: 2024-11-13T14:38:45.107434Z
End time:   2024-11-13T14:38:45.132060Z

Timezones

Log Jump understands wall-clock times with explicit timezones, for example, a wall-clock time ending with +01:00 refers to Central European Time (one hour ahead of UTC), while one ending with -06:00 refers to Central Standard Time (six hours behind UTC).

If no timezone information is specified, the time is interpreted using the default timezone, which is UTC (Coordinated Universal Time) by default.

You can change the default timezone using the set timezone command.

set timezone timezone

Set the default timezone for wall-clock times.

The default timezone is used to interpret the argument to the ugo wallclock command if it lacks a timezone, and to format the output of the info wallclock and info wallclock-extent commands.

The default timezone is UTC by default. If your logs contain wall-clock times in local time but without an explicit timezone, use this command to set an more convenient default timezone.

Specify timezone using the format “Area/Location” employed by the IANA timezone database. See Wikipedia’s List of tz database time zones for a full list of timezone names.

For example:

75% 5,660> info wallclock
2024-11-13T14:38:45.114979Z +/- 213 microseconds
75% 5,660> set timezone Asia/Tokyo
Default timezone set to Asia/Tokyo.
75% 5,660> info wallclock
2024-11-13T23:38:45.114979+09:00 +/- 213 microseconds

After setting the default timezone, all times without an explicit timezone are interpreted as if belonging to the default timezone. For example:

75% 5,660> ugo wallclock 2024-11-13T23:38:45.111058
Going to time 648.
0x00007f5b27b657d0 in gettimeofday ()
8% 648> info wallclock
2024-11-13T23:38:45.111058+09:00 +/- 0 microseconds

The default timezone can be displayed with the show timezone command, and the default timezone can be reset to UTC with the unset timezone command.

show timezone

The default timezone for wall-clock times.

For example:

8% 648> show timezone
Default wall-clock timezone is Asia/Tokyo.

unset timezone

Reset the default timezone for wall-clock times to UTC.

Log Jump limitations

  • Log Jump correlates wall-clock time to execution history based on the presence of calls to gettimeofday() or clock_gettime() made by the program being recorded. If the program makes no calls to these functions, the Log Jump feature is not available.

    Note

    clock_gettime() is supported only when used with clk_id set to CLOCK_REALTIME. This is because a clk_id of CLOCK_REALTIME_COARSE generates low-precision wall-clock times that cannot be combined with high-precision wall-clock times, and other clk_id values such as CLOCK_MONOTONIC produce times which are not related to the wall-clock time.

    This means that an exact match between wall-clock time and execution history can be obtained only at the exact point when one of the supported functions is called. For all other points, the wall-clock time is estimated by linearly interpolating the results from nearby calls to the supported functions. This means that if calls to the supported functions are few and far between, the confidence interval can be large.

    Warning

    The interpolation algorithm assumes monotonicity of wall-clock times, meaning that if the system time is adjusted during the execution of the program, this can produce unreliable results.

Using UDB time notation

Log Jump works with any recorded program which uses the supported wall-clock time functions. However, if it is possible to modify the program, a more powerful option is to include times in execution history in UDB’s time notation in log entries.

Adding times in execution history to log entries

This functionality is provided by the undoex_get_bbcount() function declared in undoex/undoex-engine-status.h in the libundoex library. The time in execution history is returned as an unsigned 64-bit integer representing the bbcount.

undoex_get_bbcount() can be used as follows:

#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#include "undoex-engine-status.h"

static void
print_current_bbcount(void)
{
    uint64_t bbcount = undoex_get_bbcount();
    if (bbcount > 0)
    {
        printf("The current time in execution history is bbcount: %" PRIu64 "\n", bbcount);
    }
    else if (errno == ENOTSUP)
    {
        printf("Not currently recording\n");
    }
    else
    {
        perror("Couldn't access the time in execution history");
    }
}

int
main(void)
{
    print_current_bbcount();
    return EXIT_SUCCESS;
}

The functions in the libundoex library work when the program is recorded under UDB and when recorded through the LiveRecorder API. If called when the program is not being recorded, undoex_get_bbcount() returns 0 (which is not a valid bbcount) and sets errno to ENOTSUP.

You can find the libundoex library and header files in the undoex subdirectory of the UDB release. The library is available as both a statically linked library (libundoex_pic_<arch>.a) and a dynamically linked library (libundoex_<arch>.so). There are no other dependencies, so the compilation command will be similar to this:

$ gcc -ggdb -Iundoex undoex/examples/simple_ex.c undoex/libundoex_pic_x64.a -o simple_ex

Using the bbcount

The bbcount returned by undoex_get_bbcount() can be used in UDB, after loading the corresponding Undo recording file, to jump to the bbcount in execution history when the function was called.

When debugging the code above, the output looks like this:

$ udb ./simple_ex
Reading symbols from ./simple_ex...
not running> run
Starting program: simple_ex
The current time in execution history is bbcount: 14261

Program received signal SIGSTOP, Stopped (signal).

The program has exited, but is still being debugged.
You can use UDB reverse commands to go backwards; see "help udb" for details.

__GI__exit (status=status@entry=0) at ../sysdeps/unix/sysv/linux/_exit.c:30
30      ../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.

Now you can use the ugo time command to jump to the reported bbcount. UDB automatically reverses execution from undoex_get_bbcount() to the caller, so that it ends up in your code and not inside the undoex_get_bbcount() implementation. For example:

end 14,864> ugo time 14261

Moving out from libundoex into user code.
You can disable this behavior by using the "-exact" option.

print_current_bbcount () at undoex/examples/simple_ex.c:11
11          uint64_t bbcount = undoex_get_bbcount();
95% 14,198> backtrace
#0  print_current_bbcount () at undoex/examples/simple_ex.c:11
#1  0x0000555555566786 in main () at undoex/examples/simple_ex.c:29