Log Jump

The Log Jump feature bridges the gap between log files and LiveRecorder recordings by allowing users to match a log entry with a time in the execution history of a program. This matching can be done in two different ways:

Using wall-clock times

The ugo wallclock command in UDB jumps to the time in execution history corresponding to a wall-clock time.

The uinfo wallclock command displays the wall-clock time corresponding to the current time in execution history.

The uinfo wallclock-extent command displays the range of wall-clock times covered by execution history.

ugo wallclock wall-clock-time

Jump to a time in execution history near wall-clock-time.

For example, suppose that an application’s log contains a failure:

[2021-04-15 09:22:17.793190] pass x=5.6919e-07 sinf=0.000001 sine=0.000001
[2021-04-15 09:22:17.797771] pass x=2.0994 sinf=0.863510 sine=0.863510
[2021-04-15 09:22:17.797830] pass x=0.456745 sinf=0.441029 sine=0.441029
[2021-04-15 09:22:17.797864] pass x=0.767667 sinf=0.694458 sine=0.694458
[2021-04-15 09:22:17.797870] pass x=1.60536e-06 sinf=0.000002 sine=0.000002
[2021-04-15 09:22:17.797877] pass x=4.20805e-08 sinf=0.000000 sine=0.000000
[2021-04-15 09:22:17.797882] pass x=3.8617e-10 sinf=0.000000 sine=0.000000
[2021-04-15 09:22:17.798001] fail x=2.08741e+09 sinf=-0.247028 sine=-0.247028
[2021-04-15 09:22:17.798264] pass x=834758 sinf=-0.898864 sine=-0.898864
[2021-04-15 09:22:17.798329] pass x=422311 sinf=-0.865702 sine=-0.865702

To investigate the failure, load the recording and jump to the wall-clock time in the log entry:

$ udb -q
not running> uload recording.undo

The debugged program is at the beginning of recorded history. Start debugging
from here or, to proceed towards the end, use:
    continue - to replay from the beginning
    ugo end  - to jump straight to the end of history

No more reverse-execution history.
0x00005601e0f60130 in _start ()
start 1> ugo wallclock 2021-04-15 09:22:17.798001
Going to time 11,209.
0x00007ffcb2f26850 in gettimeofday ()
84% 11,209> backtrace
#0  0x00007ffcb2f26850 in gettimeofday ()
#1  0x00005601e0f60282 in log_message (format=0x5601e0f61085 "%s x=%g sinf=%f sine=%f") at sine.c:21
#2  0x00005601e0f6063a in check (x=2.08740941e+09) at sine.c:80
#3  0x00005601e0f607c5 in main (argc=3, argv=0x7ffcb2f00938) at sine.c:112

The ugo wallclock command jumps to the time in execution history where the gettimeofday() function was called. The backtrace shows that this time is, as expected, inside the call to the logging function: to go back to the code which called log_message(), use the reverse-finish command:

84% 11,209> reverse-finish
21          ssize_t e = gettimeofday(&tv, NULL);
84% 11,207> reverse-finish
80          log_message("%s x=%g sinf=%f sine=%f", s1 == s2 ? "pass" : "fail", x, s1, s2);
84% 11,206> print x
$1 = 2.08740941e+09

The ugo wallclock command supports many commonly used date and time formats with and without timezones, including the standard ISO 8601 or RFC 3339 format (2021-03-14T21:08:58Z). The handling of ambiguous formats is undefined.

Note

The time is interpolated based on the presence of calls to gettimeofday() or clock_gettime() (with clock ID set to CLOCK_REALTIME) made by the program being recorded. The accuracy of this interpolatation depends on the proximity of such calls to the specified wall-clock time. This means that going to a time displayed in a log entry will be exact, but going to an arbitrary wall-clock time will be approximate.

Warning

The interpolation algorithm assumes monotonicity of wall-clock times, meaning that if the system time is adjusted during the execution of the program, this can produce unreliable results.

uinfo wallclock

The interpolated wall-clock time for the current time in execution history.

For example:

84% 11,206> uinfo wallclock
2021-04-15T09:22:17.798000Z +/- 119 microseconds
84% 11,206> ugo wallclock 2021-04-15 09:22:17.798001
Going to time 11,209.
0x00007ffcb2f26850 in gettimeofday ()
84% 11,209> uinfo wallclock
2021-04-15T09:22:17.798001Z +/- 0 microseconds

Note

The wall-clock time reported by this command is an interpolated result using calls to gettimeofday() or clock_gettime() (with clock ID set to CLOCK_REALTIME) made by the program being recorded. The accuracy of the interpolated wall-clock time depends on the proximity of such calls to the current execution point.

Warning

The interpolation algorithm assumes monotonicity of wall-clock times, meaning that if the system time is adjusted during the execution of the program, this can produce unreliable results.

uinfo wallclock-extent

The wall-clock times at the start and end of execution history.

The times are printed in the default timezone. See the uset timezone command. For example:

84% 11,209> uinfo wallclock-extent
Start time: 2021-04-15T09:22:17Z
End time:   2021-04-15T09:22:17Z

Availability of wall-clock times

UDB’s representation of time in execution history is related to the flow of the program, not the actual wall-clock time. To provide a match between the two, UDB inspects the LiveRecorder recording and finds calls to the gettimeofday() function or the clock_gettime() function (with clock ID, the first argument, set to CLOCK_REALTIME).

This means that an exact match between wall-clock time and a time in execution history can be obtained only at the exact point when one of the supported wall-clock time functions is called. For all other times, nearby calls to supported functions are linearly interpolated and the confidence interval is shown (+/- N microseconds).

For instance, in the following example, a wall-clock time corresponding to a gettimeofday() function call is shown with +/- 0 microseconds as the confidence interval. Moving to some other code, for instance by setting a breakpoint and continuing, will show an approximate wall-clock time with confidence interval:

84% 11,209> ugo wallclock 2021-04-15 09:22:17.798001
Going to time 11,209.
0x00007ffcb2f26850 in gettimeofday ()
84% 11,209> uinfo wallclock
2021-04-15T09:22:17.798001Z +/- 0 microseconds
84% 11,209> finish
Run till exit from #0  0x00007ffcb2f26850 in gettimeofday ()
log_message (format=0x5601e0f61085 "%s x=%g sinf=%f sine=%f") at sine.c:21
21          ssize_t e = gettimeofday(&tv, NULL);
84% 11,211> uinfo wallclock
2021-04-15T09:22:17.798002Z +/- 263 microseconds

Timezones

The ugo wallclock command understands timezones passed as part of the wall-clock time, for instance a time string ending with +01:00 will be interpreted as referring to Central European Time (one hour ahead of UTC), while one ending with -06:00 will be interpreted as referring to Central Standard Time (six hours behind UTC).

If no timezone information is specified, the time is interpreted using the default timezone, which is UTC (Coordinated Universal Time) by defalut. You can change the default timezone using the uset timezone command.

uset timezone timezone

Set the default timezone for wall-clock times.

The default timezone is used to interpret the argument to the ugo wallclock command if it lacks a timezone, and to format the output of the uinfo wallclock and uinfo wallclock-extent commands.

The default timezone is UTC by default. If your logs contain timestamps in local time but without an explicit timezone, use this command to set an more convenient default timezone.

Specify timezone using the format “Area/Location” employed by the IANA timezone database. See Wikipedia’s List of tz database time zones for a full list of timezone names.

For example:

84% 11,211> uinfo wallclock
2021-04-15T09:22:17.798002Z +/- 263 microseconds
84% 11,211> uset timezone Asia/Tokyo
Default timezone set to Asia/Tokyo.
84% 11,211> uinfo wallclock
2021-04-15T18:22:17.798002+09:00 +/- 263 microseconds

After setting the default timezone, all times without an explicit timezone are interpreted as if belonging to the default timezone. For example:

84% 11,211> ugo wallclock 2021-04-15T18:22:17.793190
Going to time 1,400.
0x00007ffcb2f26850 in gettimeofday ()
10% 1,400> uinfo wallclock
2021-04-15T18:22:17.793190+09:00 +/- 0 microseconds

The default timezone can be displayed with the ushow timezone command, and the default timezone can be reset to UTC with the uclear timezone command.

ushow timezone

The default timezone for wall-clock times.

For example:

10% 1,400> ushow timezone
Default wall-clock timezone is Asia/Tokyo.

uclear timezone

Reset the default timezone for wall-clock times to UTC.

Limitations

  • Log Jump using wall-clock times depends on the presence of calls to the supported wall-clock time functions. If the program makes no calls to these functions, the feature is not available.
  • If calls to the support time functions are few and far between, the confidence interval can be large.
  • In long-running programs, finding the wall-clock time in recordings can be slow.
  • clock_gettime() is supported only when used with CLOCK_REALTIME as clock ID. This is because CLOCK_REALTIME_COARSE generates low-precision wall-clock times that cannot be combined with high-precision wall-clock times, and other clock types such as CLOCK_MONOTONIC produce times which are not related to the wall-clock time.

Using UDB time notation

The ugo wallclock and uinfo wallclock commands work with any recorded program which uses the supported wall-clock time functions. However, if it is possible to modify the program, a more powerful option is to include timestamps in UDB’s time notation in log entries.

Adding timestamps in UDB’s time notation to log entries

This functionality is provided by the undoex_get_bbcount() function declared in undoex/undoex-engine-status.h in the libundoex library. The time in execution history is returned as an unsigned 64-bit integer representing the bbcount.

undoex_get_bbcount() can be used as follows:

#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#include "undoex-engine-status.h"

static void
print_current_bbcount(void)
{
    uint64_t bbcount = undoex_get_bbcount();
    if (bbcount > 0)
    {
        printf("The current time in execution history is: %" PRIu64 "\n", bbcount);
    }
    else if (errno == ENOTSUP)
    {
        printf("Not currently recording\n");
    }
    else
    {
        perror("Couldn't access the time in execution history");
    }
}

int
main(void)
{
    print_current_bbcount();
    return EXIT_SUCCESS;
}

The functions in the libundoex library work when the program is recorded under UDB and when recorded through the LiveRecorder API. If called when the program is not being recorded, undoex_get_bbcount() returns 0 (which is not a valid bbcount) and sets errno to ENOTSUP.

You can find the libundoex library and header files in the undoex subdirectory of the UDB release. The library is available as both a statically linked library (libundoex_pic_<arch>.a) and a dynamically linked library (libundoex_<arch>.so). There are no other dependencies, so the compilation command will be similar to this:

$ gcc -ggdb -Iundoex undoex/examples/simple_ex.c undoex/libundoex_pic_x64.a -o simple_ex

Using the bbcount

The bbcount returned by undoex_get_bbcount() can be used in UDB, after loading the corresponding LiveRecorder recording file, to jump to the bbcount in execution history when the function was called. For example, when debugging the code above, the output looks like this:

$ udb -q ./simple_ex
Reading symbols from ./simple_ex...
not running> run
Starting program: simple_ex
The current time in execution history is: 9219

Program received signal SIGSTOP, Stopped (signal).

The program has exited, but is still being debugged.
You can use UDB reverse commands to go backwards; see "help udb" for details.

0x00007ffff7ac6bd6 in __GI__exit (status=status@entry=0) at ../sysdeps/unix/sysv/linux/_exit.c:31
31      ../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.

Now you can use the ugo time command to jump to the reported bbcount. UDB automatically reverses execution from undoex_get_bbcount() to the caller, so that it ends up in your code and not inside the undoex_get_bbcount() implementation. For example:

end 13,671> ugo time 9219
78      ../sysdeps/unix/syscall-template.S: No such file or directory.

Moving out from libundoex into user code.
You can disable this behaviour by using the '--exact' option.

11          uint64_t bbcount = undoex_get_bbcount();
67% 9,208> backtrace
#0  print_current_bbcount () at undoex/examples/simple_ex.c:11
#1  0x0000555555555c0e in main () at undoex/examples/simple_ex.c:29