Integrating LiveRecorder into your application and workflow
LiveRecorder enables you to capture complete execution histories of your applications, creating recordings that can be replayed later for debugging. This is particularly valuable for:
reproducing intermittent bugs that are difficult to debug in real-time;
post-mortem debugging of crashes or unexpected behavior in production environments;
accelerating development cycles by providing detailed execution traces for complex issues.
This document explains how to integrate LiveRecorder into your C/C++ application and use it for:
recording applications in internal testing as part of a LiveRecorder for Development workflow;
recording applications in the field as part of a LiveRecorder for Production workflow.
Application integration
Whether you intend to use LiveRecorder in development or production workflows, the first step is to integrate it into your application.
You can choose to integrate either:
The LiveRecorder command-line tool (live-record).
The LiveRecorder API.
The command-line tool is easier to use and does not require modifications to source code. The API provides finer-grained control, allowing you to start and stop recording at specified points, and enables more advanced use cases.
For LiveRecorder for Development (LRD) workflows, either option can work well. For LiveRecorder for Production (LRP) workflows, we recommend linking against the API.
If you choose to integrate using the LiveRecorder API, you should also add a mechanism for enabling and disabling LiveRecorder in your application. This could involve:
an environment variable such as YOURAPP_RECORDING_ENABLED;
a command-line option such as --recording-enabled;
a console command such as enable-recording;
a graphical user interface element such as a drop-down menu option or a button.
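The first two mechanisms above could be combined roughly as follows. This is a minimal sketch, not part of LiveRecorder: the helper names are hypothetical, the variable and option names are the illustrative ones from the list, and the rule that an empty or "0" value means "disabled" is an assumption. The call into the LiveRecorder API itself is deliberately left out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat an unset, empty or "0" value as "disabled"
 * and any other value as "enabled" (an assumed convention). */
bool env_flag_is_set(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Returns true if recording was requested, either through the value of the
 * environment variable or through the --recording-enabled option. */
bool recording_requested(int argc, char **argv, const char *env_value)
{
    if (env_flag_is_set(env_value))
        return true;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--recording-enabled") == 0)
            return true;
    }
    return false;
}

/* Convenience wrapper that reads the real process environment. */
bool recording_requested_from_env(int argc, char **argv)
{
    return recording_requested(argc, argv, getenv("YOURAPP_RECORDING_ENABLED"));
}
```

An application could call recording_requested_from_env(argc, argv) early in main and, if it returns true, make its LiveRecorder API call to start recording at that point.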
Outcome: Recordings of your application are generated by LiveRecorder and you can step through them using UDB to understand what your software really did.
LiveRecorder for Development workflow integration
Test harness integration
Now that you can turn LiveRecorder on and off, you need to decide when to do so. The best policy will depend on what you’re aiming to achieve.
For example, you could be aiming to:
Fix flaky tests. You’ve got a number of tests which, despite your best efforts, fail intermittently and you’re struggling to reproduce and fix them.
Accelerate CI. When tests fail in your Continuous Integration system, you want as many of them as possible to have recordings associated with them so they’re as easy as possible to debug.
LiveRecorder can only capture a failure that actually occurs while recording is enabled. Recording does not, in general, make a failure more or less likely: some failures occur more often when being recorded, some less often, and others just as often. (But see our Thread Fuzzing feature, which can be used to increase the reproduction rate.)
So, you need to decide:
whether recording is enabled in all tests, suites, configurations or platforms, or only a subset of those;
whether recording must be enabled the first time a test fails or whether it can be rerun to provoke a failure;
how many times to rerun a failing test in an attempt to generate a recording.
Based on these decisions, you may end up with a combination of the following integration scenarios:
Per-machine recording. If you run your tests on a variety of machine configurations, consider enabling LiveRecorder on some of them.
Per-test recording. If you have a small number of tests that fail intermittently and you want to capture recordings of them, consider applying a tag or marker to those tests and configuring your test system to enable LiveRecorder in tests with that tag.
Recorded rerun-on-failure. After a test failure, the test is rerun with LiveRecorder enabled. A single rerun minimizes the overall performance impact of LiveRecorder and is sufficient for failures that reproduce fairly easily, while rerunning until a failure is seen is more likely to generate a recording. This approach requires modifications to your test harness, and some care is needed to ensure that test runs with a large number of failing tests do not generate an excessive number of recordings.
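The recorded rerun-on-failure scenario, with a cap on the number of recordings per test run, can be sketched as below. Everything here is hypothetical harness code rather than a LiveRecorder interface: the run_test_fn callback stands in for however your harness runs one test with or without recording, and the policy fields, outcome names and fails_n_times stand-in test are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical callback: runs one test, with LiveRecorder enabled when
 * `record` is true, and returns true if the test passed. */
typedef bool (*run_test_fn)(const char *test_name, bool record, void *ctx);

struct rerun_policy {
    int max_recorded_reruns; /* recorded reruns allowed per failing test */
    int max_recordings;      /* cap on kept recordings for the whole run */
};

struct rerun_state {
    int recordings_made;     /* recordings of failures kept so far */
};

typedef enum {
    TEST_PASSED,
    TEST_FAILED_RECORDED,    /* failure reproduced while recording */
    TEST_FAILED_UNRECORDED   /* failed, but no recording was captured */
} test_outcome;

test_outcome run_with_recorded_rerun(const char *name, run_test_fn run, void *ctx,
                                     const struct rerun_policy *policy,
                                     struct rerun_state *state)
{
    /* First attempt without recording, to keep the common case fast. */
    if (run(name, false, ctx))
        return TEST_PASSED;

    for (int i = 0; i < policy->max_recorded_reruns; i++) {
        /* Stop once the recording budget for this test run is used up. */
        if (state->recordings_made >= policy->max_recordings)
            break;
        if (!run(name, true, ctx)) {
            state->recordings_made++;
            return TEST_FAILED_RECORDED;
        }
        /* The rerun passed: its recording is not kept, so try again. */
    }
    return TEST_FAILED_UNRECORDED;
}

/* Stand-in test for demonstration: fails for the first *ctx runs, then passes. */
bool fails_n_times(const char *name, bool record, void *ctx)
{
    (void)name;
    (void)record;
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

Setting max_recorded_reruns to 1 gives the single-rerun variant, while a larger value approximates rerunning until a failure is seen. The max_recordings cap is one way to keep a run with many failing tests from producing an excessive number of recordings.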
Outcome: Your test system generates recordings of interesting application failures that occurred during testing.
Internal roll-out
Now you need to make sure that your test system changes are safe to roll out and tell your users that the recordings exist.
You should consider storing recordings as artifacts in your CI system, or uploading them to a service such as JFrog Artifactory or Sonatype Nexus Repository. You could then include links to the recordings, for example:
in HTML pages or emails describing your test results;
in automatically-generated entries for test failures in your bug tracker.
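As a small illustration of the reporting step, a harness could build one line per failing test that points at the stored recording. This is a sketch under assumptions: the function name, the URL layout (base URL, then run identifier, then test name) and the .undo file extension are all hypothetical and should be adapted to wherever your recordings are actually stored.

```c
#include <stdio.h>

/* Writes a report line of the form
 *   <test>: FAILED (recording: <base_url>/<run_id>/<test>.undo)
 * into buf. Returns 0 on success, or -1 if the line did not fit.
 * The URL layout is an assumed convention, not a LiveRecorder one. */
int format_failure_line(char *buf, size_t size, const char *base_url,
                        const char *run_id, const char *test_name)
{
    int n = snprintf(buf, size, "%s: FAILED (recording: %s/%s/%s.undo)",
                     test_name, base_url, run_id, test_name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The same line can be written to an HTML results page, an email body or an automatically generated bug tracker entry.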
Outcome: Your developers are presented with recordings of application failures that occurred during testing, helping them to diagnose and fix them.
LiveRecorder for Production workflow integration
Rolling out a LiveRecorder integration for production use has a number of additional considerations as described below.
Training your staff
Teams must be made aware of your integrations and must be trained to:
create recordings using your application’s integrations;
replay and analyse recordings to root cause issues.
The training should be adapted to the technical expertise of your teams:
an engineering team will need to know both how to create recordings and how to replay them;
a support team will need to know how to create recordings so that it can walk your customer through the process of creating a recording and then hand the recording over to your engineering team;
depending on the technical expertise of your support engineers, they may also benefit from knowing how to replay and analyse recordings.
Recording sensitive data
Undo recordings contain all the data required to recreate the whole memory state of your application, and may therefore contain sensitive customer data. Be transparent about what a recording contains and what sharing it implies, for example by providing your customer with a document outlining the contents of a recording file. The most important components are:
the application’s executable and library files, including debug information;
the process’s whole memory state and CPU register state;
any data received from system calls, for example, file data or network data;
any data exchanged with other processes via shared memory;
information about any signals received by the process.
Customers who are uncomfortable with sending back recording files may be satisfied with one of these alternative workflows:
Using an air-gapped chamber to store and replay the recording files. Access to the chamber can be limited according to your customer’s requirements.
Using Post Failure Logging to extract information from the recording file without the recording file ever leaving your customer’s premises. Post Failure Logging involves running a probe script developed by your engineers to generate a human-readable log file that your customer can vet before sending it back.
Troubleshooting LiveRecorder integrations
See Troubleshooting.