Usage statistics collection
UDB collects some usage statistics to protect Undo from unlicensed or unlawful use of our software and services, plus some additional usage data to help Undo improve our products. This does not include any sensitive data about the programs being debugged.
By default, the additional usage data is anonymized. Sharing full (personally identifying) usage statistics will help us improve our products and our customer support.
For details, see Undo’s privacy policy. To change which information is shared with Undo, use the set share-usage-statistics command.
Implementation details
In the interest of transparency, this page explains how our usage statistics collection works and what is collected.
While running, UDB collects information about the features used and saves it into a JSON file in $XDG_DATA_HOME/undo/udb/telemetry/v2/, where $XDG_DATA_HOME defaults to ~/.local/share/ if unset.
The filename consists of a random UUID followed by the .json extension.
The JSON files are formatted in a human-readable way to make it easier to inspect them.
When UDB quits, the JSON file is submitted to our api.undo.io server.
If delivery fails (for instance, because the user is not connected to the Internet), the file is cached on disk and submitted the next time UDB is started.
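To see which usage statistics files are currently on disk (for example, ones cached after a failed delivery), a short Python sketch can resolve the directory described above. It relies only on the path layout given on this page; the function names are illustrative, and this sketch additionally treats an empty $XDG_DATA_HOME as unset, as the XDG Base Directory specification recommends.

```python
from __future__ import annotations

import os
from pathlib import Path


def telemetry_dir() -> Path:
    """Directory where this page says UDB keeps its usage statistics JSON files."""
    # $XDG_DATA_HOME defaults to ~/.local/share/ when unset; this sketch also
    # treats an empty value as unset, as the XDG Base Directory spec suggests.
    base = os.environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return Path(base) / "undo" / "udb" / "telemetry" / "v2"


def telemetry_files() -> list[Path]:
    """JSON files currently on disk, such as ones cached after a failed delivery."""
    directory = telemetry_dir()
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.json"))


if __name__ == "__main__":
    for path in telemetry_files():
        # The filename stem should be the random UUID used as the session_id.
        print(path.stem, path)
```

Because the files are human-readable JSON, any of the paths printed above can then be opened in a text editor or loaded with json.load for inspection.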
Once the server receives the usage statistics JSON file, personally identifiable data and additional usage data are split into separate collections (that is, tables).
With the default setting of share-usage-statistics, the two sets of information cannot be linked and are completely independent, which anonymizes the additional usage data.
With explicit user permission (that is, when the user agrees to share full usage statistics in the prompt shown by UDB, or uses set share-usage-statistics on), the telemetry_id, license and username fields (described below) are used to create a relationship between the two sets of data.
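The effect of each share-usage-statistics value described on this page can be summarized as a small lookup. This Python sketch is illustrative only: the enum and function names are hypothetical, and it covers just the three values this page mentions (on, anonymized and licensing-only); the documentation for set share-usage-statistics remains the authoritative list.

```python
from __future__ import annotations

from enum import Enum


class ShareUsageStatistics(Enum):
    """Values of share-usage-statistics mentioned on this page (hypothetical enum)."""

    ON = "on"
    ANONYMIZED = "anonymized"  # the default
    LICENSING_ONLY = "licensing-only"


def describe_sharing(setting: ShareUsageStatistics) -> dict[str, bool]:
    """Summarize what the server receives, and whether it is linked, per this page."""
    return {
        # Licensing data is always included.
        "licensing_sent": True,
        # With licensing-only, to_be_anonymized is null.
        "additional_usage_sent": setting is not ShareUsageStatistics.LICENSING_ONLY,
        # Only with explicit permission are telemetry_id and license/username linked.
        "linked": setting is ShareUsageStatistics.ON,
    }


print(describe_sharing(ShareUsageStatistics("anonymized")))
```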
JSON format description
Each bullet point corresponds to a key in the JSON dictionary submitted to our API server. Indented bullet points represent nested values.
comment (string)
A string referring to this page for users who accidentally discover the usage statistics JSON files and open them.
session_id (string)
Unique random identifier (UUID) for this UDB session which is used to prevent multiple accidental submissions of the same file.
This identifier is also used as part of the file name where the JSON is saved on disk before submission.
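The duplicate-submission check that session_id enables could look roughly like the following. This is a hypothetical sketch of server-side logic, not Undo's actual implementation: it keeps the first payload seen for each session_id and drops any repeats.

```python
from __future__ import annotations

import json
from collections.abc import Iterable


def deduplicate_submissions(payloads: Iterable[str]) -> list[dict]:
    """Keep only the first submission seen for each session_id (hypothetical)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for raw in payloads:
        record = json.loads(raw)
        session_id = record["session_id"]
        if session_id in seen:
            # The same file was submitted again: ignore it.
            continue
        seen.add(session_id)
        unique.append(record)
    return unique
```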
licensing (object)
License enforcement data.
This is always included and contains personally-identifiable information which is used to protect Undo from unlicensed use of our software.
This object contains the following keys:
license (string)
The UID of the license.
Example:
"e22bbc6c7ad0e26adb07496f60de4a603e4781d44d6c3ca0d63b96ac"
username (string or null)
For licenses configured to use a keyserver, the username. Otherwise, null.
Example:
"john_smith"
keyserver_id (string or null)
For licenses configured to use a keyserver, the identifier used for communications with the keyserver. Otherwise, null.
udb_version (string)
The version of UDB.
This information is also included in the additional usage data section (the to_be_anonymized key).
Example:
"7.0.0"
is_redistributable_udb (boolean)
Whether these usage statistics were generated by Redistributable UDB, a UDB variant with limited features that can be shipped together with customers’ applications.
This information is also included in the additional usage data section (the to_be_anonymized key).
start_time (date and time as string)
The UTC wall-clock time at which the UDB session started.
Example:
"2023-08-10T16:05:35.123456"
end_time (date and time as string)
The UTC wall-clock time at which the UDB session ended.
Example:
"2023-08-10T18:12:01.654321"
license_accepted (boolean or null)
Whether the user has accepted the license, or null if the user could not be asked (for instance, because they are using an IDE that doesn’t support this feature).
used_licensable_features (object)
Which licensable features were used by this UDB session.
This information is also included in the additional usage data section (the to_be_anonymized key).
This object contains the following keys:
started_process (boolean)
Whether a live process was started by UDB.
attached_to_process (boolean)
Whether UDB attached to a running process.
loaded_core (boolean)
Whether UDB loaded a core file.
loaded_recording (boolean)
Whether UDB loaded a LiveRecorder recording.
saved_recording (boolean)
Whether UDB saved a process’s execution history to a LiveRecorder recording.
remote_debugging (boolean)
Whether UDB used a remote server for debugging.
used_archs (object)
The CPU architectures for debugged programs or loaded LiveRecorder recordings during this UDB session.
This information is also included in the additional usage data section (the to_be_anonymized key).
This object contains the following keys:
x64 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux AMD/Intel x86-64 processes.
x32 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux Intel i386 processes.
Note that this is unrelated to the rarely used x32 ABI, which the Undo Engine doesn’t support.
arm64 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux ARMv8 AArch64 processes.
tool (string)
The tool used for this session.
This is "udb_plain" for direct uses of UDB but, for instance, it’s "postfailurelogging" if UDB is used via the postfailurelog tool.
This is similar to the ui field of the to_be_anonymized object, but it’s coarser, as it’s only used to verify that users use tools in accordance with the terms of their licenses.
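As an illustration of the licensing fields, the following Python sketch computes a session’s length from start_time and end_time and lists the licensable features that were used. It assumes the timestamps follow the format shown in the examples (ISO 8601 without a timezone offset, interpreted as UTC); the helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def session_duration(licensing: dict) -> timedelta:
    """Return end_time - start_time for one UDB session."""
    # Timestamps such as "2023-08-10T16:05:35.123456" carry no offset; this
    # page says they are UTC, so naive datetimes can be subtracted directly.
    start = datetime.fromisoformat(licensing["start_time"])
    end = datetime.fromisoformat(licensing["end_time"])
    return end - start


def features_used(licensing: dict) -> list[str]:
    """Names of the licensable features whose flag is true, sorted."""
    flags = licensing.get("used_licensable_features") or {}
    return sorted(name for name, used in flags.items() if used)


example = {
    "start_time": "2023-08-10T16:05:35.123456",
    "end_time": "2023-08-10T18:12:01.654321",
    "used_licensable_features": {
        "started_process": True,
        "loaded_recording": True,
        "remote_debugging": False,
    },
}
print(session_duration(example))  # 2:06:26.530865
print(features_used(example))  # ['loaded_recording', 'started_process']
```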
to_be_anonymized (object)
Additional usage data.
By default, share-usage-statistics is set to anonymized, which means this part of the usage statistics is anonymized once the server receives it.
If the share-usage-statistics setting is set to licensing-only, additional usage data is not collected and this field is set to null.
This object contains the following keys:
telemetry_id (string)
A randomly-generated UUID that is preserved across runs of UDB on the same machine. This is used to identify the same user across UDB sessions without revealing who the user is or their personal data.
Example:
"bc8bbbcf-2e97-4de9-9f83-28e556433735"
udb_version (string)
The version of UDB.
This information is also included in the mandatory licensing section (the licensing key).
Example:
"7.0.0"
is_redistributable_udb (boolean)
Whether this session was generated by Redistributable UDB, a UDB variant with limited features that can be shipped together with customers’ applications.
This information is also included in the mandatory licensing section (the licensing key).
deferred_recording (boolean)
Whether UDB was started in deferred-recording mode.
ui (string)
The interface through which the user interacted with UDB.
For instance, if UDB is used on the terminal with no additional UI, then this is set to "udb_console". If UDB is used via Visual Studio Code, then this is set to "vscode".
tui_used (boolean)
Whether TUI (GDB’s Text User Interface) was used.
used_licensable_features (object)
Which licensable features were used by this UDB session.
This information is also included in the mandatory licensing section (the licensing key).
This object contains the following keys:
started_process (boolean)
Whether a live process was started by UDB.
attached_to_process (boolean)
Whether UDB attached to a running process.
loaded_core (boolean)
Whether UDB loaded a core file.
loaded_recording (boolean)
Whether UDB loaded a LiveRecorder recording.
saved_recording (boolean)
Whether UDB saved a process’s execution history to a LiveRecorder recording.
remote_debugging (boolean)
Whether UDB used a remote server for debugging.
used_archs (object)
The CPU architectures for debugged programs or loaded LiveRecorder recordings during this UDB session.
This information is also included in the mandatory licensing section (the licensing key).
This object contains the following keys:
x64 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux AMD/Intel x86-64 processes.
x32 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux Intel i386 processes.
Note that this is unrelated to the rarely used x32 ABI, which the Undo Engine doesn’t support.
arm64 (boolean)
Whether any of the debugged programs or loaded LiveRecorder recordings are Linux ARMv8 AArch64 processes.
commands (list of objects)
A list of commands which were executed.
Each object in the list contains the following keys:
name (string)
The command name, without any argument.
In case of aliases or abbreviations, the full original command name is used.
Example:
"reverse-next"
result (string)
Whether the command succeeded, failed or was interrupted by the user:
"success": The command terminated successfully.
"error": The command terminated with an error. For instance, the user tried to use the step command while the debugged program was not running.
"interrupted": The command was interrupted by the user with Ctrl-C.
duration (elapsed seconds as floating point number)
How long the command took to complete, in seconds.
bbcount_delta (integer or null)
The difference, in BBs (basic blocks), between the time before and after the execution of this command.
A positive number denotes a movement forward in time, a negative number a movement backward in time, and 0 means that the time was not changed. This value is null if there’s no execution history before or after the command execution (for instance, because the debugged program was not running).
time_since_previous_command (elapsed seconds as floating point number)
The inactivity time before this command.
That is, the time between the invocation of this command and the end of the previous command’s execution. Or, if this is the first executed command after UDB started, the time since startup.
Note that some commands are not tracked in usage data, either for technical reasons or because they are not useful to track. These untracked commands are ignored by this measurement.
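To show how the per-command fields fit together, this sketch aggregates a commands list into counts per result, time spent running commands versus idle time, and the net movement in time. The function name and the shape of its output are hypothetical; it relies only on the keys documented above and skips entries whose bbcount_delta is null.

```python
from __future__ import annotations

from collections import Counter


def summarize_commands(commands: list[dict]) -> dict:
    """Aggregate the "commands" list of the additional usage data (hypothetical)."""
    # Count "success", "error" and "interrupted" outcomes.
    results = Counter(cmd["result"] for cmd in commands)
    # Seconds spent executing commands versus waiting before each one.
    busy = sum(cmd["duration"] for cmd in commands)
    idle = sum(cmd["time_since_previous_command"] for cmd in commands)
    # bbcount_delta is null when there was no execution history; skip those.
    deltas = [c["bbcount_delta"] for c in commands if c["bbcount_delta"] is not None]
    return {
        "results": dict(results),
        "busy_seconds": busy,
        "idle_seconds": idle,
        "net_bbcount_delta": sum(deltas),
        "backward_moves": sum(1 for d in deltas if d < 0),
    }


# Made-up sample data in the documented shape.
sample = [
    {"name": "run", "result": "success", "duration": 1.5,
     "bbcount_delta": None, "time_since_previous_command": 2.0},
    {"name": "next", "result": "success", "duration": 0.25,
     "bbcount_delta": 100, "time_since_previous_command": 3.0},
    {"name": "reverse-next", "result": "interrupted", "duration": 0.25,
     "bbcount_delta": -40, "time_since_previous_command": 1.0},
]
print(summarize_commands(sample))
```

Note that the idle total here excludes time_inactive_before_quit, which is a separate field.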
time_inactive_before_quit (elapsed seconds as floating point number or null)
The inactivity time between the last command and UDB quitting.
Can be null if UDB doesn’t quit properly (for example, in case of a crash).
loaded_recordings (list of objects)
List where each item contains information about a LiveRecorder recording that was loaded by UDB.
Each object in the list contains the following keys:
wallclock_start (date and time as string)
The UTC wall-clock time at the start of recorded history.
Example:
"2023-08-01T15:05:35.123456"
wallclock_end (date and time as string)
The UTC wall-clock time at the end of recorded history.
Example:
"2023-08-01T15:35:01.654321"
recording_undo_version (string)
The version of the Undo Engine used to produce this recording.
recording_size_bytes (integer)
The size of the recording, in bytes.
recording_application_uid (string or null)
The UID of the license used to create the recording.
uuids (dictionary mapping string to string)
Random identifiers for the recording.
Currently, the following identifiers are stored:
save: A random identifier which changes every time a LiveRecorder recording of a process is saved. This identifies a single recording. Can be null for old recordings.
run: A random identifier which persists even if multiple LiveRecorder recordings are saved. Multiple recordings with different save identifiers can share a single run identifier. Can be null for old recordings.
shmem_log: A random identifier for the log of shared memory accesses if Multi-Process Correlation for Shared Memory is enabled; null otherwise.
Example:
{ "save": "22859b9a-526e-46ea-a1e4-c8f6eed8c41a", "run": "179ccc3a-6014-40e1-8171-33f6d4d97ee9", "shmem_log": null }
load_metrics (dictionary mapping string to object)
A mapping from the name of a metric related to the loading of this recording, to a description of that metric.
Each value in the dictionary is an object containing the following keys:
size (integer)
A measure of the size of the data processed, typically a count of bytes.
duration (elapsed seconds as floating point number)
How long it took to complete.
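The uuids and load_metrics fields can be combined as in this sketch, which groups saved recordings that came from the same run and computes a size-per-second rate for each load metric. The helper names are hypothetical, and because metric names are not documented here, the code treats them as opaque strings.

```python
from __future__ import annotations

from collections import defaultdict


def recordings_by_run(loaded_recordings: list[dict]) -> dict[str, list[str]]:
    """Group save identifiers by their shared run identifier."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in loaded_recordings:
        uuids = rec.get("uuids") or {}
        run, save = uuids.get("run"), uuids.get("save")
        # Old recordings may have null identifiers and cannot be grouped.
        if run is not None and save is not None:
            groups[run].append(save)
    return dict(groups)


def load_throughput(recording: dict) -> dict[str, float]:
    """Size processed per second for each load metric, skipping zero durations."""
    return {
        name: metric["size"] / metric["duration"]
        for name, metric in (recording.get("load_metrics") or {}).items()
        if metric["duration"] > 0
    }
```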
distro (dictionary mapping string to anything)
Information about the GNU/Linux distribution used to run UDB.
This information is determined using the Python distro package. In particular, this dictionary is the value returned by the distro.info(best=True) function.
Example:
{ "id": "ubuntu", "like": "debian", "codename": "jammy", "version": "22.04.3", "version_parts": { "major": "22", "minor": "04", "build_number": "3" } }
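The distro value can be reproduced locally by calling the same function from the distro package. This sketch assumes the package is installed (pip install distro) and returns None otherwise; what UDB itself does when the package is unavailable is not described here.

```python
from __future__ import annotations

import json

try:
    import distro  # third-party package: pip install distro
except ImportError:
    distro = None


def distro_field() -> dict | None:
    """The dictionary this page says UDB stores under "distro", if available."""
    if distro is None:
        return None
    # best=True asks for the most precise version information available.
    return distro.info(best=True)


if __name__ == "__main__":
    print(json.dumps(distro_field(), indent=2))
```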
crash_logs (dictionary mapping string to string)
Anonymized crash logs (if any component crashed or failed due to an assertion error).
Keys are the name of the log file (containing the name of the crashed component and the process ID) and values are the crash logs’ text.
The crash logs contain the Undo Engine backtrace and, in case of assertion failure, the format string for the assertion message (but not its values). This means that crash logs never contain information about the user or the debugged program.
Example (simplified and formatted in Python-like syntax for readability):
{
    "undo_crash_log_123_udbserver.log": """
**************************************************************
Fatal error: invalid foo %d in "%s"
Location: src/apps/udbserver/server.cpp:251:ensure_session [123:123]
**************************************************************
frame 0: frame=0x7fff7477e500 pc=0x44e054 debug_backtrace+0xb4 [debug_libunwind.c:42:debug_backtrace]
frame 1: frame=0x7fff7477ee10 pc=0x44df8c debug_dump_telemetry_crash_log+0x18c [debug_crash_log.c:75 (discriminator 1):debug_dump_telemetry_crash_log]
frame 2: frame=0x7fff7477ee80 pc=0x461d31 s_handle_error+0xf1 [error.c:197:s_handle_error]
frame 3: frame=0x7fff7477eef0 pc=0x4620dd error_handle_vpanic+0x1d [error.c:239:error_handle_vpanic]
frame 4: frame=0x7fff7477ef10 pc=0x4621e8 error_handle_panic+0x78 [error.c:260:error_handle_panic]
frame 5: frame=0x7fff7477eff0 pc=0x4111c0 _ZN6Server14ensure_sessionEP8Debuggee.part.45+0x3f0 [server.cpp:251:_ZN6Server14ensure_sessionEP8Debuggee]
frame 6: frame=0x7fff7477f020 pc=0x40b6b0 _ZL5main2R12ServerConfigiPPcPb.constprop.112+0x410 [main.cpp:1094:_ZL5main2R12ServerConfigiPPcPb.constprop.112]
frame 7: frame=0x7fff7477f210 pc=0x4069f2 main+0x192 [main.cpp:1267:main]
""",
}
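Because crash logs are plain text, a simple parser can pull out the component, process ID, assertion format string and source locations. The key pattern below is inferred from the single example on this page and may not match every log name; the function is a hypothetical helper, not part of UDB.

```python
from __future__ import annotations

import re

# Hypothetical pattern inferred from the single example key on this page,
# e.g. "undo_crash_log_123_udbserver.log" -> pid 123, component "udbserver".
KEY_RE = re.compile(r"^undo_crash_log_(?P<pid>\d+)_(?P<component>.+)\.log$")
# Frame lines end with a bracketed "file:line:function" location.
FRAME_RE = re.compile(r"^frame \d+: .*?\[(?P<location>[^\]]+)\]$")


def parse_crash_log(name: str, text: str) -> dict:
    """Extract component, PID, assertion format string and frame locations."""
    key = KEY_RE.match(name)
    info = {
        "component": key["component"] if key else None,
        "pid": int(key["pid"]) if key else None,
        "error": None,
        "locations": [],
    }
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Fatal error:"):
            # Only the format string is present; its values are never logged.
            info["error"] = line[len("Fatal error:"):].strip()
            continue
        frame = FRAME_RE.match(line)
        if frame:
            info["locations"].append(frame["location"])
    return info
```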
extra (dictionary mapping string to anything)
Field for arbitrary data that scripts not officially part of UDB can use to collect usage statistics.
This is useful, for instance, for our addons or for scripts implementing experimental features which are not yet part of UDB.
Example:
{ "feature_foo": { "did_bar": true, "did_baz": false } }
update_sharing (object or null)
Set only when the user changes the usage statistics sharing setting, so that the server can update its data. Otherwise, null.
For example, if the user changes the share-usage-statistics setting from its default of anonymized to on, then this object will be non-null. Once the server receives this JSON object, it creates a relationship between the value of the telemetry_id field (from the additional usage data) and the values of the license and username fields (from the licensing data). This means that it’s now possible to correlate usage statistics from the separate licensing and additional usage data.
If the user then decides to revoke consent via set share-usage-statistics anonymized, the fields of the update_sharing object are set accordingly and, once the server receives this JSON object, the link between telemetry_id and license/username is broken, making the two sets of data independent again.
If not null, this object contains the following keys:
value (string)
The sharing setting chosen by the user.
See the set share-usage-statistics command for a list of possible values.
Example:
"anonymized"
time (date and time as string)
The time when this setting was changed. This is included to avoid race conditions if the user changes the usage statistics sharing setting in separate UDB processes running at the same time.
Example:
"2023-08-10T16:12:00.112233"
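The time field lets the server decide which of several concurrent setting changes is the most recent. The following Python sketch shows one way such ordering could work, assuming the server simply keeps the update_sharing object with the latest time; Undo’s actual server logic is not described on this page, and the function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable, Optional


def latest_sharing_setting(updates: Iterable[Optional[dict]]) -> Optional[str]:
    """Return the most recently chosen sharing value among update_sharing objects."""
    latest: Optional[dict] = None
    for update in updates:
        if update is None:
            # update_sharing is null when the setting did not change.
            continue
        if latest is None or datetime.fromisoformat(update["time"]) > datetime.fromisoformat(latest["time"]):
            latest = update
    return None if latest is None else latest["value"]
```

Ordering by the recorded time rather than by arrival order means a submission that reaches the server late (for example, one cached after a failed delivery) cannot override a newer choice.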