Microservices

Architecting an application as a set of cooperating microservices offers many advantages:

  • Each service can be developed, tested, deployed and, if necessary, rolled back independently.

  • Services communicate via documented APIs, and implementation complexities can be isolated and hidden behind the API.

However, a microservice-based architecture presents particular challenges when diagnosing a defect: the defect manifests in the output of one service, but one or more other services may be implicated in the root cause. It can be difficult to follow the chain of calls between the services in order to track down that root cause.

LiveRecorder allows you to go back to any point in your application’s execution history. You can step backwards and forwards by source line or function call within a single service, and you can also step backwards and forwards between the component services at the point where one service calls another.

Recording and replaying a microservice-based application

Recording

Record the microservices of interest in your application as described in Recording an application. Reproduce the defect that you’re attempting to diagnose while all the services of interest are being recorded simultaneously. If you record the services at separate times, you won’t be able to step between them when replaying.

You may also wish to set up endpoints in your services that allow you to start and stop recording at will. Refer to the API documentation and, for an example of using the API in a Jakarta EE based microservice, to the Hands-on Undo GitHub project.

Note

It’s not necessary to record all the microservices that compose your application, but if you don’t record a service you won’t be able to step into it when you come to replay and debug your application.

Replaying

Choose a replay environment and set up an IntelliJ Run/Debug configuration for each service that you have recorded, as described in Replaying a recording. Then start a debugger session for each service as described under Time travel debugging in IntelliJ.

Note

A current limitation is that exactly two such sessions must be loaded in IntelliJ.

Use breakpoints, watchpoints and the forward and reverse execution buttons to step forwards and backwards within an individual service’s execution history.

Use the Step Across and Step Across Back buttons to step between the services. Step Across jumps to the point in time in another service’s execution history when it received the next API call made by this service. Step Across Back jumps to the point in time in another service’s execution history when it last made an API call to this service.

To end up at exactly the right point in the other service, LiveRecorder uses the Span ID generated by OpenTelemetry or Spring Cloud Sleuth if it’s present in the API request. If no Span ID is present, LiveRecorder falls back to using the system time.
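The difference between the two strategies can be illustrated with a small sketch. This is not LiveRecorder’s implementation: the class CallMatcher, the Call type and the "closest timestamp" rule used for the fallback are all assumptions made for illustration only.

```java
import java.util.List;
import java.util.Optional;

// Illustrative only: these names are hypothetical and not part of LiveRecorder.
final class CallMatcher {

    /** One API call as seen by a service: when it happened and, if traced, its Span ID. */
    static final class Call {
        final long epochMillis;
        final String spanId; // null when the request carried no tracing header

        Call(long epochMillis, String spanId) {
            this.epochMillis = epochMillis;
            this.spanId = spanId;
        }
    }

    /**
     * Finds the call received by the other service that corresponds to an
     * outgoing call. With a Span ID the match is exact. Without one, this
     * sketch assumes the received call closest in system time is the match.
     */
    static Optional<Call> match(Call outgoing, List<Call> received) {
        if (outgoing.spanId != null) {
            for (Call candidate : received) {
                if (outgoing.spanId.equals(candidate.spanId)) {
                    return Optional.of(candidate);
                }
            }
            return Optional.empty();
        }
        // Fallback (assumed rule): pick the received call with the nearest timestamp.
        Call best = null;
        for (Call candidate : received) {
            if (best == null
                    || Math.abs(candidate.epochMillis - outgoing.epochMillis)
                       < Math.abs(best.epochMillis - outgoing.epochMillis)) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A timestamp-based match of this kind can be ambiguous when clocks differ between hosts or when several calls arrive close together, which is one reason a Span ID, when available, is the more reliable key.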

You can also go directly to the method invocation associated with a Span ID by entering the ID into the Log Jump panel. You only need to load a single recording for this mode of operation. The Span ID can be obtained from an OpenTelemetry collector.

The Step Across and Log Jump mechanisms have been tested with both OpenTelemetry (which uses the traceparent HTTP header) and Spring Cloud Sleuth (which uses the X-B3-SpanId HTTP header).
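If you want to check which Span ID a request carried, for example before entering it into the Log Jump panel, the following minimal sketch shows where the ID sits in each header. It assumes the W3C Trace Context layout for traceparent (version, 32-hex trace ID, 16-hex parent/span ID, flags) and a 16-hex-digit B3 span ID. The class and method names are hypothetical, and this is not how LiveRecorder itself reads the headers.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper: SpanIds and fromHeaders are hypothetical names.
final class SpanIds {

    // traceparent: version "-" trace-id "-" parent-id "-" trace-flags
    private static final Pattern TRACEPARENT =
            Pattern.compile("^[0-9a-f]{2}-[0-9a-f]{32}-([0-9a-f]{16})-[0-9a-f]{2}$");

    // X-B3-SpanId: 16 lower-case hex digits
    private static final Pattern B3_SPAN_ID = Pattern.compile("^[0-9a-f]{16}$");

    /** Returns the Span ID carried by the request headers, if any. */
    static Optional<String> fromHeaders(Map<String, String> headers) {
        String traceparent = null;
        String b3SpanId = null;
        for (Map.Entry<String, String> header : headers.entrySet()) {
            if (header.getValue() == null) {
                continue;
            }
            // HTTP header names are case-insensitive.
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (name.equals("traceparent")) {
                traceparent = header.getValue().trim();
            } else if (name.equals("x-b3-spanid")) {
                b3SpanId = header.getValue().trim();
            }
        }
        if (traceparent != null) {
            Matcher m = TRACEPARENT.matcher(traceparent);
            if (m.matches()) {
                return Optional.of(m.group(1)); // third field is the span ID
            }
        }
        if (b3SpanId != null && B3_SPAN_ID.matcher(b3SpanId).matches()) {
            return Optional.of(b3SpanId);
        }
        return Optional.empty();
    }
}
```

For a request with the header traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, the helper returns 00f067aa0ba902b7. Preferring traceparent over X-B3-SpanId when both are present is an arbitrary choice made for this sketch.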