Tracing Distributed Microservices


=>Implementing APM (Application Performance Monitoring) for the JVM

I INVENTORY: "Where are we coming from?"

=>"Distributed monolith"

With Java EE, we had modules talking to each other.
A request coming from the web would be dispatched to node A, then node A would talk to node B and to node C.
If something went wrong with the application, server dashboards showed the application's health and let us observe the cluster, thanks to standard APIs that are observable.
Sure, it is not what we are doing today, but the idea was actually pretty good: you could see what went wrong in the software.
But we moved away from these dashboards, because they were perceived as hard to work with.

=>"Inferred tracing"

With these dashboards, if something was slow, we could see exactly which module or enterprise bean was responsible for it.
But we were restricted to the related standard APIs, which limits development to the application server's capabilities.
That sacrifices freedom in development, while it eases Ops.
(These standard APIs are what allow the server to interpret program semantics.)
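Here is a minimal sketch of why coding against a standard interface lets a container infer timings. The container wraps the implementation in a `java.lang.reflect.Proxy` and times every call, without the business code containing any tracing. The names `OrderService`, `SimpleOrderService` and `InferredTracing` are hypothetical. Real application servers use their own interception machinery, not this exact code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// A hypothetical business interface, standing in for an EJB-style contract.
interface OrderService {
    String placeOrder(String item);
}

// Plain business code: no tracing calls at all.
class SimpleOrderService implements OrderService {
    public String placeOrder(String item) {
        return "ordered " + item;
    }
}

class InferredTracing {
    // Timings recorded by the "container", one entry per intercepted call.
    static final List<String> RECORDED = new ArrayList<>();

    // Wraps any implementation of a known interface so that every call is timed.
    // Because the contract is standard, the container knows what a "call" means.
    @SuppressWarnings("unchecked")
    static <T> T monitor(Class<T> contract, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                // Note: exceptions thrown by the target arrive wrapped in InvocationTargetException.
                return method.invoke(target, args);
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                RECORDED.add(contract.getSimpleName() + "." + method.getName() + " took " + micros + "us");
            }
        };
        return (T) Proxy.newProxyInstance(contract.getClassLoader(), new Class<?>[] {contract}, handler);
    }

    public static void main(String[] args) {
        OrderService service = monitor(OrderService.class, new SimpleOrderService());
        System.out.println(service.placeOrder("book")); // prints "ordered book"
        System.out.println(RECORDED);
    }
}
```

This works only as long as everyone codes against the contracts the container understands, which is exactly the loss of freedom described above.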

=>Where we transition to: greenfield "micro-"services.

With greenfield microservices, this kind of monitoring is no longer possible because the solutions are heterogeneous.
For example, we might use Vert.x, Dropwizard and Spring Boot side by side, and some of that code might not even be Java anymore.
A request coming from the web calls a service, that service calls another service, and so forth.
If something is slow, you are in more trouble, because you don't necessarily know which service has the problem.
This is where distributed tracing comes in.
The idea of distributed tracing is, more or less, to regain a property of software that we always had in centralized applications like the Java EE stack.

=>Where we transition to: "The wheel of doom"

Observing such greenfield microservices has been called the wheel of doom.

Simple bits, complex interactions: death star topologies.
Observing and tracing service calls is very complex.

=>Where we transition to: simple services, complex operations

Writing distributed services can ease development but adds integration challenges.
Distributed services without standardized APIs cannot easily be observed as they interact.
Distributed services make it harder to collect structured monitoring data.
Distributed (micro-)services require DevOps to run successfully in production.

Orchestrating many applications makes operations challenging, and that's where APM tools try to help you.

II TRACING (MICRO-)SERVICES: "How can we use these stacks and have observability on the bottlenecks?"

=>What information do we want?

Imagine we have an app in which a web request travels through a chain of servers, each one calling the next.

The trace collector assigns ids that are transmitted from layer to layer down to the last one, and outputs a representation of the whole chain.
In that representation, we can see that the issue is located at the third server.
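The id propagation can be simulated in a few lines. This is a minimal sketch under simplifying assumptions. The first server creates a trace id and every downstream server receives it in its outgoing "headers". Each hop reports its own duration to a collector, which then points at the slowest hop. The header name `X-Trace-Id`, the class `TraceChain` and the simulated durations are all hypothetical. Zipkin defines its own propagation headers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class TraceChain {
    // What the collector receives: one record per hop, all sharing the same trace id.
    record Hop(String traceId, String server, long selfMillis) {}

    static final List<Hop> COLLECTOR = new ArrayList<>();

    // Simulates one server handling a request and then calling the next server.
    static void handle(Map<String, String> headers, String[] servers, long[] selfTimeMillis, int index) {
        if (index >= servers.length) return;
        // The first server creates the id; every later server reuses the one it received.
        String traceId = headers.computeIfAbsent("X-Trace-Id", k -> UUID.randomUUID().toString());
        COLLECTOR.add(new Hop(traceId, servers[index], selfTimeMillis[index]));
        // The id is transmitted downstream with the outgoing call.
        handle(new HashMap<>(headers), servers, selfTimeMillis, index + 1);
    }

    // The collector's view: within one trace, which hop took the longest?
    static Hop slowest(String traceId) {
        Hop worst = null;
        for (Hop hop : COLLECTOR) {
            if (hop.traceId().equals(traceId) && (worst == null || hop.selfMillis() > worst.selfMillis())) {
                worst = hop;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        String[] servers = {"server-1", "server-2", "server-3", "server-4"};
        long[] selfTimeMillis = {5, 8, 250, 4}; // simulated: the third server is slow
        handle(new HashMap<>(), servers, selfTimeMillis, 0);
        String traceId = COLLECTOR.get(0).traceId();
        System.out.println("slowest hop: " + slowest(traceId).server()); // prints "slowest hop: server-3"
    }
}
```

The sketch only works because every server agrees to read and forward the same header. That shared convention is the standardization whose cost is discussed next.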

The downside of such a mechanism is that all layers must share the same standard, which goes against the original free-style purpose of (micro-)services.
In practice, the ids are propagated through code instrumentation and Java agents.

=>How do we get it?

Zipkin is one solution for getting the traces.
A trace is composed of spans.
A span is a single segment of the microservice chain, that is, the work done by a single service.
All the spans of a chain share an id, and together they are called the trace.
A trace is therefore the collection of spans that belong together, forming a single transaction in the microservices system.
The job of the tracer is to surface the relevant traces (those that deviate from normal behavior).
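The span/trace vocabulary maps naturally onto a small data model. The sketch below is loosely modeled on the ids a Zipkin span carries (trace id, span id, parent id). It groups spans into traces by their shared trace id and keeps only the traces whose root span exceeds a latency budget. The class `SpanModel` and the fixed-budget rule are assumptions for illustration. Real tracers use their own sampling and anomaly criteria.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpanModel {
    // One segment of work in one service; parentId is null for the root span of a trace.
    record Span(String traceId, String spanId, String parentId, String service, long durationMillis) {}

    // Every span sharing a traceId belongs to the same transaction, i.e. the same trace.
    static Map<String, List<Span>> groupByTrace(List<Span> spans) {
        Map<String, List<Span>> traces = new LinkedHashMap<>();
        for (Span span : spans) {
            traces.computeIfAbsent(span.traceId(), id -> new ArrayList<>()).add(span);
        }
        return traces;
    }

    // "Relevant" traces in this sketch: those whose root span exceeds the budget.
    static List<String> deviatingTraces(List<Span> spans, long budgetMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Span>> trace : groupByTrace(spans).entrySet()) {
            for (Span span : trace.getValue()) {
                if (span.parentId() == null && span.durationMillis() > budgetMillis) {
                    result.add(trace.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span("t1", "a", null, "gateway", 40),
                new Span("t1", "b", "a", "orders", 30),
                new Span("t2", "c", null, "gateway", 900),
                new Span("t2", "d", "c", "orders", 880));
        System.out.println(groupByTrace(spans).keySet()); // prints [t1, t2]
        System.out.println(deviatingTraces(spans, 500));  // prints [t2]
    }
}
```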

Concretely, you record a span by starting it when a piece of work begins and finishing it when that work ends.
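Here is a minimal, self-contained sketch of that start/finish pattern. The `Span` class is hypothetical and only measures elapsed time and reports it to an in-memory list. Brave's README shows the same shape with its own API: `tracer.newTrace().name("encode").start()`, then `span.finish()` in a `finally` block. The key point is the `try`/`finally`, which guarantees that the span is finished even when the traced work throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class SpanRecording {
    // Finished spans, standing in for a reporter that would ship them to a collector.
    static final List<String> REPORTED = new ArrayList<>();

    // A hypothetical, minimal span: it measures the time between start and finish.
    static final class Span {
        final String traceId = UUID.randomUUID().toString();
        final String name;
        final long startNanos;

        private Span(String name) {
            this.name = name;
            this.startNanos = System.nanoTime();
        }

        static Span start(String name) {
            return new Span(name);
        }

        void finish() {
            long micros = (System.nanoTime() - startNanos) / 1_000;
            REPORTED.add(name + " " + traceId + " " + micros + "us");
        }
    }

    static int encode(String input) {
        Span span = Span.start("encode");
        try {
            return input.hashCode(); // the work being traced
        } finally {
            span.finish(); // always reported, even if the work throws
        }
    }

    public static void main(String[] args) {
        encode("hello");
        System.out.println(REPORTED);
    }
}
```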

There are several competing APIs:
1. The most popular are Zipkin (core) and Brave.
2. Some libraries, such as Finagle (RPC), offer built-in Zipkin-compatible tracing.
3. Many plugins exist to add tracing as a drop-in to various libraries.
4. Multiple APIs exist for non-JVM languages.
But how do we handle such a variety of tracing solutions?

=>A standard to the rescue

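That's where OpenTracing enters the show.
It offers a vendor-neutral API: instrumented code targets OpenTracing, whatever tracer implementation sits behind it.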
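The benefit of a vendor-neutral API can be shown without any dependency. The sketch below defines a hypothetical two-method contract in the spirit of OpenTracing, plus two interchangeable backends. The instrumented `CheckoutService` never changes when the backend does. `NeutralTracer`, `NeutralSpan` and the backends are illustrative names, not the real OpenTracing types. With the actual `io.opentracing` API, the equivalent calls are `tracer.buildSpan("checkout").start()` and `span.finish()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vendor-neutral contract: instrumented code depends only on these interfaces.
interface NeutralSpan {
    void finish();
}

interface NeutralTracer {
    NeutralSpan startSpan(String operationName);
}

// One possible backend: keeps the names of finished operations in memory.
class InMemoryTracer implements NeutralTracer {
    final List<String> finished = new ArrayList<>();

    public NeutralSpan startSpan(String operationName) {
        return () -> finished.add(operationName);
    }
}

// Another backend: prints durations to stdout.
class ConsoleTracer implements NeutralTracer {
    public NeutralSpan startSpan(String operationName) {
        long start = System.nanoTime();
        return () -> System.out.println(operationName + " took " + (System.nanoTime() - start) + "ns");
    }
}

// Instrumented code: written once against the neutral API.
class CheckoutService {
    private final NeutralTracer tracer;

    CheckoutService(NeutralTracer tracer) {
        this.tracer = tracer;
    }

    String checkout(String cart) {
        NeutralSpan span = tracer.startSpan("checkout");
        try {
            return "paid:" + cart;
        } finally {
            span.finish();
        }
    }

    public static void main(String[] args) {
        new CheckoutService(new ConsoleTracer()).checkout("cart-1");
        InMemoryTracer memory = new InMemoryTracer();
        new CheckoutService(memory).checkout("cart-2");
        System.out.println(memory.finished); // prints [checkout]
    }
}
```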

But there's still a catch: what happens if some libraries or logs in the chain are not compliant with OpenTracing?
Relying on such a standard alone is something of a lost cause (too costly and unreliable), because of these problems:
1. A single missing element in the chain breaks the entire trace.
2. It requires an explicit hand-over on every context switch (the current span is typically stored in thread-local storage).
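The second problem is easy to reproduce. In this minimal sketch, the current span id lives in a `ThreadLocal`, as it does in many tracers. When work moves to an executor thread, the id is simply not there unless the caller captures it and the worker restores it. The class name `ContextHandOver` and the id `span-42` are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ContextHandOver {
    // The "current span" id, stored per thread as many tracers do.
    static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

    // No hand-over: the worker thread has its own, empty thread-local slot.
    static String withoutHandOver(ExecutorService pool) throws Exception {
        CURRENT_SPAN.set("span-42");
        try {
            Future<String> seen = pool.submit(() -> String.valueOf(CURRENT_SPAN.get()));
            return seen.get();
        } finally {
            CURRENT_SPAN.remove();
        }
    }

    // Explicit hand-over: capture on the calling thread, restore on the worker thread.
    static String withHandOver(ExecutorService pool) throws Exception {
        CURRENT_SPAN.set("span-42");
        try {
            String captured = CURRENT_SPAN.get();
            Future<String> seen = pool.submit(() -> {
                CURRENT_SPAN.set(captured);
                try {
                    return CURRENT_SPAN.get();
                } finally {
                    CURRENT_SPAN.remove(); // do not leak the context into the pooled thread
                }
            });
            return seen.get();
        } finally {
            CURRENT_SPAN.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(withoutHandOver(pool)); // prints null
            System.out.println(withHandOver(pool));    // prints span-42
        } finally {
            pool.shutdown();
        }
    }
}
```

Every executor, callback and reactive operator in the application needs this kind of wrapping, which is why manual instrumentation breaks so easily.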

III IMPLEMENTING APM: "So how can we implement APM on the JVM?"

=>Use a Java agent

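The JVM has an API called the instrumentation API (the java.lang.instrument package), which lets you create a JAR file, a Java agent, that is loaded before the actual program when the JVM is started with the -javaagent option.
Instead of a main method, the JAR's manifest names, through its Premain-Class attribute, a class with a premain method of two arguments: a String (the agent's options) and an Instrumentation object.
The Instrumentation object does not trace anything by itself: it lets the agent register transformers that rewrite the bytecode of classes as they are loaded, and that rewritten bytecode does the tracing.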
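Here is a minimal sketch of such an agent's skeleton, using only `java.lang.instrument`. It registers a `ClassFileTransformer` that notices application classes as they load and returns `null`, which tells the JVM to leave the class unchanged. A real APM agent would instead return rewritten bytes, typically produced with a bytecode library such as ASM or Byte Buddy, that wrap method bodies in span start/finish calls. The class name `TracingAgent`, the default package prefix `com/example/` and the JAR names are hypothetical.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Packaged in a JAR whose manifest contains the line:
//   Premain-Class: TracingAgent
// and attached with: java -javaagent:tracing-agent.jar=com/example/ -jar app.jar
// (declare the class public in a real agent JAR)
class TracingAgent {
    // Application classes seen while loading; a real agent would rewrite them instead.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    // Called by the JVM before main(); agentArgs is the text after "=" in -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null || agentArgs.isEmpty()) ? "com/example/" : agentArgs;
        inst.addTransformer(new TraceTransformer(prefix));
    }

    static final class TraceTransformer implements ClassFileTransformer {
        private final String packagePrefix;

        TraceTransformer(String packagePrefix) {
            this.packagePrefix = packagePrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // className is in internal form, e.g. "com/example/OrderService", and may be null.
            if (className != null && className.startsWith(packagePrefix)) {
                SEEN.add(className);
                // A real agent would return modified bytecode here, with tracing woven in.
            }
            return null; // null means "leave this class unchanged"
        }
    }
}
```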

The agent thus instruments the bytecode of each class with trace information, rather than waiting for the development team to add tracing from inside the class.
This is the best approach.

More in the conference talk by Java Champion Rafael Winterhalter: https://youtu.be/HZq7vqZ5p8A