A tale of two Collectors
So far I've been talking about different aspects of running OpenTelemetry in distributed architectures and introduced Jaeger Quick Start to simplify the deployment of this tracing platform in the AWS environment. Now it's time to bring everything together and demonstrate a complete example of using AWS Distro for OpenTelemetry (ADOT) with Jaeger as a tracing backend.
It's long overdue to clearly state the goal of this series of posts. Better late than never, though.
I want to convince you that tracing is a practical and valuable concept, not just a theoretically interesting one. I'm approaching this from the DevOps angle of infrastructure deployment and initial configuration: the boring topics we need to settle before we move on to more useful things.
Specifically, I'm aiming at an audience that is familiar with the AWS environment and has some experience with AWS X-Ray, the official tracing offering from AWS. Using OpenTelemetry and other open-source products like Jaeger, I demonstrate that you have the option to evaluate different tracing platforms. I'm arguing that by adopting OpenTelemetry you can even run multiple tracing platforms side by side, examining them and making a conscious decision about what works best for your team.
Before we move on any further, a quick reminder of the OpenTelemetry Collector architecture (for an in-depth discussion see Approaching OpenTelemetry):
OpenTelemetry Collector is the main component that ensures telemetry signals (traces, metrics, logs) are received, processed, and exported. As an OpenTelemetry user, you are responsible for the Collector pipelines: you can pick from a registry of available receivers, processors, and exporters to build the exact pipelines you need.
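To make the receiver/processor/exporter idea concrete, here is a minimal sketch of a Collector configuration with a single traces pipeline. The endpoint is a hypothetical placeholder, and the choice of the `batch` processor and the `otlp` exporter is only an illustration, not a recommendation for any particular setup.

```yaml
# Minimal sketch of a Collector configuration: one traces pipeline.
receivers:
  otlp:                 # accept OTLP over gRPC and HTTP on the default ports
    protocols:
      grpc:
      http:

processors:
  batch:                # group spans into batches before export

exporters:
  otlp:
    # Hypothetical backend address, replace with a real OTLP endpoint.
    endpoint: backend.example.internal:4317

service:
  pipelines:
    traces:             # a pipeline wires the components above together
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Components declared at the top level do nothing until they are referenced from a pipeline under `service`.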
OpenTelemetry defines a list of recommended processors, which is worth a look.
Jaeger architecture is also built around a Collector. The similar names are not a coincidence here: Jaeger's founder, Yuri Shkuro, is also an OpenTelemetry co-founder and an active member of the community. Jaeger Collector can receive trace spans in various formats and handles efficient storage of the data, using one of the supported storage backends. At the time of writing these are Elasticsearch, Cassandra, and Kafka.
Bringing it all together, when the OpenTelemetry pipeline is configured to export traces to the Jaeger platform, the trace flow looks like this:
At this point you may ask a valid question: why do we need another Collector? Remember that this is a big deal operationally: think about deploying, scaling, patching, monitoring, and troubleshooting. Wouldn't it be easier if we exported traces to Jaeger storage backends directly from OpenTelemetry Collector?
Jaeger was created before OpenTelemetry. This explains why we have Jaeger Collector but do not (yet) have OpenTelemetry exporters that support Jaeger storage backends.
In fact, this is exactly the direction the Jaeger team is considering. However, we are not there yet and both Collectors have to be present. It may look like a setback, but let's go one step further to understand how AWS Distro for OpenTelemetry fits in.
The referenced GitHub issue is a very interesting read that helps to better understand the relationship between OpenTelemetry and Jaeger as well as the feature overlap between them.
As we discussed above, OpenTelemetry Collector can be configured to export traces to Jaeger Collector through the configuration file using the Jaeger exporter. Can the same be done with ADOT?
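For reference, with an upstream Collector build that ships the Jaeger exporter (the opentelemetry-collector-contrib distribution at the time of writing), such a configuration might look roughly like the sketch below. The hostname is hypothetical, port 14250 is assumed to be the Jaeger Collector gRPC port, and the exact TLS settings depend on the Collector version, so treat the fragment as an illustration only.

```yaml
# Sketch: upstream OpenTelemetry Collector exporting to Jaeger Collector.
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    # Hypothetical address of the Jaeger Collector gRPC endpoint.
    endpoint: jaeger-collector.example.internal:14250
    tls:
      insecure: true    # assumption: plaintext inside a private network

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
```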
In theory, it should be the same thing: ADOT is just an OpenTelemetry distribution, tailored for the AWS environment. Right?..
Not quite. Remember that an OpenTelemetry distribution can both add and remove certain OpenTelemetry components, and ADOT is no exception. It adds AWS-specific components but it also removes some from the base distro, mainly to keep the distro compact and reduce the testing surface. What's important for today's post: there is no Jaeger exporter available in AWS Distro for OpenTelemetry.
Although there is a Jaeger exporter for OpenTelemetry available, it is not included in ADOT (yet?) and, therefore, there is no built-in way to configure ADOT-based applications to export spans to Jaeger.
It means we can't directly build an ADOT -> Jaeger trace processing pipeline. It may be disappointing when you first encounter this, but I'm going to argue below that this is actually not a (very) bad thing.
There are several ways to address this obstacle and export OpenTelemetry traces from an ADOT-enabled application to Jaeger:
- OpenTelemetry Collector in front of Jaeger
- Custom ADOT Collector packaged with the Jaeger exporter
This post focuses entirely on the first approach as it can be used with the latest OTEL Collector releases, including all processors/exporters from the opentelemetry-collector-contrib repository. We'll cover the process of building a custom ADOT Collector in future posts.
Instead of trying to export spans to Jaeger directly from ADOT, let's introduce another OpenTelemetry Collector in front of the Jaeger platform:
The architecture above may feel wrong: now we have two OpenTelemetry Collectors, seemingly just to work around the fact that ADOT doesn't include the Jaeger exporter. That may indeed be the case if you have a single service, but what if you have tens or hundreds of them?
Suddenly, the architecture with OpenTelemetry Collector deployed as a gateway looks much more sensible. A gateway Collector is a way to standardize your telemetry processing pipelines, instead of replicating the same configuration over and over in each service. With this architecture in place, ADOT Collectors act as “dumb” agents, forwarding telemetry signals to the gateway Collector that in turn performs the main processing.
Here is what you can do with this architecture in place:
- Implement advanced sampling policies across your distributed architecture, including tail-based sampling
- Capture metrics based on the spans collected across your services. This is especially valuable as you can capture metrics based on spans that will later be dropped by your sampling rules.
- Centralize export of telemetry signals (including secret management) to external backends, such as Jaeger or DataDog
- Scale efficiently: ADOT Collector runs close to the service and, therefore, is supposed to be very lightweight to ensure that application performance is not affected by tracing. In contrast, the gateway OpenTelemetry Collector can be deployed and scaled independently, based on the total telemetry volume across all services
- Gateway Collector is a way to enforce org-wide security/networking controls and decouple telemetry producers (usually, service teams) from telemetry management (platform team).
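As an illustration of the first point, a gateway Collector might combine the `tail_sampling` processor from opentelemetry-collector-contrib with a Jaeger exporter, roughly as sketched below. The policy names, thresholds, percentages, and the Jaeger hostname are all hypothetical values chosen for the example, not defaults from Jaeger Quick Start.

```yaml
# Sketch: gateway Collector that samples complete traces before export.
receivers:
  otlp:
    protocols:
      grpc:             # ADOT agents forward their spans here

processors:
  tail_sampling:
    decision_wait: 10s  # wait after the first span of a trace before deciding
    policies:
      - name: keep-errors            # hypothetical policy name
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow-traces       # hypothetical policy name
        type: latency
        latency:
          threshold_ms: 2000
      - name: sample-the-rest        # hypothetical policy name
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  jaeger:
    # Hypothetical address of the Jaeger Collector gRPC endpoint.
    endpoint: jaeger-collector.example.internal:14250
    tls:
      insecure: true    # assumption: plaintext inside a private network

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [jaeger]
```

A trace is kept if any policy decides to sample it, so in this sketch failed and slow traces are always kept while the remainder is sampled at about 10%. Because the decision lives in the gateway, individual services do not need to know about it.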
The obvious downside of this architecture: it adds another hop to the telemetry pipeline and makes it more complex, especially from the operational perspective.
I would argue that the more services you have in your distributed architecture, the more the benefits outweigh the disadvantages.
I hear you sigh deeply and reluctantly admit:
OK. That gateway thing makes sense. I could try this out and compare Jaeger with X-Ray. However, it looks like this requires a lot of work and I'm not ready to invest in this right now.
In the previous post, I introduced the Jaeger Quick Start project that addresses this use case. The Quick Start automates the initial Jaeger deployment and configuration on AWS with reasonable defaults, so you can “start quickly”. For the purposes of this blog post, Jaeger Quick Start provides an option to deploy not only Jaeger itself but also a pre-configured OpenTelemetry Collector in front of it:
This effectively means that once you have Jaeger Quick Start deployed in your environment, you can use it right away: no code changes in your ADOT services are necessary (yay!). All that's needed is to adjust the ADOT Collector configuration to forward the traces to the gateway Collector in front of Jaeger.
For the demonstration part of this post I'm going to use a simple event-driven architecture we covered in detail in the previous post:
The complete source code of this application together with the necessary deployment instructions can be found in the kolomiets/tracing-playground repository on GitHub.
As we're using ADOT Collector in the demo application, we can use the OPENTELEMETRY_COLLECTOR_CONFIG_FILE environment variable to override the default Collector configuration and provide a custom one with the trace processing pipelines adjusted (compare this with the default config; the changes are minimal):
With this configuration in place, we export the same traces to both AWS X-Ray and Jaeger backends. This is very useful to evaluate both platforms.
We use environment variable expansion for the OTLP endpoint in the configuration above. This is a neat way to keep your Collector configuration files simple and static. Expansion also allows you to reconfigure the endpoint by updating the JAEGER_OTLP_ENDPOINT variable, with no need to change the configuration file and redeploy the whole service.
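A minimal sketch of what such an ADOT Collector override might look like is shown below. It is not the exact file from the demo repository: the exporter name `otlp/gateway` and the TLS setting are assumptions made for illustration, and the `${...}` expansion syntax may differ between Collector versions.

```yaml
# Sketch: ADOT Collector config that sends the same traces to two backends.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  awsxray:              # default ADOT destination, kept as is
  otlp/gateway:         # hypothetical name for the extra OTLP exporter
    # Resolved from the environment when the Collector starts.
    endpoint: ${JAEGER_OTLP_ENDPOINT}
    tls:
      insecure: true    # assumption: gateway reachable over a private network

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray, otlp/gateway]
```

Listing both exporters in one pipeline fans out every span to both of them, which is what makes the side-by-side comparison possible.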
Now, with all the configuration bits out of the way, we can run our demo application (see the instructions in GitHub repo) and finally see the traces in both AWS X-Ray and Jaeger. Let's start with AWS X-Ray:
This is the familiar trace we've seen when we tested context propagation with OpenTelemetry. Nothing particularly new here. Let's switch over to Jaeger:
Jaeger shows a number of captured traces; note the different colour codes for the services involved. If we dive deeper and open a trace, we'll get span details:
Every span contains OpenTelemetry span attributes, the main source of information about a span. For example, here are the attributes for the Kinesis.PutRecord span:
Jaeger has an analogue of the AWS X-Ray Service Map feature, available on the System Architecture tab:
The numbers on the edges show the total number of traces captured along the edge
You may notice that Jaeger's system architecture is different from what we've seen in AWS X-Ray. Nodes for the Lambda functions are present but we don't see the nodes for AWS services: Lambda, SQS, Kinesis, SNS. It looks like we've been able to capture all the spans explicitly created in the Lambda functions, but there are no spans emitted by AWS services. What is going on here?
The answer lies in how AWS services are integrated with AWS X-Ray. As mentioned in the documentation, some services (such as AWS Lambda or Amazon API Gateway) add additional nodes to the service map. In other words, there is built-in integration between certain AWS services and AWS X-Ray, and OpenTelemetry can't help us with these additional spans. This is why we don't see them in Jaeger (as Jaeger receives only the spans that OpenTelemetry Collector gets).
This is not the only difference between the traces captured by AWS X-Ray and Jaeger, but probably the most visible one. I'd like to pause the discussion here and defer further analysis to the next post, as this is a big topic on its own.
Before we wrap up, let me summarize the main ideas we've covered in this post.
OpenTelemetry Collector in gateway mode is a good pattern to consider, especially for systems with a large number of services. It is a good way to standardize and enrich your telemetry processing pipelines, reducing telemetry overhead for each individual service.
AWS Distro for OpenTelemetry can be configured to export traces not only to AWS X-Ray (which is the default behaviour) but to other backends as well. This allows you to run multiple tracing backends side by side, which is valuable for migrations, evaluations, and POCs.
Jaeger Quick Start simplifies provisioning of the Jaeger telemetry platform in your AWS environment, ready to be plugged into your OpenTelemetry pipelines.
Traces captured by ADOT and exported to Jaeger are different from the traces you observe in AWS X-Ray. In part this is because of direct integrations between certain AWS services and AWS X-Ray. The analysis of these differences will be covered in further posts of this series.
And, finally, the main achievement: with all the groundwork covered and codified in Jaeger Quick Start and demo applications, I can stop talking about deploying and configuring Jaeger 🙂 In the next posts, we'll talk about the traces themselves and the benefits that OpenTelemetry provides.