Introducing Jaeger Quick Start — Deploying on AWS | by Dmitry Kolomiets | Apr, 2022

A look at the end-to-end distributed tracing platform

Picture by Markus Spiske on Unsplash

In the previous post, we looked at AWS Distro for OpenTelemetry (ADOT) and how to use it to produce traces in event-driven architectures. ADOT exports the traces to AWS X-Ray by default. Unfortunately, there are some limitations:

  • X-Ray doesn’t support OpenTelemetry events, a very useful concept that in many cases could replace more traditional logs
  • X-Ray doesn’t support OpenTelemetry links; we’ve seen previously that links are very useful for expressing causal relationships between spans
  • X-Ray has limitations on span kinds: only spans of kind Server are converted into X-Ray segments

As is often the case in the OpenTelemetry world, it’s possible to change the default configuration and export the traces to a completely different tracing backend. But which one to use?

There are many tracing backends you may want to consider for your tracing platform, both commercial and open source. In this post, I’d like to bring your attention to Jaeger, a graduated project from the Cloud Native Computing Foundation (CNCF). This is a decent stamp of approval, considering the other CNCF graduated projects. Just to name a few: Kubernetes, Helm, Prometheus, Fluentd.

Jaeger is an open-source, end-to-end distributed tracing platform. In true cloud-native fashion, Jaeger can be deployed in a number of ways.

The Jaeger team created a detailed deployment guide that answers many questions about the roles of the different Jaeger components (Agent, Collector, and Query services), integration points with other services (Elasticsearch, Cassandra), and scaling considerations. However, even with that level of technical documentation available, deploying a production-ready Jaeger platform is no small feat.

For large-scale deployments, especially in the AWS context, we expect a lot from a well-architected solution. Among other things:

  • Resiliency
  • Elasticity
  • Observability
  • Native integration with other AWS services
  • Reproducible, fully automated IaC deployments

The Jaeger Kubernetes operator helps with some of these characteristics, but then again, you need a Kubernetes cluster to begin with. What if you don’t have one? Spinning up (and maintaining!) a new cluster to host a tracing platform may be far too difficult and expensive (both cost- and expertise-wise).

Here is the main question for this post: what would a Jaeger “quick start” look like on AWS?

I used “quick start” above for a reason: AWS Quick Starts are “automated reference deployments … [that] help you deploy popular technologies on AWS according to AWS best practices”. This is exactly what I’m aiming for with Jaeger on AWS.

AWS even provides a Quick Start Contributor’s Guide to help with the authoring of new Quick Starts. Without diving too deep into the details, let me say that a Quick Start is a set of CloudFormation templates.

And I must admit that these templates have to be solid. Quoting the documentation, the minimum architectural requirements for a Quick Start are:

  • Multi-AZ architecture
  • Support for the majority of AWS Regions
  • New VPC and existing VPC deployment options
  • Product instances in private subnets
  • NAT gateways for outbound internet access from private subnets
  • Marketplace AMIs whenever possible; no prebaked AMIs
  • AMI mappings; no hardcoded AMIs
  • User-friendly parameter labels and groups
  • CIDR block lockdown for external admin access
  • Security groups with principle of least privilege
  • No software bits with deployment
  • No hardcoded passwords
  • No sensitive data in EC2 instance user data or other clear text
  • No use of 0.0.0.0/0 for open remote administration access
  • No resources created automatically outside stack

An impressive list indeed. If you use any Quick Start to deploy your infrastructure, you can be sure that the templates are pretty good.

Another thing I especially like about the Quick Starts ecosystem is modularity. If there is a Quick Start that deploys a part of the infrastructure you want to use in your architecture (such as the VPC Quick Start or the Aurora PostgreSQL Quick Start), you can reference the existing implementation. It’s like referencing a library in Python or NodeJS, only for your CloudFormation templates. This feature alone is groundbreaking enough to change your approach to IaC with CloudFormation.

Let’s be honest: how many times have you created a template to provision a VPC, subnets, route tables, internet gateways, NAT gateways, endpoints, …? Don’t tell me you haven’t dreamed about an off-the-shelf implementation you could just reuse without reinventing this wheel again. Now you can.
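To make the library analogy concrete, here is a minimal sketch of a parent template that pulls in a VPC as a nested stack through the standard AWS::CloudFormation::Stack resource. The template is built as a Python dictionary and printed as JSON, which CloudFormation accepts. The template URL, the parameter names (AvailabilityZones, NumberOfAZs), and the output name (VPCID) are assumptions used for illustration; check them against the Quick Start you actually reference.

```python
import json
from typing import List

# Assumption: location of a published VPC template; verify before use.
VPC_TEMPLATE_URL = (
    "https://aws-quickstart.s3.amazonaws.com/"
    "quickstart-aws-vpc/templates/aws-vpc.template.yaml"
)


def build_parent_template(vpc_template_url: str, availability_zones: List[str]) -> dict:
    """Return a parent template that reuses a VPC template as a nested stack."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Parent stack that reuses an existing VPC template",
        "Resources": {
            "VPCStack": {
                # A nested stack is the CloudFormation way to "import" a template.
                "Type": "AWS::CloudFormation::Stack",
                "Properties": {
                    "TemplateURL": vpc_template_url,
                    # Hypothetical parameter names; nested stack parameters
                    # are always passed as strings.
                    "Parameters": {
                        "AvailabilityZones": ",".join(availability_zones),
                        "NumberOfAZs": str(len(availability_zones)),
                    },
                },
            }
        },
        "Outputs": {
            # Outputs of the nested stack are read with Fn::GetAtt.
            # "VPCID" is an assumed output name.
            "VpcId": {"Value": {"Fn::GetAtt": ["VPCStack", "Outputs.VPCID"]}}
        },
    }


template = build_parent_template(VPC_TEMPLATE_URL, ["eu-west-1a", "eu-west-1b"])
print(json.dumps(template, indent=2))
```

The parent template never declares subnets, route tables, or gateways itself. It only passes parameters in and reads outputs back, much like calling a function from an imported module.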

A final thing about Quick Starts (promise). Quick Start architectures can be very complex, especially if you consider the modularity aspect we covered above. Developing and maintaining CloudFormation templates for a large Quick Start may be a serious task without a solid automated testing framework. The AWS Quick Start team recognized this challenge and released TaskCat:

TaskCat automates linting and deployments for CloudFormation templates across multiple AWS regions. It means that you can make sure that your templates, however complex they are, can be successfully deployed in the target regions (and deleted, an aspect that is often neglected during CloudFormation development).

With TaskCat it’s possible to define “tests”, providing different parameters to CloudFormation at deployment time. The tool also generates a pass/fail report that can be integrated into your CI/CD tool, so you can run the tests as part of your PR validation process.
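As a rough illustration of what such “tests” look like, the sketch below mirrors the general layout of a TaskCat configuration file as I understand it (a project section with default regions, and a tests section where each test names a template and its parameters). It is written as a Python dictionary so it can be inspected programmatically. The project name, template path, and the StorageBackend parameter are hypothetical placeholders, and the planned_deployments helper is not part of TaskCat; it only models how tests fan out across regions.

```python
from typing import List, Tuple

# Hypothetical configuration: names, paths, and parameters are placeholders.
TASKCAT_CONFIG = {
    "project": {
        "name": "jaeger-quickstart-demo",
        "regions": ["us-east-1", "eu-west-1"],
    },
    "tests": {
        "in-memory": {
            "template": "templates/main.template.yaml",
            "parameters": {"StorageBackend": "InMemory"},
        },
        "opensearch": {
            "template": "templates/main.template.yaml",
            "parameters": {"StorageBackend": "OpenSearch"},
            # Assumption: a test may narrow the project-level region list.
            "regions": ["eu-west-1"],
        },
    },
}


def planned_deployments(config: dict) -> List[Tuple[str, str]]:
    """List (test, region) pairs, letting a test's regions override the project's."""
    default_regions = config["project"]["regions"]
    plan = []
    for test_name, test in config["tests"].items():
        for region in test.get("regions", default_regions):
            plan.append((test_name, region))
    return plan


for test_name, region in planned_deployments(TASKCAT_CONFIG):
    print(f"{test_name}: deploy to {region}")
```

Each pair in the plan corresponds to one stack that gets created, checked, and deleted, which is why a small number of tests can still cover several regions and parameter combinations.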

Even if you are not going to use AWS Quick Starts for your IaC, I do recommend incorporating TaskCat into your workflow to automate tests for your CloudFormation templates.

NOTE: This is NOT an official AWS Quick Start. The project is built according to the Quick Start Contributor’s Guide but it is NOT approved by the AWS and Jaeger teams yet.

Now that we have defined what an AWS Quick Start is, let me finally introduce Jaeger Quick Start!

Jaeger Quick Start helps you deploy a highly available Jaeger, the open-source, end-to-end distributed tracing platform, on the Amazon Web Services (AWS) Cloud.

You can use the AWS CloudFormation templates included with the Quick Start to deploy Jaeger in your AWS account in about 10–30 minutes.

This Quick Start is for users who want a repeatable, customizable reference deployment for Jaeger using AWS CloudFormation. You can also use the provided AWS CloudFormation templates as a starting point for your own implementation.

Here is a high-level architecture diagram of the Quick Start:

Jaeger serverless architecture (with Elasticsearch backend)

The main highlights:

  • Instead of running Jaeger containers in a Kubernetes cluster (with the associated operational complexity), the Quick Start uses AWS Fargate, a serverless compute engine
  • The Quick Start deploys all the necessary Jaeger components (Collector, Query, Dependency Spark Job) and configures them appropriately
  • The Quick Start supports two different storage backends: an Amazon OpenSearch cluster for large-scale production deployments and in-memory storage suitable for experiments and proof-of-concept workloads
  • The Quick Start deploys a Network Load Balancer in public VPC subnets to distribute the traffic and expose the necessary Jaeger ports. The load balancer can be either internet-facing or internal, accessible only to VPC clients
  • The Quick Start provides options to assign DNS names to the Jaeger instance. Both public and private Amazon Route 53 hosted zones are supported.
  • The Quick Start integrates with AWS Certificate Manager to add TLS to the Network Load Balancer
  • The Quick Start supports integration with Amazon Managed Service for Prometheus to expose Jaeger metrics
  • The Quick Start offers an optional integration with CloudWatch Container Insights to monitor metrics for the Jaeger containers running on Fargate

A detailed explanation of the features, architectural decisions, and full parameter reference can be found in the Jaeger Quick Start Deployment Guide.

Jaeger Quick Start is a new project. For the initial version I concentrated on supporting two storage backends (OpenSearch and in-memory) and, therefore, two slightly different architectures. I’ve put together a Jaeger Quick Start roadmap and raised a number of GitHub issues to indicate where I want to go next. My focus for the next milestone will be performance analysis, fine-tuning of the default parameters, and documentation improvements. Eventually, I plan to add support for another popular Jaeger storage backend: Cassandra or, in the AWS world of managed services, Amazon Keyspaces.

If you have experience with deploying and running Jaeger on AWS or want to contribute in any way, please do reach out. Anecdotes and any pain points you encountered are especially welcome!

Before I wrap the post up though, let me put Jaeger Quick Start in the wider context of this series. How could it be useful for our OpenTelemetry journey with event-driven architectures?

With a simple way to deploy Jaeger into an AWS environment, you can complement (or even replace) AWS X-Ray as your target telemetry backend for application traces. Indeed, you can use Jaeger and AWS X-Ray together and run them side by side. In fact, I would definitely recommend this approach so you can see both platforms in action and make an informed decision about which one suits your needs best.

And this is exactly what we’re going to cover in the next post: running both platforms side by side, discussing the ADOT configuration changes necessary to support Jaeger, comparing the resulting traces, and outlining the main differences between the platforms.
