Create Log-Based Metrics in Google Cloud and Gain Valuable Insights | by Mirco | May, 2022

Use Terraform to create your metrics and deepen the understanding of your system

It’s important to know the current state of your system. Is there an unusual increase in errors? Is the load steady, or do you experience traffic spikes? What’s the latency? You might overlook problems on the horizon if you miss that information. For example, a sudden increase in 401 or 403 errors might point you to someone who’s trying to break into your system.

You get many things out of the box on the Google Cloud Platform. There are a lot of predefined metrics. However, not even Google can foresee every metric you need. Because of this, you can create your own metrics.

Log-based metrics extract information from log messages. They can ingest all kinds of logs, even from services you don’t own. This allows you to create a variety of custom metrics.

Two kinds of log-based metrics exist:

  • A counter simply counts the occurrences of log messages. Counters tell you how often something happens in your system, for example, how often your service publishes a message.
  • A distribution associates each log message with a value. For example, it looks for a log message that states how long a process ran. You can use this to find the average runtime, outliers, and similar statistics.

Log-based metrics only use log messages that were logged after the metric was created.

We will use Terraform to create both types. Suppose we have an application that prints log messages like:

Bob logged in.
Eve logged in.
Alice logged in.
Init took 5 seconds.
Init took 3 seconds.
CleanUp took 5 seconds.
CleanUp took 13 seconds.

We will use the former (the login messages) to create a counter and the latter (the timing messages) to create a distribution.

We want to count how often users log in. Let us look at the log messages.

Bob logged in.
Eve logged in.
Alice logged in.

As stated above, a counter can only count the number of occurrences of log messages. Clearly, creating one counter for every user wouldn’t make any sense.

The usual way of solving this problem is to create one metric and add labels to it. The labels help to see which data point belongs to which user login. You can do the same with log-based metrics. With this in mind, the message we want to count looks more like this:

$username logged in.

The following Terraform resource adds this metric to your project:
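A minimal sketch of such a resource, assuming the Google provider's `google_logging_metric` resource and plain-text logs in `textPayload`. The resource name, metric name, description, and regular expression are illustrative assumptions, and the layout is only a guess at matching the line numbers used in the walkthrough:

```hcl
resource "google_logging_metric" "user_login" {
  name        = "login" # hypothetical name; shows up as logging/user/login
  description = "Counts user logins, labeled by username"
  filter      = "resource.type=\"cloud_run_revision\" AND textPayload:\"logged in.\""
  metric_descriptor {
    metric_kind = "DELTA" # DELTA together with INT64 makes this a counter
    value_type  = "INT64"
    labels {
      key        = "username"
      value_type = "STRING"
    }
  }
  # Assumed regex: the single capture group becomes the label value.
  label_extractors = {
    "username" = "REGEXP_EXTRACT(textPayload, \"([A-Za-z0-9_]+) logged in\")"
  }
}
```

The character class avoids backslash escapes, which would otherwise have to be escaped a second time inside the Terraform string.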

How does it work?

  • On line 4, we define which log messages to consider. We only want messages logged by Cloud Run that contain the string “logged in.”
  • Lines 6 and 7 tell Google Cloud we want to create a counter.
  • Lines 9 and 10 define the username label I mentioned earlier. It’s of type string. Other possible types are INT64 and BOOL. Interestingly, floating-point values aren’t possible.
  • Line 15 defines how to extract the values for the label username. The metric evaluates the regular expression against each log message. Warning! The log message may be in another field, for example jsonPayload.message. This depends on how your application writes log messages. The value of the capture group becomes the value of the label. Therefore, you can only define one capture group.
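If your application writes structured logs, the filter and the extractor have to point at the field that actually holds the text. A hedged fragment, assuming the message lives in jsonPayload.message and using an illustrative regular expression:

```hcl
# Fragment for use inside a google_logging_metric resource, not a full config.
# Assumes the log text is stored in jsonPayload.message; check your own log entries.
filter = "resource.type=\"cloud_run_revision\" AND jsonPayload.message:\"logged in.\""

label_extractors = {
  # Exactly one capture group; its match becomes the value of the username label.
  "username" = "REGEXP_EXTRACT(jsonPayload.message, \"([A-Za-z0-9_]+) logged in\")"
}
```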

That’s it for the counter! We use the Metrics Explorer to look at it. The metric’s name is logging/user/login.

Counter metric displayed in Metrics Explorer.
The custom login metric.

We selected our metric (1) and grouped it by the label username (2). Now we know Bob logged in 27 times in the last hour (3).

Counters come in handy in many situations. However, they’re limited. If you want to collect information about things like job run time in seconds, they will not be of much help. You could add a label that captures the job run time, but you cannot calculate things like percentiles or the mean.

Distributions are the right fit for this kind of task.

Remember what the application logs were:

Init took 5 seconds.
Init took 3 seconds.
CleanUp took 5 seconds.
CleanUp took 13 seconds.

This time, we wish to seek for this sample:

$operation took $value seconds.

Let us look at the Terraform resource for this metric:
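A minimal sketch of what such a distribution metric could look like, again assuming `google_logging_metric` and plain-text logs in `textPayload`. The names, description, and regular expressions are illustrative assumptions, and the layout is only a guess at matching the line numbers used in the walkthrough:

```hcl
resource "google_logging_metric" "operation_duration" {
  name        = "operation_duration" # hypothetical metric name
  description = "Run time of operations in seconds"
  filter      = "resource.type=\"cloud_run_revision\" AND textPayload:\"took\" AND textPayload:\"seconds.\""
  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "DISTRIBUTION"
    unit        = "s" # values are seconds
    labels {
      key        = "operation"
      value_type = "STRING"
    }
  }
  value_extractor = "REGEXP_EXTRACT(textPayload, \"took ([0-9]+) seconds\")" # assumed regex
  label_extractors = {
    "operation" = "REGEXP_EXTRACT(textPayload, \"([A-Za-z]+) took\")" # assumed regex
  }
  bucket_options {
    linear_buckets {
      num_finite_buckets = 10 # ten buckets, each one second wide
      width              = 1
      offset             = 0
    }
  }
}
```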

It’s more complex than the counter, but we also see similarities.

  • Line 4 contains the filter again. This time, we want our messages to contain the words “took” and “seconds.”
  • Lines 6 to 8 tell Google Cloud to create a distribution metric where the values are in seconds.
  • Lines 9 to 12 define the label for the operation name, just as with the username. Lines 15 to 17 contain the corresponding extractor.
  • There is an interesting part in line 14. It looks like the label extractor below it and works similarly. The value_extractor gets the value (run time in seconds) from the log message.
  • Lines 18 to 23 define how to group the log entries. We define ten buckets. The first bucket contains all log entries where the job took 0 to 1 seconds; the fifth bucket contains all entries where the job took 4 to 5 seconds. If needed, you can use a more sophisticated approach, like exponential buckets.
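For run times that span several orders of magnitude, exponential buckets can replace the linear ones. A hedged fragment with illustrative values, meant to sit inside a `google_logging_metric` resource:

```hcl
# Fragment, not a full config. The values below are assumptions for illustration.
bucket_options {
  exponential_buckets {
    num_finite_buckets = 10 # number of finite buckets
    growth_factor      = 2  # each bucket is twice as wide as the previous one
    scale              = 1  # the first finite bucket starts at 1 second
  }
}
```

Narrow buckets for short jobs and wide buckets for long ones keep the bucket count small while still separating fast operations from each other.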

Open the Metrics Explorer to see the heat map for the operation CleanUp.

Heat map for the CleanUp operation.

We selected our metric (1) and added a filter to see the heat map for one operation (2). Now we can see at (3) that the operation took 10 to 11 seconds for about 19% of all invocations.

It’s very easy to set up log-based metrics. They’re especially useful if you need to monitor something you do not have direct control over. However, use them (and other kinds of metrics) with care. You cannot separate the signal from the noise if you measure too much.