Rewriting Bash scripts in Go using black box testing

Testing is an integral part of any software, and writing automated tests is essential to ensuring the safety of your code. But what do you do if you’re rewriting a program in an entirely different language? How do you make sure that your new and old programs do the same thing?

In this article I’m going to describe a journey we took to change a set of Bash scripts into a well-organized Go library, and how we made sure that nothing broke along the way.

At first…

Here at Flipp we have our own microservice platform, which allows us to bundle up and deploy our code as part of our continuous delivery pipeline. We provide additional functionality as well, like validations for permissions (so we don’t deploy something that’ll fail because it doesn’t have permissions to read or write a resource).

These scripts were written in Bash. Bash is a great way to interact with Linux executables. It’s really fast and doesn’t need any particular programming language installed, since there is no compilation or interpretation past the shell itself. But Bash is finicky to work with, hard to test, and doesn’t have a “standard library” like most programming languages.

We probably could have kept chugging along with Bash. However, the main pain point that kept coming back was the use of environment variables for pretty much everything. When you’re writing Bash, there is no way of telling whether a particular environment variable is an input to your script, whether it’s something that just happened to be set due to some external process, or if it’s a local variable that your script should “own.”

In addition, if you separate your scripts into files so your platform can use them, you have no way of preventing somebody from calling little bits of them without telling you. If you want to deprecate a feature, it’s almost impossible to do since everything is open to the world.

When making any kind of significant change, our only choice was basically to put out a new version, “test it in production,” and promote it to the default version after those tests were complete. This worked fine when we had a couple dozen services that used these scripts. Once we ballooned to several hundred, it became less ideal.

We needed to rewrite these things so that they were actually maintainable.


We decided to rewrite the scripts in Go. Not only was it a language we were already using at Flipp for high-throughput API services, it allows easy compilation to whatever target architecture we needed and comes with some great command-line libraries like Cobra. In addition, it meant we could define the behavior of our deploys with config files as opposed to huge messes of environment variables.

The question remained, though: how do we test this thing? Unit and feature tests are typically used before refactoring so you can make sure that your inputs and outputs match. But we were talking about rewriting this in an entirely different language. How could we make sure we weren’t breaking things?

We decided to take a three-step approach:

  1. Describe the behavior of the existing scripts by having a test framework cover all existing cases.
  2. Rewrite the scripts in Go in such a way that all existing tests still pass. Write it in such a way that it can take either environment variables or config files as its input.
  3. Refactor and update the Go library so that we can take advantage of the flexibility and power of a programming language. Change or add tests as necessary.

Describing behavior: Introducing Bats

Bats is a testing framework for Bash. It’s fairly simple: it provides a test harness, setup and teardown functions, and a way to organize your tests by file. Using Bats, you can run literally any command and provide expectations on exit codes, output, environment variables, file contents, etc.
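
To make “expectations on exit codes and output” a bit more concrete, here is a minimal plain-Bash sketch of the capture-then-assert pattern. Bats offers this through its `run` helper, which fills `$status` and `$output`; the `run_cmd` function below is a hypothetical stand-in written for illustration, not part of Bats and not the code we actually used.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a test harness's "run" helper: execute a command
# and capture its combined output and exit code instead of aborting.
run_cmd() {
  # The `&& ... ||` form records the exit code even when `set -e` is active.
  output="$("$@" 2>&1)" && status=0 || status=$?
}

# Example check: this command should print a message and exit with code 3.
run_cmd sh -c 'echo boom; exit 3'
if [[ "$status" -eq 3 && "$output" == "boom" ]]; then
  echo "ok - command failed the way we expected"
else
  echo "not ok - got status=$status output=$output"
fi
```

In a real Bats file the same check would live inside a test block, with the framework handling reporting, setup and teardown.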

This is step one in creating a way to test our existing scripts. But it doesn’t quite go far enough. One of the essential pieces of automated testing is the ability to stub or mock functionality. In our case, we didn’t want to actually call out to the docker or curl commands, or to do any real deploys, as part of our testing framework.

The key here is to manipulate the shell path (the PATH variable) to direct any invocations to “dummy” scripts. These scripts can inspect inputs to the command, as well as environment variables that could be set during test setup, and print their output to a file which can be inspected after the test runs. A sample dummy script might look like this:

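#!/bin/bash
# Record every invocation so the test can inspect it afterwards.
echo -e "docker $*" >> "$CALL_DIR/docker.calls"

# Return canned output for a version check.
if [[ "$*" == *--version* ]]; then
  echo "Docker version 20.10.5, build 55c4c88"
fi
# Simulate a failed build when the test asks for one.
if [[ $1 == "build" && "$*" == *docker-image-fail* ]]; then
  exit 1
fi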

A sample output file after a script run might look something like this:

# docker.calls
docker build --pull -f methods/my-service/Dockerfile
docker push
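
The wiring around such a dummy script can be sketched roughly as follows. This is an illustration under assumptions rather than our actual harness: the `STUB_DIR` name, the temporary directories and the two sample `docker` calls are all made up for the example, and the stub here only records its arguments.

```bash
#!/usr/bin/env bash
# Sketch: route `docker` calls to a dummy script by putting a stub directory
# at the front of PATH. STUB_DIR and the sample calls are assumptions.
STUB_DIR="$(mktemp -d)"
CALL_DIR="$(mktemp -d)"
export CALL_DIR

# Write a minimal dummy `docker` that only records how it was called.
cat > "$STUB_DIR/docker" <<'EOF'
#!/bin/bash
echo "docker $*" >> "$CALL_DIR/docker.calls"
EOF
chmod +x "$STUB_DIR/docker"

# Anything resolved through PATH now finds the stub before the real binary.
export PATH="$STUB_DIR:$PATH"

# Stand-in for the script under test.
docker build --pull -f Dockerfile .
docker push my-image

# The recorded calls can now be inspected or compared against a snapshot.
cat "$CALL_DIR/docker.calls"
```

Because the lookup goes through PATH, the script under test does not need to know it is being tested; the same trick works for curl or any other external command.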

The last piece is to introduce snapshot testing to our main test harness. This means we save these dummy output files, plus the actual command output, to files that live inside the repo. The order of operations is something like this:

  • The new test is written. At this point, we don’t have any output files.
  • The new test is run with the UPDATE_SNAPSHOTS environment variable set. This saves the dummy output files and command output to an outputs directory for the folder we’re running in. These get committed to the repo.
  • When we re-run tests, the output is saved to a current_calls directory inside the same folder.
  • After the command is done, we call a script that compares the contents of the outputs directory with that of the current_calls directory.
  • If the output is identical, it reports success and deletes the current_calls directory.
  • If there are differences, it reports failure and leaves the current_calls directory where it is so we can inspect it and use diff tools.
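
The comparison step in that list could be sketched like this. The function name and the single-argument layout are assumptions for illustration, as is the choice to always write fresh output to current_calls and move it into place in update mode; only UPDATE_SNAPSHOTS, outputs and current_calls come from the steps above.

```bash
#!/usr/bin/env bash
# Sketch of the snapshot step for one suite folder. compare_snapshots is a
# hypothetical name; the real script may be organized differently.
compare_snapshots() {
  local suite_dir="$1"
  local expected="$suite_dir/outputs"
  local actual="$suite_dir/current_calls"

  # Update mode: the fresh output becomes the committed snapshot.
  if [[ -n "${UPDATE_SNAPSHOTS:-}" ]]; then
    rm -rf "$expected"
    mv "$actual" "$expected"
    echo "snapshots updated"
    return 0
  fi

  # Compare mode: recursive diff of the two directories.
  if diff -r "$expected" "$actual"; then
    rm -rf "$actual"   # identical, so clean up
    echo "snapshots match"
    return 0
  fi

  # Leave current_calls in place so it can be inspected with diff tools.
  echo "snapshots differ; see $actual" >&2
  return 1
}
```

A test would call `compare_snapshots path/to/suite` after the command under test has finished writing its output.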

Describing behavior: Gotchas

There were a couple of things we got bitten by that we had to fix in order to get this to work right on our continuous integration pipeline:

  • Bash scripts run on whatever computer is running them, meaning that the current directory might be different between your machine and the CI pipeline’s machine. Because of this, we had to search for the current directory in all output files and replace it with %%BASE_DIR%%. This ensures that the output to be compared is always identical no matter where it’s run.
  • Some commands output colored text using the \e escape directive. This results in slightly different text saved to the output files on Mac versus Linux, so we had to do some find/replacing here as well.
  • We needed to have a way to check if the called command actually failed: sometimes we expect it to fail, and the test itself should fail if the failure didn’t happen as expected. In our case we had to run quite a bit of code both before and after the command under test, so the exit code was not available to the Bats test file. Because of this, we had to set an environment variable indicating whether we expected the current command to fail or not.
  • We wanted to keep the actual test files concise to avoid possible manual errors. In our shared test code, we determined the suite folder from the name of the test file, and in many cases the test was nothing but a single line with the name of the folder inside that suite to test.
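
Two of these gotchas lend themselves to a short sketch: normalizing output before it is compared, and checking the exit code against an expectation. Both helpers below are hypothetical. In particular, stripping color sequences outright is just one way to do the find/replace, and `EXPECT_FAILURE` is an assumed name for the environment variable.

```bash
#!/usr/bin/env bash
# Sketch: make captured output machine-independent before snapshotting.
normalize_output() {
  local base_dir="$1"
  local esc=$'\033'
  local line
  # Strip ANSI color sequences (ESC [ ... m), then swap the machine-specific
  # directory for a stable placeholder.
  sed "s/${esc}\[[0-9;]*m//g" | while IFS= read -r line; do
    printf '%s\n' "${line//"$base_dir"/%%BASE_DIR%%}"
  done
}

# Sketch: succeed only if the exit code matches what the test expected.
# EXPECT_FAILURE is an assumed variable name.
check_exit() {
  local code="$1"
  if [[ -n "${EXPECT_FAILURE:-}" ]]; then
    [[ "$code" -ne 0 ]]
  else
    [[ "$code" -eq 0 ]]
  fi
}

# prints: %%BASE_DIR%%/Dockerfile
printf '\033[32m/home/ci/repo/Dockerfile\033[0m\n' | normalize_output /home/ci/repo
```

Embedding the literal escape character in the sed pattern avoids relying on escape syntax that differs between the Mac and Linux versions of sed.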

Describing behavior: Devising the tests

The slog now began. Essentially, every if and loop statement in our Bash scripts represented another test case. In some cases it was obvious that a code branch was fairly isolated and could be covered by a single case. In others, the branches might interact in weird ways. This meant that we had to painstakingly generate test cases in a multiplicative way.

For example, we needed to have a test for when a single service was being deployed versus multiple services, and also when only the first deploy step was being run versus the full workflow. In this case this meant four independent test suites.
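
The multiplication is easy to see when the dimensions are enumerated. The sketch below uses made-up labels, not our real suite names; each extra two-way branch that interacts with the others doubles the count.

```bash
#!/usr/bin/env bash
# Sketch: enumerate the 2 x 2 combinations described above.
# The labels are illustrative, not real suite names.
list_suites() {
  local services steps
  for services in single-service multi-service; do
    for steps in first-step full-workflow; do
      echo "${services}_${steps}"
    done
  done
}

list_suites
```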

This step probably took the longest! When it was done, we could safely say we had described the behavior of our existing deploy scripts. We were now ready to rewrite them.

The rewrite: Let’s go with Go!

In the first iteration of the Go library, we deliberately called out to Bash (in this case, using the go-sh library) whenever we wanted to do something external, like web requests. This way, our first Go version was completely identical to the Bash version, including how it interacted with external commands.

Refactor and update: Realizing wins

Once all tests were passing using this version, we could start having it act more like a Go program. For example, rather than calling curl directly and dealing with its cumbersome way of checking HTTP statuses, it made a lot more sense to use Go’s built-in HTTP functions to make the request.

However, as soon as we stopped calling a command directly, we no longer had our existing tests verifying our behavior! The outputs depended on the existence of those dummy scripts, and we had stopped calling them.

We didn’t actually want our test outputs to be identical to the old ones: the curl outputs especially were so convoluted that we wouldn’t have any “wins” if we wrangled the Go code into somehow outputting something that looked like it came from a Bash curl call.

To get past this point, we had to take three steps.

  • We had to write a library that wrapped the commands as they were being called right now. For example, a function that took a URL, a method, POST data, etc. At this point, it still called the curl command.
  • We then changed it to use the Go HTTP functions instead of curl. We wrote unit tests around that library so we knew it at least called the HTTP functions correctly. We also had the “mock” version of this library write to its own output file, similar to how our “dummy” scripts work.
  • Finally, we reran the snapshots for the tests. At this point, we had to do manual work to compare the outputs of the original curl.calls file and the new requests.calls file to validate that they were semantically identical. Using diff tools, it became pretty easy to tell visually when things were identical and when they weren’t.

In other words, although this step lost us our armor-plated certainty that nothing had changed, we were able to pinpoint our change so that we knew that all the diffs were related to this one change and could visually confirm that it worked.


We did still have to do some manual testing due to environment changes, but the end result worked really well. We were able to replace our deployment scripts with the Go version for all new services and were even able to develop a script to automate pull requests for all existing services (using multi-gitter) to allow teams to move to the new version when they were able to.

This was a careful and long journey, but it helped immensely to get where we were going. You can see an edited version of the scripts we used in this gist!

Tags: bash, go, porting, testing
