Problems with Docker

So far I’ve seen a similar pattern play out three times at different companies:

  1. We need some infrastructure - let’s just use the UI of GCP/AWS/Azure to quickly get the resources we want.
  2. We need to replicate it. But… the person who did the initial setup is off or has forgotten specific steps. Or the UI has changed. Okay, let’s just use infrastructure-as-code from now on.
  3. There’s an update to the IaC tool, and it randomly breaks for people who did (or didn’t) upgrade. Let’s lock it down using a version manager - maybe tfenv, or asdf-vm.
  4. But usually we have a host of scripts around that. And now these start breaking. Maybe someone uses macOS and has a different sed? Usually, at this point, people give up and turn to Docker, VMs with a locked-down provisioning script (Vagrant), or a single "god-host" managed by a single process like Puppet or Chef.
  5. Someone comes along a couple of years later and asks why we are still running Terraform 0.13ish. Uhm…

The same thing happened at Avocode (and I was the one asking questions). There was a single Dockerfile, which pinned every tool to a specific version by hand.

Why it didn't work

Two main reasons: macOS on ARM (that was the catalyst for change) and the Terraform upgrade (that was the goal).

The catalyst was that running the amd64 Docker container on ARM macOS was painfully slow - especially when each deployment meant rebuilding a couple dozen Helm charts. Of course, we couldn’t just switch to a native ARM image, since that Dockerfile wouldn’t work there and the magician who wrote it was no longer around to fix it.

The goal was to be able to upgrade as new packages were released (Terraform 1.x and Helm 3.x). Ideally, this shouldn’t mean manually bumping the release tag for each package, with bonus points if it could be automated or boiled down to a single command.

Getting better with Nix

At this point, I had been running NixOS for a little over half a year and had managed to move all my dotfiles to home-manager. It looked like Nix might be an ideal solution for this situation.

I proposed a three-step process:

  1. Get a Nix version of the repo running - same top-level commands, but instead of running them in Docker, run them in nix-shell.
  2. After reaching feature parity and testing everything, remove the Docker parts.
  3. Simplify the setup to remove Docker idioms and make it more consistent with Nix best practices.
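
To make step 1 more concrete, here is a minimal sketch of what such a shell.nix could look like. It is not the file from the actual repo: the package list, the nixpkgs branch and the make target mentioned afterwards are all assumptions for illustration.

```nix
# shell.nix - illustrative sketch, not the original repo's file.
let
  # Assumption: track a nixpkgs branch for brevity. A real setup would pin an
  # exact commit (and a sha256) so everyone gets identical tool versions.
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz";
  pkgs = import nixpkgs { };
in
pkgs.mkShell {
  # Hypothetical tool list: everything the wrapper scripts call goes here,
  # so it no longer matters what happens to be installed on the host.
  packages = [
    pkgs.awscli2
    pkgs.kubectl
    pkgs.gnused # the same GNU sed on macOS and Linux
  ];
}
```

A top-level command that used to be wrapped in docker run could then be wrapped as nix-shell --run "make deploy" instead (the make target is hypothetical).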

Step 1 was pretty simple. The hardest part was getting old Helm (alongside its plugins) and old Terraform running on new nixpkgs. In the end, I decided on two different solutions:

  • for Helm - build the old version using new pkgs. The definition from old nixpkgs almost worked - it needed only minor changes to the definition itself (due to changes in lib) and to the build process (due to incompatibilities between the old os library and Go past 1.18),
  • for Terraform - pick the package out of old nixpkgs. I was hoping to get this upgraded as soon as we removed Docker.

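The two approaches above can be sketched roughly like this. Everything specific here is an assumption: the nixos-21.05 snapshot, the terraform_0_13 attribute being present in it, and the ./nix/helm-old.nix path are placeholders, not what the original repo used.

```nix
# shell.nix - sketch of mixing an old nixpkgs snapshot with a current one.
{ pkgs ? import <nixpkgs> { } }:
let
  # Assumption: an older nixpkgs release that still ships the Terraform
  # version we need. It is used ONLY as a source for that one package.
  oldPkgs = import
    (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-21.05.tar.gz")
    { };

  # Hypothetical local file: the Helm definition copied from old nixpkgs and
  # patched by hand, but built with the *current* pkgs (Go toolchain, lib).
  oldHelm = pkgs.callPackage ./nix/helm-old.nix { };
in
pkgs.mkShell {
  packages = [
    oldHelm                # old version, built against new nixpkgs
    oldPkgs.terraform_0_13 # old version, taken as-is from old nixpkgs
  ];
}
```

The trade-off is that the callPackage route keeps a single nixpkgs in play but requires maintaining the copied definition, while the second route needs no maintenance but drags in a second nixpkgs snapshot.
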
After that, we had a proof of concept working for the AWS deployment flow (the one we were primarily using back then). And it ran much faster than Docker! The team pushed me to follow up with the next step.

And it was also pretty easy - I just had to add gcloud alongside its plugins: shoutout to the great google-cloud-sdk.withExtraComponents ([google-cloud-sdk.components.cloud-build-local]) setup in nixpkgs. I also improved the UX a little, since I could now pull arbitrary packages and test them out without having to rebuild the whole image each time. After that, Docker was removed pretty much instantly - it turned out the team was eager to sacrifice certain features they were not using at the moment just to get it done. This also meant I could finally bump Terraform to the latest version and fix the conflicts, so we could finally use features such as sensitive (https://developer.hashicorp.com/terraform/language/functions/sensitive) - pretty neat!
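
For reference, the gcloud expression mentioned above could be wired into a shell roughly as follows. This is a sketch built around that one expression. The surrounding shell.nix is assumed, not taken from the repo.

```nix
# shell.nix - sketch: gcloud plus an extra component in one derivation.
{ pkgs ? import <nixpkgs> { } }:
let
  # withExtraComponents returns a google-cloud-sdk build that already
  # includes the listed components, so there is no "gcloud components
  # install" step to run (or forget) after entering the shell.
  gcloud = pkgs.google-cloud-sdk.withExtraComponents [
    pkgs.google-cloud-sdk.components.cloud-build-local
  ];
in
pkgs.mkShell {
  packages = [ gcloud ];
}
```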

Future work

Step three never really happened. I wish I had had time to migrate the setup to flakes (to really make upgrades as easy as nix flake update), but since then we have started working on an entirely new project and pretty much abandoned that way of doing things. I’ll describe it in future posts.
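
For the curious, a flake-based version might have looked something like the sketch below. Since this migration never happened, everything in it is hypothetical: the list of systems, the nixpkgs branch and the package selection are assumptions.

```nix
# flake.nix - hypothetical sketch of the dev shell as a flake.
{
  description = "Deployment tooling shell (illustrative)";

  # The exact nixpkgs revision gets recorded in flake.lock.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # Assumption: Linux machines plus ARM Macs.
      systems = [ "x86_64-linux" "aarch64-darwin" ];
      forAllSystems = nixpkgs.lib.genAttrs systems;
    in
    {
      devShells = forAllSystems (system:
        let
          pkgs = import nixpkgs {
            inherit system;
            # Recent nixpkgs mark Terraform as unfree, so opt in explicitly.
            config.allowUnfree = true;
          };
        in
        {
          default = pkgs.mkShell {
            packages = [ pkgs.terraform pkgs.kubernetes-helm ];
          };
        });
    };
}
```

With a file like this, nix develop enters the shell, and nix flake update rewrites flake.lock to the newest revision of the branch, which is the single-command upgrade mentioned above.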


Tags: ceros terraform docker puppet infra nix


Copyright © 2025 L Czaplinski