Deadheading the Cloud

Published on: 2026-03-30 Author: Jon Brookes

An Introduction to a Minimal Viable Guide to Digital Sovereignty and Minimalist Infrastructure

In a spirit of openness and radical thinking, the "Deadheads", fans of the Grateful Dead, taped the band's concerts and shared the recordings freely. The band members encouraged this, and Deadheads remain active in the practice to this day.

Open source, I believe, has a similar ethos. Early in my experience as a systems engineer and programmer, I perceived infrastructure as something that could, by its nature, be expressed as code. I sought to 'program the network' - my phrase from back in 1980 something - and to learn whatever languages or tooling I could find to accomplish this amorphous objective.

If you, like me, followed this path or one like it, how did you tackle the question of what happens after you are gone?

This may sound disturbing to some, but it is not meant to be. Put quite simply, we move on. We take a role, contract or engagement, do some work, and when it is time to do so, we move on. What happens to the work we did when we are no longer present?

Now I know you may say, as can I, that some systems we built are still running years after we built them. Tens of years sometimes, perhaps more. But all too often I have seen ill-conceived ideas become established practice, old systems die and be rewritten, and technical debt be passed on to the next recruits.

We need to be asking: can others take over and own the infrastructure that we built, maintained, changed or flexed to suit new requirements? How can this be done? Has a full disaster recovery (DR) rehearsal ever been carried out?

If you have always worked for employers, have you paused for thought as to how you might manage the same infrastructure differently if it were being paid for from your own bank account?

What of open source and community projects that others rely on for their own digital sovereignty? How might any of these fare if the worst comes to the worst and some kind of disaster takes them down?

Why I Think Rebuilds, not Upgrades, are the Answer

Minimal Viable Kubernetes (MVK) is part of this thinking, as it is a distillation of an infrastructure in Kubernetes as code. It is certainly not the whole answer, only part of it. But the same questions as those above formed the thinking behind it.

I already use MVK to host my own and clients' services, and it forms a starting point for deploying a minimal viable infrastructure that can be scaled up or down. It is an infrastructure I know, have proven, and can reliably rebuild and restore as needed.

It is evident to me that building infrastructure in this way, using code to represent it, requires the infrastructure to be:

  • repeatable

  • ephemeral

  • cattle rather than pets

Upgrading a Kubernetes cluster, for example, is part of the CKA exam and a necessary rite of passage to managing K8s. In-place upgrades may well be a legitimate part of its management. However, if the only time you ever build a K8s cluster is the first time, how would you know whether you can truly recover it in a disaster recovery scenario? By constantly upgrading in place, have we created a new pet?

The greatest disaster - barring a security breach, power failure, fire, flood and the obvious 'natural disasters' that befall us all - quite simply can be its main operatives moving on.

Rather than waiting for some kind of disaster to occur, rebuilding the cluster, restoring data from the live cluster and then failing over to it completely 'proves' that our infrastructure as code works.

If carefully managed, this could mean minimal downtime; in some instances it could mean zero downtime. Indeed, restoring data from a known good state could even mean that we mitigate 'back doors' left by bad actors.
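
The rehearsal described above can be sketched as a short script. Every function here is a hypothetical stand-in for your own infrastructure-as-code steps - the sequence, not the tooling, is the point:

```shell
#!/bin/sh
# DR rehearsal skeleton (a sketch, not a finished tool).
# Each function below is a placeholder for your real provisioning,
# restore and verification steps; replace the echo with the real work.
set -eu

rebuild_cluster()  { echo "rebuild: fresh cluster provisioned from code"; }
restore_backup()   { echo "restore: data restored from a known-good backup"; }
verify_workloads() { echo "verify: smoke tests passed on the rebuilt cluster"; }
fail_over()        { echo "failover: traffic switched to the rebuilt cluster"; }

# The order matters: never fail over to a cluster you have not verified.
rebuild_cluster
restore_backup
verify_workloads
fail_over
echo "DR rehearsal complete"
```

Run on a schedule, a script of this shape turns DR from a hope into a habit: each run proves the code can rebuild the platform from nothing.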

Granted, there can be no guarantees that we can mitigate all security incidents. There is always risk. But what is the risk of never relying on your infrastructure as code to rebuild to a clean state, from scratch?

What are the Fears we Face?

I have seen many barriers stand in the way of full DR rehearsals. Some may be a surprise, such as:

  • fear of exposing secrets by checking them into code repositories overshadows safe management and storage in an appropriate vault or secrets manager. An attempt to build a DR environment quickly fails and is never tried again

  • infrastructure is complicated, and applying overly simplistic management strategies builds technical debt. Simply put, DR becomes impossible without major work

  • GitOps promises much but delivers less in a full DR or DR rehearsal scenario. Eventual state is taken as the solution over deterministic, procedural infrastructure as code, only for teams to find that it can fail to deliver reliable DR

  • we can start kidding ourselves that everything is fine when, in fact, we have no DR - just a semi-self-healing platform

A barrier to any action, particularly in the case of DR, that I have seen and expect to keep seeing is that of 'it won't happen to me' or 'it's so unlikely to happen'. It also stems from an irrational view that there must be a very complicated solution to what is in effect a simple problem.

This leads to procrastination, inaction and can result in nothing being done differently.

No DR process is put into place. DR is never rehearsed.

What are my Answers and Should you Listen?

Nobody likes a smart **** - to borrow from "The Hitchhiker's Guide to the Galaxy", Douglas Adams's science fiction novel of 1979, in which the inventor of the "Infinite Improbability Drive" meets exactly that reaction. Adams was right. Nobody likes being told they are wrong, particularly when much time and personal effort has been invested in implementing and learning new technologies and patterns so the CV can say things like Argo, Flux and the like.

Boring is good: simple, procedural code reliably restores and bootstraps an environment that may well go on to be 'managed' by GitOps at a later date. But initially we need to go back to basics in order to recover from a full burn-down.

Code can be saved to the cloud and open sourced, so that anyone can fork it and carry on even if our own copy falls into oblivion. But what of data - customer data, secrets, configurations that we need and must keep confidential? What then?

I am reminded of 'the circle of trust' from the film "Meet the Fockers", a 2004 American comedy, in which a defined group of people are identified and kept inside a circle of trust.

For comedic effect, the son-in-law is pointedly marked as 'not in the circle of trust', but I hope the point is made. You need a circle of trust in which each of a minimum of three members shares access to trusted data.

The simplest way to achieve this, I believe, is to store secrets and documents in the same form as you would normal code: in public, but encrypted with the GPG/PGP keys of those in the circle of trust and ASCII-armoured, then copied to S3 buckets (ideally three locations across different providers) to which the circle-of-trust members have access.
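
That storage step can be sketched as a small shell function. The member addresses and bucket names below are placeholders, and while the `gpg` and `aws s3 cp` invocations are standard, treat this as a sketch to adapt rather than a finished tool:

```shell
#!/bin/sh
# Sketch: encrypt a secrets file to every member of the circle of trust,
# then push the ASCII-armoured result to three independent S3 locations.
# Member addresses and bucket names are placeholders - substitute your own.
set -eu

publish_secrets() {
    secrets_file=$1

    # Any one member can decrypt with their own private key.
    recipients="--recipient alice@example.com \
                --recipient bob@example.com \
                --recipient carol@example.com"

    # --armor makes text-safe output that can live alongside code;
    # --trust-model always avoids interactive trust prompts in automation.
    gpg --batch --yes --trust-model always --encrypt --armor \
        $recipients --output "$secrets_file.asc" "$secrets_file"

    # Three locations / providers, so no single bucket is a point of failure.
    for target in s3://dr-primary s3://dr-secondary s3://dr-tertiary; do
        aws s3 cp "$secrets_file.asc" "$target/"
    done
}

# Usage: publish_secrets secrets.env
```

Because the armoured output is just text, it can sit in a public repository next to the code that consumes it; only the private keys of the circle members ever need protecting.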

These are, at the least, minimal viable steps to deadheading an infrastructure in the face of a total disaster; depending on the size of the organisation, a "digital will" would be the next logical move.

Wills in the real world are often placed with legal professionals who act as guardians and arbiters of financial estates after the departure of their owners. A digital will would need to be managed in a similar way, by those trusted and competent to do so. Perhaps traditional legal professionals could bridge the technology gap by collaborating with consultative technologists who advise, enforce and give guidance on digital correctness.

In the long term, such digital wills must become a part of the legal system and be fully supported by legal entities and services alike. Solicitors and legal professionals will need to step up to this plate if they have not yet done so.

Minimal Viable DR

A minimal viable DR solution is, I believe, the above: a simple, MVK-based circle of trust. We keep code stored in Git, store GPG-armoured data in S3 buckets, carry out full DR rehearsals and fail over to the DR environment. We no longer just upgrade in place; we build upgrades into DR. Infrastructure is documented and understood by more than one person, together with a simple system of arbitration. When the stuff hits the fan, nobody gets a mess on them.

Stupid is as stupid does is not an argument against stupid-simple solutions that work when no complicated solution has yet been perfected. Granted, bigger operations at Google scale could not persist with such simple workings, but for the rest of us a simple plan in place is far better than no plan at all.

Image by gabirubah
