$ cat the-cloud-audit-we-run.md

The cloud audit we run on every new engagement.

· 6 min read · consulting

Every new engagement starts the same way: one week, five dimensions, one page of findings. We've published the rubric so any team can self-run it before deciding whether they need outside help.

The first week with a new client is mostly listening. Show us the AWS console. Show us your repos. Show us the on-call rotation. Show us last quarter's incidents. By Friday we deliver one page — what's red, what's amber, what's green, and what we'd touch first if it were our system.

The rubric below is what we score against. None of it is novel. The point is having a single artifact you can revisit every six months and watch greens accumulate.

The five dimensions

1. Security posture
2. Cost & waste
3. IaC coverage
4. Observability & on-call
5. Resilience & recovery

For each, we score Red / Amber / Green against a small number of binary checks. Anything Red goes on page one of the report. Anything Amber gets a one-line note. Greens get a tick and we move on.
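The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the exact tooling used on engagements: the `Dimension` and `score` names are hypothetical, and the precedence (any Red trigger wins, Green needs every check to pass, everything else is Amber) is an assumption read off the "Green if all of" and "Red on any of" wording.

```python
from dataclasses import dataclass, field
from enum import Enum


class Rating(Enum):
    RED = "Red"
    AMBER = "Amber"
    GREEN = "Green"


@dataclass
class Dimension:
    name: str
    # Yes/no answers gathered during the audit week.
    green_checks: dict[str, bool] = field(default_factory=dict)  # all True => Green
    red_flags: dict[str, bool] = field(default_factory=dict)     # any True => Red


def score(dim: Dimension) -> Rating:
    # Assumption: a Red trigger wins even if every Green check passes.
    if any(dim.red_flags.values()):
        return Rating.RED
    # An empty checklist is not treated as evidence of Green.
    if dim.green_checks and all(dim.green_checks.values()):
        return Rating.GREEN
    # Assumption: no Red trigger, but not every Green check met, means Amber.
    return Rating.AMBER


security = Dimension(
    name="Security posture",
    green_checks={"placeholder green check": True},  # hypothetical check name
    red_flags={
        "root account in active use": False,
        "secrets in .env files in the repo": True,
    },
)
print(security.name, "->", score(security).value)
```

Keeping every check as a plain boolean is what lets someone outside the team score the system without knowing the stack.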

1. Security posture

Green if all of:

Red on any of: long-lived IAM access keys belonging to humans, public S3 buckets that nobody can explain, root account in active use, secrets in .env files in the repo.
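As one illustration of how a Red trigger becomes a binary check, the sketch below flags old, still-active access keys. It assumes the input is shaped like the `AccessKeyMetadata` entries returned by IAM's ListAccessKeys call (`UserName`, `AccessKeyId`, `Status`, `CreateDate`). The 90-day threshold and the function name are assumptions, since the rubric does not define "long-lived", and the caller still has to decide which users are humans.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE_DAYS = 90  # assumption: the rubric does not put a number on "long-lived"


def long_lived_keys(key_metadata, now=None, max_age_days=MAX_KEY_AGE_DAYS):
    """Return (user, key id, age in days) for active keys older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(days=max_age_days)
    flagged = []
    for key in key_metadata:
        age = now - key["CreateDate"]  # CreateDate must be timezone-aware
        if key["Status"] == "Active" and age > limit:
            flagged.append((key["UserName"], key["AccessKeyId"], age.days))
    return flagged


# Hypothetical sample data, not taken from a real account.
reference = datetime(2024, 6, 1, tzinfo=timezone.utc)
sample = [
    {"UserName": "alice", "AccessKeyId": "AKIAEXAMPLEOLD",
     "Status": "Active", "CreateDate": reference - timedelta(days=200)},
    {"UserName": "bob", "AccessKeyId": "AKIAEXAMPLENEW",
     "Status": "Active", "CreateDate": reference - timedelta(days=10)},
]
for user, key_id, age_days in long_lived_keys(sample, now=reference):
    print(f"{user}: {key_id} is {age_days} days old")
```

An empty result does not make the dimension Green on its own; it only clears this one trigger.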

2. Cost & waste

Green if all of:

Red on: nobody owns the bill, the top line of the bill is "EC2-Other", non-prod runs 24/7 at full size, no tagging strategy at all.

3. IaC coverage

Green if all of:

Red on: a senior engineer is the only person who can deploy because half the resources are click-ops, state is in a single S3 bucket with no locking, you cannot answer "could we rebuild this from scratch" with yes.

4. Observability & on-call

Green if all of:

Red on: nobody is paged out of hours, alerts go to an unread Slack channel, "monitoring" is reading kubectl logs when something breaks.

5. Resilience & recovery

Green if all of:

Red on: backups exist but have never been restored, no postmortems, single-AZ production, "DR plan" lives in someone's head.

The deliverable

One page. Five rows. Each row: Red / Amber / Green, two-line finding, one-line first action. Then a prioritised top-five list of what to fix in the next 90 days, with effort estimates.

That's the entire artifact. Not 40 pages. Not a slide deck. One page that the CTO can forward to the CEO and the board.
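If you want to keep the page in version control and diff it every six months, the layout is simple enough to model directly. The sketch below is one possible shape, with hypothetical `Row` and `render_page` names; the field names mirror the row description above, and the plain-text layout is an assumption rather than the format of the actual report.

```python
from dataclasses import dataclass


@dataclass
class Row:
    dimension: str
    rating: str        # "Red", "Amber" or "Green"
    finding: str       # the two-line finding
    first_action: str  # the one-line first action


def render_page(rows, top_five):
    """Render five rows plus a prioritised list of (item, effort) pairs as text."""
    if len(rows) != 5:
        raise ValueError("the page has exactly five rows, one per dimension")
    if len(top_five) > 5:
        raise ValueError("the 90-day list never has more than five items")
    lines = []
    for row in rows:
        lines.append(f"[{row.rating}] {row.dimension}")
        lines.append(f"  Finding: {row.finding}")
        lines.append(f"  First action: {row.first_action}")
    lines.append("")
    lines.append("Next 90 days:")
    for position, (item, effort) in enumerate(top_five, start=1):
        lines.append(f"  {position}. {item} ({effort})")
    return "\n".join(lines)
```

The two length checks are the point of the sketch: the format refuses to grow past one page.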

Why a single page. A 40-page report is a defence document — it justifies the consultant's fee and gets filed. A one-page report is a decision document — it gets argued about, prioritised, and acted on. We've never had a client wish ours was longer.

The 90-day plan

Whatever the audit surfaces, the follow-up plan never has more than five items. Brain capacity is finite. The two universal rules:

  1. Fix one Red before chasing two Ambers. Reds are the things that turn into incidents.
  2. Cost wins fund security work. If the audit finds €4k/month of waste, the savings pay for the next two months of remediation work.
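The first rule and the five-item cap can be expressed as a short ordering function. This is a sketch under stated assumptions: findings are plain dicts with a `rating` key, the function name is hypothetical, and ties within the same rating keep whatever order the auditor listed them in.

```python
SEVERITY = {"Red": 0, "Amber": 1}  # Greens need no action, so they are left out


def ninety_day_plan(findings, max_items=5):
    """Order findings Reds first, then Ambers, and keep at most max_items."""
    actionable = [f for f in findings if f["rating"] in SEVERITY]
    # sorted() is stable: findings with the same rating keep the auditor's order.
    ordered = sorted(actionable, key=lambda f: SEVERITY[f["rating"]])
    return ordered[:max_items]


# Hypothetical findings, using triggers named in the rubric above.
findings = [
    {"rating": "Amber", "item": "alerts go to an unread Slack channel"},
    {"rating": "Green", "item": "postmortems are written"},
    {"rating": "Red", "item": "backups have never been restored"},
    {"rating": "Red", "item": "root account in active use"},
]
for entry in ninety_day_plan(findings):
    print(entry["rating"], "-", entry["item"])
```

The second rule, letting cost savings fund security work, is a budgeting decision and is deliberately left out of the ordering.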

Self-running it

You can run this audit yourself today. It takes about a day with two senior engineers and access to the cloud consoles. The hardest part is being honest about the Reds — if you wrote the system being audited, you'll instinctively defend it. Ask someone outside the team to score it, even if they don't know your stack. The questions are mostly binary.

If you do, send us the result. We'll tell you which finding we'd prioritise — for free, no sales call attached.


If running it yourself isn't realistic, we run this audit as a fixed-scope, fixed-fee engagement in one week. Drop us a note.