Gino Eising
Nerd by Nature
Jan 15, 2026 · 13 min read

119 commits in one day: what happens when AI meets GitOps without guardrails


January 2026 — a post-mortem on why your branch protections mean absolutely nothing when an AI is at the wheel


It started with a reasonable idea

I run a self-hosted Kubernetes cluster. FluxCD manages the entire thing — GitOps, reconciliation loops, the works. Git is the source of truth. It’s a clean, elegant system. You push to main, the cluster updates. Simple.

I use AI agents heavily for infrastructure work — Claude Code for day-to-day GitOps, and Gemini via the antigravity session framework for longer-running tasks. They understand Kubernetes manifests, write HelmReleases, fix Kustomize variable substitution errors, and generally make GitOps work feel much faster.

The reasonable idea was: let a Gemini antigravity session help me set up a feature cluster — an isolated Kubernetes environment for testing changes before they hit production. A sandbox. Standard stuff.

The unreasonable outcome was 119 commits pushed to my production GitOps repo in a single day, with FluxCD faithfully applying every single one.


What the evidence shows

Here are the actual commit timestamps from December 28th, 2025:

14:14 feat(infra): enable cnpg backup for authentik-dev cluster
14:15 feat(infra): disable cnpg backup for authentik-dev cluster
14:19 feat(infra): enable cnpg backup for authentik-dev cluster
14:21 feat(infra): disable cnpg backup for authentik-dev cluster
14:25 feat(infra): enable cnpg backup for authentik-dev cluster
14:31 feat(infra): enable cnpg backup for authentik-dev cluster
14:32 feat(infra): disable cnpg backup for authentik-dev cluster
14:39 feat(infra): enable cnpg backup for authentik-dev cluster
14:39 feat(infra): enable cnpg backup for authentik-dev cluster   ← same second
14:42 feat(infra): disable cnpg backup for authentik-dev cluster
14:42 feat(infra): enable cnpg backup for authentik-dev cluster   ← same second
14:42 chore(infra): add velero-test-cluster to backups
14:42 feat(infra): disable cnpg backup for velero-test-cluster    ← same second

That’s two commits inside the same minute at 14:39 and four inside the same minute at 14:42, some landing within the same second — the AI was committing faster than the clock was ticking.

December 22nd: 98 commits. December 24th: 93 commits. The top 10 days in the repo’s history are all from this December–January period.

This is not a human. Humans don’t commit 119 times in a day to a GitOps repo. They also don’t push the same enable/disable toggle 12 times in 30 minutes, with a CI/CD system applying each change to a live cluster in between.


Why your safety nets don’t help

Here’s the part that should concern you, especially if you’re still thinking about AI-assisted infrastructure in the old mental model.

“We have branch protection rules.”

Branch protection stops humans from pushing directly to main. It requires PRs, reviews, approvals. But an AI agent running with your commit permissions is, from Git’s perspective, already you. It has your SSH key. It has your credentials. It passes the hooks. Whatever branch protection still permits, the agent has been granted — by you — and it will use it exactly as efficiently as it can.

“We have pre-commit hooks.”

I have flux-preflight running on every commit. It validates Kubernetes manifests, checks for empty secrets, runs kustomize builds. It caught syntax errors, YAML indentation problems, schema violations. It did not catch the semantic problem: that the same cluster was being toggled on and off repeatedly because the AI was in a feedback loop trying to verify its own changes.

Pre-commit hooks validate structure. They cannot validate intent.
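
To make that concrete, here is a minimal sketch of the kind of structural gate a hook like flux-preflight provides (the repo layout and paths are illustrative, not my actual setup):

```bash
#!/usr/bin/env bash
# Illustrative pre-commit hook: structural validation only.
set -euo pipefail

# Catches YAML syntax errors, bad indentation, unresolved kustomize
# references -- exactly the class of problem my hooks did catch.
for overlay in clusters/*/; do
  kustomize build "$overlay" > /dev/null
done

# What it cannot catch: a commit that re-enables the exact backup the
# previous commit disabled. That YAML builds flawlessly.
```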

“We separate environments with branches.”

This is where it gets interesting. The standard wisdom is: main = production, feature/cluster-test = sandbox. And that works fine when a human is manually switching branches before committing.

An AI agent operating over an extended session, across multiple tool calls, with context that spans thousands of tokens — it loses track of which branch it’s on. Or rather: it never loses track in the human sense. It just reasons about the current state of the repository based on what it can see, and if the feature cluster configuration has leaked into main, it treats that as the ground truth and continues from there.

The feature cluster and the production cluster started sharing manifests. FluxCD, reading from main, applied both. The cluster didn’t “get confused” — it was faithfully doing exactly what it was told.
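
For readers who haven’t run Flux: the production reconciler was wired roughly like this (names and URLs are illustrative; --export prints the manifests you then commit). The reconciler watches a branch, and it has no notion of environments beyond what that branch contains:

```bash
# Illustrative wiring of the production reconciler.
flux create source git gitops \
  --url=ssh://git@git.example.com/me/gitops \
  --branch=main \
  --interval=1m \
  --export

# Everything under ./clusters on main becomes desired state -- including
# feature-cluster manifests that were never meant to land there.
flux create kustomization cluster \
  --source=GitRepository/gitops \
  --path="./clusters" \
  --prune=true \
  --interval=10m \
  --export
```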

“We would notice before it gets bad.”

Would you? GitOps repos don’t send push notifications. FluxCD reconciles silently. If the changes being committed look valid (they pass linting, they have sensible commit messages, the YAML is well-formed), there’s nothing that jumps out in the ambient noise of a busy cluster.

I noticed when things stopped working. By then the commit history looked like a seismograph.


The actual problem is context, not permissions

This is the part most people haven’t internalized yet.

Traditional Git safety nets were designed for a threat model where the danger is malicious actors or careless humans. A bad actor trying to push to main. A developer who forgot to branch. A junior engineer who didn’t know the convention.

AI agents are neither. They’re not malicious (usually). They’re not careless (they’re very thorough). The problem is that they operate with massive context — they’re tracking dozens of moving parts simultaneously — and in a GitOps environment, that context can drift in ways that are hard to detect from the outside.

When I told Gemini to “set up a feature cluster,” it correctly understood that this meant creating Kubernetes manifests. It correctly understood that FluxCD needs to be told about new clusters. It correctly understood that testing requires iteration. What it didn’t have was a reliable way to enforce the invariant that feature cluster changes should never touch the production branch.

The feedback loop happened because:

  1. The agent committed a change
  2. FluxCD applied it
  3. The cluster state didn’t match expectations
  4. The agent committed a correction
  5. FluxCD applied that too
  6. Go to 3

Each individual commit was locally reasonable. Globally, it was a disaster.


What actually works

After several complete cluster wipeouts (yes, plural), here’s what I’ve learned:

Separate repos, not separate branches. The production GitOps repo and the feature/experiment repo need to be physically separate repositories. Not branches — repos. It’s counterintuitive if you think about it from a human workflow perspective, but it’s the only way to make it structurally impossible for an agent working on the feature repo to commit to production. There’s no branch to accidentally switch back to. There’s no shared history to get confused about.
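
In Flux terms, that means bootstrapping each cluster against its own repository. A sketch with hypothetical repo and kubeconfig context names:

```bash
# Illustrative: two repos, two bootstraps, zero shared history.
flux bootstrap github --context=production \
  --owner=me --repository=gitops-production \
  --branch=main --path=clusters/production

flux bootstrap github --context=feature \
  --owner=me --repository=gitops-feature \
  --branch=main --path=clusters/feature
```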

Explicit cluster identity checks. Every commit pipeline should verify which cluster it’s targeting. Not “which branch am I on?” but “what does this manifest actually point at?” A pre-commit hook that reads the cluster endpoint from the FluxCD configuration and fails loudly if it doesn’t match an allowed target.
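
A minimal sketch of that hook, assuming the API server endpoint appears somewhere in the committed config (the path, the grep pattern, and the allowlist are all hypothetical):

```bash
#!/usr/bin/env bash
# Illustrative pre-commit hook: verify what the manifests point at,
# not which branch we happen to be on.
set -euo pipefail

ALLOWED_TARGET="feature-cluster.internal"

# Pull the API server endpoint out of the committed cluster config.
target=$(grep -rho 'server: https://[^[:space:]]*' clusters/ | head -n1 | awk '{print $2}')

if [[ "$target" != *"$ALLOWED_TARGET"* ]]; then
  echo "FATAL: manifests target '${target}', allowed target is '${ALLOWED_TARGET}'" >&2
  exit 1
fi
```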

Rate limiting commits. This sounds absurd, but: an agent that commits more than N times in M minutes on the same set of files should pause and ask for human confirmation. 119 commits in a day is a signal. 3 commits in 10 seconds is a signal. Humans don’t do this; the rate is itself diagnostic.
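
A sketch of that check, simplified to count repo-wide commits rather than commits per file, with thresholds you would tune to your own baseline:

```bash
#!/usr/bin/env bash
# Illustrative rate limiter: a commit rate no human sustains is itself
# the alarm. The thresholds are arbitrary examples.
set -euo pipefail

MAX_COMMITS=5
WINDOW="10 minutes ago"

recent=$(git log --since="$WINDOW" --oneline | wc -l)

if (( recent >= MAX_COMMITS )); then
  echo "Rate limit: ${recent} commits since ${WINDOW}." >&2
  echo "Humans do not commit this fast. Pausing for confirmation." >&2
  exit 1
fi
```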

Separate FluxCD instances per cluster. The production cluster’s FluxCD reconciler should read only from the production repo. The feature cluster’s FluxCD should read only from the feature repo. Even if someone (or something) pushes feature code to the production repo, the reconciler simply won’t act on it.
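
This separation is also cheap to audit. One way to check, with an illustrative context name:

```bash
# Illustrative audit: list every Git source the production reconciler
# watches. Only the production repo should ever appear.
kubectl --context=production get gitrepositories -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.url}{"\n"}{end}'
```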

Treat AI access like a service account, not a user. Service accounts get minimum necessary permissions, scoped to specific namespaces and resources. An AI agent working on Nextcloud configuration doesn’t need write access to the CNPG backup kustomization. The blast radius of a feedback loop is proportional to the permissions granted.
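
In kubectl terms that is a few lines. The names are illustrative, and the verbs and resources depend on what the agent actually needs:

```bash
# Illustrative least-privilege grant: the agent works on Nextcloud, so it
# gets a namespaced role there and nothing in flux-system or cnpg-system.
kubectl create serviceaccount nextcloud-agent -n nextcloud
kubectl create role nextcloud-agent-role -n nextcloud \
  --verb=get,list,watch,patch \
  --resource=deployments,configmaps,helmreleases.helm.toolkit.fluxcd.io
kubectl create rolebinding nextcloud-agent-binding -n nextcloud \
  --role=nextcloud-agent-role \
  --serviceaccount=nextcloud:nextcloud-agent
```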

Give the AI institutional memory before it touches anything. The feedback loop happened partly because the agent had no way to query what had already been tried. It committed, observed failure, and tried something slightly different — with no memory that it had tried essentially the same thing six commits ago. A RAG system that ingests your full GitOps history, past incidents, and chat sessions gives the agent real context before it acts.

I built exactly this: cluster-shepherd — an AI ops agent backed by a vector database of every manifest, every incident walkthrough, and every past session in this cluster. The practical difference: before touching anything on a sensitive component, the agent queries “what happened last time someone changed this?” and gets actual answers from the cluster’s own history. That alone would have shortened the December feedback loop significantly — the agent would have seen it was repeating patterns it had already tried.
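
The ingestion side is less exotic than it sounds. A sketch of just the extraction step, with the embedding and indexing elided, and everything here illustrative rather than cluster-shepherd’s actual pipeline:

```bash
# Illustrative: turn each commit into a retrievable document that pairs
# the stated intent (message) with the actual change (diff stat).
mkdir -p corpus
git log --pretty=format:'%H' |
while read -r sha; do
  git show --stat --format='%ad%n%s%n%b' "$sha" > "corpus/${sha}.txt"
done
```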


The uncomfortable conclusion

AI agents are genuinely useful for infrastructure work. They’re faster than humans at writing HelmReleases, they remember Kustomize variable syntax, and they catch typos before you do. I use them daily and I’m not stopping.

But “out of context” is a real and underappreciated failure mode. Not out of context in the LLM sense of “ran out of tokens.” Out of context in the operational sense of “the agent’s model of what it’s doing has drifted from what’s actually happening in the system.”

Traditional safety nets assume you know what bad looks like. 119 commits to a production GitOps repo in a single day is objectively bad. But each individual commit looked fine. The pre-commit hooks passed. The YAML was valid. The commit messages were descriptive. The branch was correct (that was part of the problem).

Your colleagues who are “being cautious” about AI in infrastructure aren’t wrong to be cautious. They’re just being cautious about the wrong things. The danger isn’t that the AI will do something obviously bad. It’s that it will do many individually reasonable things that compose into a disaster.

Guardrails need to be structural, not advisory. And “separate branches for separate environments” is advisory.


The cluster is not a sandbox. It just lives at home.

There’s a misconception I should address before the “why full access” part.

When I say home lab, people imagine a Raspberry Pi in a drawer running a toy nginx server. Something you can pull the plug on with zero consequences. A sandbox.

That’s not what this is.

This cluster runs my email. My Nextcloud with years of documents synced across devices. Immich with tens of thousands of photos. Mailu handling actual deliveries — if it breaks, I miss things. Authentik is the identity provider for everything else, so when Authentik is down, nothing is accessible. There’s a CNPG database cluster managing PostgreSQL for multiple applications, each with real state. SeaweedFS handling ~50 TB of raw storage across spinning disks — 35 TB on the NAS, 5 TB per UDOO node. My family and friends are the users.

The hardware, for context:

RKE2 Kubernetes cluster — BGP + MetalLB, service IPs advertised as real routes:

| Node | Specs | Role |
| --- | --- | --- |
| node01 | Ryzen 7 PRO 4750G · 16T · 64 GB · 2 TB | Standby compute (identical to node02) |
| node02 | Ryzen 7 PRO 4750G · 16T · 64 GB · 2 TB | Compute · AMD Radeon GPU (ROCm) · Ollama · LiteLLM |
| storage1 | Intel N100 · 4C · 32 GB · 20 TB + 15 TB | NAS · SeaweedFS · 35 TB raw |
| opi5 | RK3588 ARM64 · 8C · 16 GB · Lexar NM790 2 TB NVMe (PCIe 3.0 x4, ~3,500 MB/s) | Edge · Rockchip NPU · ML inference |

Supporting infrastructure:

| Node | Specs | Role |
| --- | --- | --- |
| orange-pi-6 | ARM64 · 12C · 32 GB · 2 TB NVMe | RAG ingester · nomic-embed-text |
| udoo-1/2/3 | x86 · 4C · 4 GB · 5 TB each | SeaweedFS + MinIO HA · keepalived VIP |

Network: Mikrotik CCR2004-1G-12S+2XS — 12× SFP+ 10G · 2× SFP28 25G · 10G fiber throughout · 10G internet uplink. MetalLB peers over BGP with the CCR2004 — LoadBalancer IPs are real routes, not NAT.

Amsterdam electricity bills. Not your average toy.

When the cluster is broken, something is in pain. Not metaphorically — there are actual services that actual people (me, my household) depend on. Email stops. Photo sync breaks. Documents become inaccessible. The alarm that should have fired at 3am to tell me the backup failed doesn’t fire because the monitoring stack is also down.

This is not a sandbox. It just happens to be mine.

And that’s the point.

Self-hosting at this level is the fastest way to acquire real operational context that you cannot get from any other source. You learn CNPG recovery procedures not because you read the docs, but because your database is actually down at 11pm on a Tuesday and you need it back. You learn FluxCD reconciliation semantics not from a tutorial, but because FluxCD applied something you didn’t intend to production and you’re reading the source code at 2am trying to understand why.

The hours don’t show up anywhere. When you see a comment like “use postBuild.substitute not substituteFrom for cluster-scoped vars” — that’s not knowledge from reading. That’s a specific Saturday afternoon where everything was broken in a specific way and eventually one thing fixed it. The comment is two seconds to write. The context behind it is five hours you’ll never see.

This is what senior engineers mean when they say they’ve “seen this before.” They don’t mean they read about it. They mean they were personally responsible for a system at 2am when this exact thing broke, and they remember what it cost to fix it.

A true sandbox removes the cost. Removing the cost also removes the learning. The pain is the curriculum.


Why I gave it full access — and why that was the right call

This is the part people don’t say out loud.

I gave the Gemini agent full write access to my GitOps repo. I gave it kubectl. I gave it the ability to commit, push, and watch FluxCD reconcile the results. I gave it more access than I would give a junior engineer on their first week.

And I did it deliberately.

This is my home lab. The data on this cluster is mine — some config files, some self-hosted services, nothing irreplaceable. The worst case is a weekend of rebuilding. I can absorb that cost. Most production environments cannot, which is exactly why I need to understand the failure modes before I’m anywhere near a production environment.

This is how SRE actually works. Not from reading incident reports from other companies. Not from following best practices in a book. From running the experiment yourself, watching the thing break in real time, and then having to reconstruct what happened from the commit history, the pod logs, and the vague memory of what you were trying to do when it all went sideways.

The 119-commit storm was expensive in terms of my Saturday. It was invaluable in terms of what I now understand about AI agents in GitOps environments. I know exactly why it happened. I know what the feedback loop looked like from the inside. I know what FluxCD’s reconciliation behavior is when you push conflicting state at 1-second intervals. I know what the cluster looks like when two sets of Kustomize overlays are fighting over the same resource.

I could not have learned any of this from theory.

The traditional caution isn’t wrong — it’s just aimed at the wrong target. People who refuse to give AI agents meaningful access will never encounter these failure modes. They’ll also never develop the intuition to design systems that prevent them. They’ll eventually be the people who give AI agents meaningful access in a context where the cost of failure is much higher, without ever having seen what failure looks like.

Getting burned in a home lab is tuition. It’s cheap tuition compared to the alternative.

The philosophy is simple: you only learn by getting back on the bike. The fall is not the failure. The failure is not getting back on, or never riding in the first place.

Every cluster wipeout I’ve done — and there have been several — has taught me something that I carry forward. Separate repos for separate clusters. Rate-limit AI commits. Treat AI access like a scoped service account. These aren’t guidelines I read somewhere. They’re lessons I extracted from wreckage, at a time and place where the wreckage was acceptable.

Build your own wreckage. Do it at home, on hardware you own, with data you can afford to lose. Document what you find. Share it. That’s the job.


The cluster recovered. Most of it, anyway. The December 28th commit history remains, a permanent artifact of the day GitOps met an AI feedback loop at full speed. If you want to see what 119 commits to a GitOps repo in one day looks like, run git log --after="2025-12-27" --before="2025-12-29" on any repo where you’ve given an AI write access.

Merry Christmas.