All Posts

cluster-shepherd: The AI Ops Agent That Actually Knows Your Cluster

April 2026 — what happens when you stop treating AI as a search engine and start treating it as a co-pilot with real cluster access

When Gemini Says Nothing: Two Silent Failure Modes in MCP + LibreChat

April 2026 — field notes from wiring a Kubernetes SRE agent to Gemini 2.5 Flash

I spent the better part of two days debugging an AI agent that would reliably respond with… nothing. No error. No explanation. Just a blank chat bubble where a tool call should have been.

Restoring a Kubernetes app isn't just kubectl apply

February 2026 — backup is easy, restore is where you find out if your backup actually works

Every infrastructure guide talks about backups. Almost none talk honestly about restores.

Ending the commit storm: validating FluxCD manifests locally before they hit the cluster

February 2026 — on the commit history that nobody wants to show their colleagues

Every GitOps practitioner has a section of their git history they’d rather not talk about.

119 commits in one day: what happens when AI meets GitOps without guardrails

January 2026 — a post-mortem on why your branch protections mean absolutely nothing when an AI is at the wheel


It started with a reasonable idea

I run a self-hosted Kubernetes cluster. FluxCD manages the entire thing — GitOps, reconciliation loops, the works. Git is the source of truth. It’s a clean, elegant system. You push to main, the cluster updates. Simple.

One curl command to a GitOps-ready RKE2 cluster

December 2025 — because “fresh cluster” should not take a day

Every time I’ve needed to spin up a new Kubernetes cluster (new hardware, new lab environment, disaster recovery test), I’ve gone through the same ritual. RKE2 install. Wait. Get the kubeconfig. Install ArgoCD. Wait. Bootstrap the app-of-apps. Configure SSH keys for GitLab access. Wire up the GitOps repo.