All Posts
700ms to 2ms: What a Cluster Fire Taught Me About Embedding
700ms. That was the number that haunted my Kubernetes cluster while it slowly burned to the ground. Every alert the cluster generated, every log line it processed for AI-driven feedback, triggered an embedding operation. Each of these embeddings, we thought, took 700ms and saturated a CPU core, which in turn triggered more alerts, creating a truly spectacular, self-immolating feedback loop. The load average climbed to 153. It was a cluster fire, quite literally. Then, in the chaotic aftermath of patching the inferno and moving that “expensive” embedding workload to a humble 15-watt ARM board, something remarkable emerged: the warm latency was a mere 2ms. Even a cold start, including model loading, clocked in at around 100ms. The sobering discovery? The 700ms was never the cost of the embedding operation itself. It was the cost of running that operation on a fully saturated CPU, starved of headroom. On a quiet, dedicated machine with a warm model, the exact same task takes 2 milliseconds.
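To make the cold-versus-warm distinction concrete, here is a minimal sketch of how the two latencies could be measured separately. It is not the code from the post: `LazyEmbedder` is a hypothetical stand-in that simulates a one-time model load with a sleep and returns a toy vector, and `measure` is an illustrative helper name. With a real model, only the `embed` method would change.

```python
import time


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


class LazyEmbedder:
    """Hypothetical stand-in for an embedding model that loads on first use."""

    def __init__(self, load_seconds=0.05):
        self._load_seconds = load_seconds
        self._loaded = False

    def embed(self, text):
        if not self._loaded:
            # Simulated one-time model load; a real model would read weights here.
            time.sleep(self._load_seconds)
            self._loaded = True
        # Toy "embedding", for illustration only.
        return [float(len(text)), float(sum(map(ord, text)) % 997)]


def measure(embedder, text, warm_runs=20):
    """Time the first (cold) call on its own, then take the median of warm calls."""
    _, cold_ms = time_call(embedder.embed, text)
    warm = sorted(time_call(embedder.embed, text)[1] for _ in range(warm_runs))
    return {"cold_ms": cold_ms, "warm_median_ms": warm[len(warm) // 2]}


if __name__ == "__main__":
    print(measure(LazyEmbedder(), "disk pressure on node-3"))
```

Running the same measurement on an idle machine and on a saturated one is what separates the cost of the operation from the cost of contention.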
The AI That Monitored Your Cluster Just Brought It Down
April 2026 — on the sentinel that decided to burn the house down
“Why can’t I see the new photos?”
That’s how the outage started. Not with a PagerDuty alert or a Grafana dashboard turning red, but with a casual question from my wife. I was already deep in the weeds debugging a glitch in Nextcloud Talk, but when I tried to refresh my own dashboard, it didn’t just slow down; it vanished. Immich was gone. Mail was gone. The search index was a black hole.
cluster-shepherd: The AI Ops Agent That Actually Knows Your Cluster
April 2026 — what happens when you stop treating AI as a search engine and start treating it as a co-pilot with real cluster access
I built an AI to stop the wrong recruiters from wasting my time
April 2026 — on replacing an inbox full of irrelevant opportunities with a system that actually thinks
If you’ve worked in IT for more than a few years in Europe, you know the pattern. A recruiter reaches out. The message contains your name (sometimes), a job description (loosely relevant), and an offer (usually well below your rate). They’re matching on keywords. “Kubernetes” in your profile, “Kubernetes” in the job description — match. The fact that the role is junior, six timezones away, pays 40% less than your current work, and requires a technology you haven’t touched in three years is irrelevant. The keyword matched.
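The keyword matching described above can be sketched in a few lines, which is roughly why it fails. This is an illustrative guess at the pattern, not any recruiter tool's actual logic; `keywords`, `naive_match`, and the sample texts are all hypothetical.

```python
import re


def keywords(text):
    """Lowercase the text and split it into a set of simple tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def naive_match(profile, job, watchlist):
    """Return watchlist terms that appear in both the profile and the job ad."""
    wanted = {w.lower() for w in watchlist}
    return sorted(keywords(profile) & keywords(job) & wanted)


if __name__ == "__main__":
    profile = "Senior platform engineer, Kubernetes, Go, 12 years"
    job = "Junior Kubernetes administrator, on-site, PHP required"
    print(naive_match(profile, job, {"Kubernetes", "Terraform"}))
```

The matcher reports a hit on "kubernetes" and never looks at seniority, location, rate, or the rest of the stack.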