Surviving the Hacker News Hug of Death on Home Fiber
Cover: after Katsushika Hokusai, The Great Wave off Kanagawa — the wave of traffic cresting over the cluster — the boats hold the line because they were already in position.
May 2026 — on what it takes to make a Hugo blog on home fiber sit still while a thousand strangers ring the doorbell at the same time
This blog runs from a closet. Not metaphorically. There is a three-node Kubernetes cluster about four metres from where I am writing this, hanging off a 1 Gbps Odido fiber connection. No CDN. No Cloudflare. The reason is unromantic: I want to control everything end-to-end, and I want to be able to read the access logs without asking a third party for permission. If the site falls over, that’s my problem. If it survives, that’s also my problem, because then I’ll have to wonder how close I really was to the edge.
Tonight I tried to find out.
The setup
Three nodes, all on the same LAN segment behind a Mikrotik router:
| Node | Arch | RAM | Role |
|---|---|---|---|
| node02 | amd64 | 64 GB | Main compute, 10G NIC to router |
| storage1 | amd64 | 32 GB | NAS / control plane, 1G NIC |
| orange-pi-max-1 | arm64 | 16 GB | ARM edge node, 1G NIC |
Hugo builds in CI, ships as a container, and gets fronted by a Varnish DaemonSet — one cache pod on every node, each with its own warm RAM. nginx-ingress terminates TLS and proxies to the local Varnish via a Service with internalTrafficPolicy: Local, so traffic that lands on a node is served from that node’s cache. The whole thing sits behind a 1 Gbps fiber line in a normal Dutch apartment.
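For reference, the node-local routing is a single field on the Service; a minimal sketch with illustrative names rather than the real manifest:

apiVersion: v1
kind: Service
metadata:
  name: varnish                    # illustrative name, not the real one
spec:
  internalTrafficPolicy: Local     # only route to endpoints on the node that received the traffic
  selector:
    app: varnish
  ports:
  - port: 80
    targetPort: 80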
The previous post on this topic (Meten is Weten) explains how that DaemonSet got there. This one is about what happens when you point a thousand virtual users at it.
The provocation
Earlier this week I added a 37-tile painter-cover ribbon to the home page. Every post that has a cover_after frontmatter field gets a tiny tile, sorted by the painter’s birth year — 1450 Bosch on the far left, 1904 Dalí somewhere near the right. It is the visual equivalent of a card catalogue, and I am unreasonably fond of it.
It also means every home-page visit pulls 37 image requests instead of the previous 1.
If you are reading this with one eyebrow raised, that is correct. The honest framing is: I added a slow thing to the home page and then started a fight with it on purpose. I wanted to know whether the cluster could absorb a Hacker News front-page spike with the new ribbon in place, or whether the ribbon would be the thing that turned the spike into a 502 cascade.
Hence k6, hence tonight.
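For the record, the spike shape was roughly ramp up, hold at 1000 VUs, ramp down. Expressed as k6 CLI stages instead of the real script (which also batches the ribbon images and runs response checks), with durations that are approximate rather than copied from the script:

# load shape only; the real spike.js also batches the 37 ribbon images
k6 run --stage 30s:1000 --stage 3m:1000 --stage 30s:0 spike.js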
First measurement, and why I should have known better
The first run was a 1000-VU spike test from my laptop over Wi-Fi to the public hostname. The output looked grim.
http_req_duration..............: p(95)=7.04s
http_req_failed................: 3.21%
http_reqs......................: 38214 ~127 req/s
p95 of seven seconds. Three percent errors. My first thought was the ribbon broke everything. My second thought, arriving slightly late, was wait.
The laptop is on Wi-Fi. The laptop is one machine. One thousand concurrent VUs from one machine over Wi-Fi means: one shared NIC, one TLS stack opening a thousand connections in roughly parallel, one DNS resolver doing hairpin lookups for the public IP that then bounces off the router and comes back into the LAN. The bottleneck was sitting on my desk, not in the closet.
What stung a little is that the laptop wasn’t supposed to be on Wi-Fi. My normal setup is a USB-C dongle with a 1 Gbit RJ45 going into a closet switch. The cable was missing — the Prusa 3D printer in the same closet had quietly inherited it during a vacuuming incident, and the closet’s too packed to dig into without a project. So the laptop fell back to Wi-Fi without telling me. The first ten minutes of “why is the cluster broken” was actually “why am I on Wi-Fi” with extra steps. Worth writing down: your troubleshooting assumptions about your own setup are a load-bearing part of the test, and they’re the part nobody validates.
So I built a k6 pod and ran the exact same test from inside the cluster, against the in-cluster Service. Same script, same VU count, same scenarios. The numbers changed entirely:
http_req_duration..............: p(95)=354ms
http_req_failed................: 0.00%
http_reqs......................: 137820 ~2298 req/s
p95 dropped from 7s to 354ms. Errors went to zero. Throughput went up roughly 18x. The server had been fine the whole time. The road to the server was the road in front of my apartment.
The lesson I keep relearning, written here so future-me can find it: always know whether you’re measuring the server or the road to the server. A load test from your laptop over Wi-Fi tells you something useful — usually about your laptop and your Wi-Fi.
For everything that follows, “the load test” means the one running from a pod inside the cluster, hitting the in-cluster Service directly. Real-world traffic still has to come down the fiber line, and we’ll get to that ceiling at the end.
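The in-cluster runner is nothing exotic: roughly this shape, with the script mounted from a ConfigMap. The Service hostname below is a placeholder, not the real one.

apiVersion: batch/v1
kind: Job
metadata:
  name: k6-spike
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: k6
        image: grafana/k6
        args: ["run", "/scripts/spike.js"]
        env:
        - name: TARGET                               # read by the script; placeholder hostname
          value: http://varnish.default.svc.cluster.local
        volumeMounts:
        - name: script
          mountPath: /scripts
      volumes:
      - name: script
        configMap:
          name: k6-spike-script                      # holds spike.js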
The optimisation the data demanded — ribbon thumbs
Even with the test fixed, the ribbon scenario was the worst-behaved one. k6 was extracting all 37 image URLs from the home page and batch-fetching them. The average ribbon batch took 1.1s and pulled about 2.5 MB. For a single home visit that wants to feel instant on mobile, 2.5 MB of decorative tiles is criminal.
The original images were the same WebP files used for the post hero images — roughly 67 KB each. At 48 px on screen, that is wildly more than necessary. The fix is the thing any front-end engineer would have written first: generate small thumbnails at build time and serve those instead.
The Dockerfile already had a step that converted PNG hero images to WebP. I added a second step for the ribbon thumbs.
# Generate small ribbon-thumb variants (144x144 webp, ~3-8 KB each) so the
# home-page covers ribbon doesn't pay the cost of 37 × full-size cover images
# per visit. The ribbon partial reads these via <slug>-thumb.webp; falls back
# to the full image if the thumb is missing.
RUN for f in static/img/*.png; do \
      base="${f%.png}"; \
      thumb="${base}-thumb.webp"; \
      [ -f "$thumb" ] && continue; \
      convert "$f" -resize 144x144^ -gravity center -extent 144x144 \
        -quality 72 -define webp:method=6 -define webp:lossless=false "$thumb" && \
      echo "thumbed: $f → $thumb"; \
    done
The ribbon partial picks <slug>-thumb.webp when it exists and falls back to the original otherwise, so old posts keep rendering even if the thumb pipeline misses them.
Per-image size dropped from about 67 KB to about 3.4 KB. The full ribbon payload dropped from roughly 2.5 MB to roughly 125 KB. Twenty times smaller, for an image set you only see at 48 px anyway. This is the part of the post where I have to acknowledge that the most useful optimisation tonight was a convert one-liner, not anything to do with Kubernetes. So be it.
The Varnish cache bump that actually mattered
The DaemonSet was originally configured with malloc,256m per pod — a leftover from when the site was twelve posts long. With 100+ posts, 37 ribbon thumbs, hero images, CSS bundles, fonts, and HTML pages, 256 MB filled within seconds of warmup and Varnish started evicting hot objects to make room for slightly less hot objects. Cache hit rate hovered around 92% under load; it should have been near 100%.
Bumping the malloc size to 2 GB and giving each pod a real CPU limit was the single biggest performance lever of the whole session.
spec:
  containers:
  - name: varnish
    image: varnish:7.6
    # was: -s malloc,256m (the original setting, two years stale)
    args:
    - |
      /usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,2g \
        -a 0.0.0.0:80 -p thread_pool_max=1000 -p thread_pools=2 &
    resources:
      requests:
        memory: 2200Mi
        cpu: 200m
      limits:
        memory: 4Gi
        cpu: "2"
After that change, cache hit rate sat at >99% for the home page and >97% for the long tail of blog posts. The Hugo backend basically stopped seeing traffic during the load tests — a handful of cold-miss requests per pod and that was it.
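Those hit-rate numbers come straight from varnishstat inside the cache pods; a quick way to eyeball them against any one pod (substitute your own pod name):

# hit rate ≈ cache_hit / (cache_hit + cache_miss)
kubectl exec <varnish-pod> -- varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss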
This might be a premature optimisation for a personal blog. It is also the kind of thing that takes ten minutes to ship and pays itself off the first time someone links you on a forum.
The HPA that didn’t fire
The original plan was a normal CPU-based HorizontalPodAutoscaler on the burst Deployment that sits behind the DaemonSet. Set target at 70% CPU, scale 2..10 replicas, let Kubernetes do its thing.
Under a 1000-VU spike, the HPA never triggered.
$ kubectl get hpa varnish-burst-djieno
NAME                   REFERENCE                         TARGETS   MINPODS   MAXPODS   REPLICAS
varnish-burst-djieno   Deployment/varnish-burst-djieno   0%/70%    2         10        2
Cute. Zero percent CPU while serving 2,300 req/s.
The reason is in retrospect obvious and at the time mildly humiliating: Varnish at this scale is RAM- and I/O-bound, not CPU-bound. It serves objects out of a malloc heap. The CPU spends most of its time idle while the kernel shovels bytes out of TCP buffers. Even when bursting hard, no single pod went above ~70% of one core. CPU was simply the wrong metric to autoscale on.
This is a useful failure to keep in mind: the default scaling signal will silently do nothing for any service whose work isn’t measured in CPU cycles. You don’t get an error. You get a flat line and a sense that something should be happening.
KEDA on the right metric
The fix is to scale on the metric that actually moves: HTTP request rate. KEDA makes this straightforward — it ships an external metrics provider that can query any Prometheus-compatible backend and feed the result to a regular HPA.
KEDA went in as a FluxCD HelmRelease, lean install, no HTTP add-on, no Prometheus operator sub-chart (we already run VictoriaMetrics). Each Varnish pod got a sidecar that runs prometheus_varnish_exporter, scraped by a PodMonitor that VictoriaMetrics picks up. Then a single ScaledObject replaced the CPU-HPA:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: varnish-burst-djieno
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: varnish-burst-djieno
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://vmsingle-vm-stack-victoria-metrics-k8s-stack.monitoring.svc.cluster.local:8428
      metricName: varnish_req_rate
      threshold: '200'
      query: sum(rate(varnish_main_client_req[1m]))
The threshold is “200 req/s per replica.” During the spike, total request rate climbed to about 2,300 req/s, the HPA target became 2300/200 ≈ 12 desired replicas, capped at maxReplicaCount: 10, and KEDA scaled the burst Deployment from 2 → 7 in the first wave and 7 → 10 once the spike held. Now the metric actually fired. The HPA actually existed in a non-vestigial sense.
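If you want to see the same number KEDA sees, the VictoriaMetrics endpoint is Prometheus-compatible, so a sanity check with curl from any in-cluster pod is enough:

# the exact query the ScaledObject polls
curl -s 'http://vmsingle-vm-stack-victoria-metrics-k8s-stack.monitoring.svc.cluster.local:8428/api/v1/query' \
  --data-urlencode 'query=sum(rate(varnish_main_client_req[1m]))'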
I should have started here. I didn’t, because CPU autoscaling is the default lore. The default lore is fine until it isn’t.
The arch dodge: building varnish-exporter for arm64
The exporter sidecar is jonnenauha/prometheus_varnish_exporter. Excellent project. Ships only linux-amd64 binaries on its GitHub Releases page. orange-pi-max-1 is aarch64, so the released binary won’t run on a third of the cluster.
I could have skipped the exporter on the ARM node, but that would have left the KEDA query missing a third of the request data, and the ARM node is in fact serving traffic. So I built it for both arches from source.
The Dockerfile is short — clone the upstream repo at a specific tag, cross-compile with GOARCH=$TARGETARCH, copy the resulting binary into the varnish:7.6 runtime image so the exporter can call varnishstat.
FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder
ARG TARGETARCH
ARG VERSION=1.6.1
WORKDIR /src
RUN apk add --no-cache git ca-certificates && \
    git clone --depth 1 --branch ${VERSION} \
      https://github.com/jonnenauha/prometheus_varnish_exporter.git . && \
    CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} \
    go build -trimpath -ldflags="-s -w" -o /out/prometheus_varnish_exporter .

FROM varnish:7.6
COPY --from=builder /out/prometheus_varnish_exporter /usr/local/bin/prometheus_varnish_exporter
USER varnish
ENTRYPOINT ["/usr/local/bin/prometheus_varnish_exporter"]
The build script uses two buildx builders in parallel — the local multiarch builder for amd64, and a remote opi-arm64 builder that runs natively on the ARM node so I don’t pay the QEMU emulation tax. Then docker buildx imagetools create stitches the two single-arch images into a multi-arch manifest under registry.djieno.com/djieno/varnish-exporter:v1.6.1.
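A sketch of that flow, where the builder names are local to my setup and the per-arch tag suffixes are just one way to do the stitching:

# amd64 on the local builder, arm64 natively on the Orange Pi builder
docker buildx build --builder multiarch --platform linux/amd64 \
  -t registry.djieno.com/djieno/varnish-exporter:v1.6.1-amd64 --push .
docker buildx build --builder opi-arm64 --platform linux/arm64 \
  -t registry.djieno.com/djieno/varnish-exporter:v1.6.1-arm64 --push .
# stitch the two single-arch images into one multi-arch manifest
docker buildx imagetools create \
  -t registry.djieno.com/djieno/varnish-exporter:v1.6.1 \
  registry.djieno.com/djieno/varnish-exporter:v1.6.1-amd64 \
  registry.djieno.com/djieno/varnish-exporter:v1.6.1-arm64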
If you are in the same arm64 boat with an exporter that only ships amd64 binaries, the recipe is open-source at https://gitlab.com/djieno/varnish-exporter. It is small. Fork it, change the upstream URL, change the runtime base image, ship a multi-arch image. Same shape for any Go exporter on the planet.
The cold-pod dip — warmup before joining the Service
The first time KEDA actually scaled the burst Deployment under load, aggregate throughput dipped. Not catastrophically — about 23% — but visibly. It took me a minute to figure out why.
When KEDA goes 2 → 7, the five new pods boot with empty caches. The moment their readiness probe passes, the Service routes traffic to them. They MISS on the first request for every URL, hit the Hugo backend, fill the cache slowly. During the ~30 seconds it takes each new pod to warm up, aggregate hit rate across the fleet drops, p95 latency climbs, and the spike you scaled out to handle is being served partly by pods that aren’t yet ready to handle it.
The fix is to delay readiness until the cache is actually warm. The container’s command now wraps varnishd: start it in background, fire a handful of HTTP requests at the home page and the critical CSS bundles, touch a marker file, then wait on the varnishd PID.
The slightly cursed detail is that the official varnish:7.6 image doesn’t ship curl or wget, and I didn’t want to install a package just for this. So the warmup uses bash’s built-in /dev/tcp socket support:
set -e
rm -f /var/lib/varnish/.warm
until getent hosts masterdjienocom.default.svc.cluster.local > /dev/null 2>&1; do
  echo "waiting for backend DNS..."; sleep 2
done
/usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,2g \
  -a 0.0.0.0:80 -p thread_pool_max=1000 -p thread_pools=2 &
VARNISH_PID=$!
# Wait for varnishd to actually listen
for i in $(seq 1 30); do
  if (exec 3<>/dev/tcp/127.0.0.1/80) 2>/dev/null; then
    exec 3>&-; break
  fi
  sleep 1
done
# Warm the critical paths: home, blog index, the four CSS bundles every
# page pulls. Two passes — first one is a MISS, second one verifies HIT.
for path in / /blog/ /css/additional.css /css/medium.css /css/fonts.css /css/syntax.css; do
  for n in 1 2; do
    (exec 3<>/dev/tcp/127.0.0.1/80
     printf 'GET %s HTTP/1.0\r\nHost: djieno.com\r\nConnection: close\r\n\r\n' "$path" >&3
     cat <&3 > /dev/null) 2>/dev/null || true
  done
done
echo "cache warmed; marking ready"
touch /var/lib/varnish/.warm
wait $VARNISH_PID
The readinessProbe is then trivially simple:
readinessProbe:
  exec:
    command: ["test", "-f", "/var/lib/varnish/.warm"]
  initialDelaySeconds: 2
  periodSeconds: 2
  failureThreshold: 30
Pods join the Service only after the marker file exists. The cold-pod dip is gone — when KEDA scales the Deployment, new pods quietly warm themselves for a few seconds, then start absorbing traffic at full hit rate. It is the sort of fix that feels disproportionate to the problem until the first time you watch a graph stop dipping.
The numbers, honestly
With the ribbon thumbs in place, the 2 GB cache, KEDA scaling on request rate, and the warmup gate, the spike test from inside the cluster looked like this — over five separate runs across the evening:
| Metric | Range across 5 runs |
|---|---|
| Peak sustained throughput | 1,800 – 2,800 req/s |
| HTTP error rate | 0.00% – 0.04% |
| p50 latency (home page) | 15 – 18 ms |
| p95 latency (home page) | 350 – 600 ms |
| p95 latency (full ribbon batch) | 820 – 1,100 ms |
| Cache hit rate (home + CSS) | > 99% |
| Burst replicas during peak | 7 – 10 |
There is real run-to-run variance here. Some of it is k6 startup jitter; some is the KEDA polling interval landing in a different place relative to the spike ramp; some is the cache state at the start of each run. I am not going to round any of this off and claim “the site does 10,000 req/s.” It doesn’t, on this hardware, on this fiber line, and I haven’t measured anything that supports a number that big.
What I can say is that during a 1000-concurrent-VU spike, the site serves ~2,000 requests per second with effectively zero errors and a median response time around 16 milliseconds. That is enough.
The Odido upload ceiling
The remaining honest question is whether the fiber line can deliver any of this to real visitors.
I measured sustained outbound throughput from node02 over a warm TCP connection to a server on the public internet: roughly 758 Mbps out of the nominal 1 Gbps. That is the ISP-side ceiling, not the cluster-side ceiling.
A full home-page visit with the new ribbon weighs about 175 KB on the wire (HTML + CSS + thumbs + WOFF2 fonts, all gzipped). Divide:
758 Mbps / (175 KB × 8 bits/byte) ≈ 535 home visits/sec
So the fiber line can sustain about 535 fresh visitors per second before saturating outbound. The cluster, server-side, runs out of CPU and thread pool capacity somewhere around 70 visitors per second of full asset load (the heaviest scenario, not just the HTML). The cluster is the bottleneck. The fiber line has roughly 7x headroom over the cluster.
A Hacker News front-page hit, depending on time of day and topic, sustains somewhere between 10 and 20 visits per second over a few hours, with brief peaks higher. A Reddit r/devops thread is similar order of magnitude. Both fit comfortably inside what this setup can serve, with the cluster as the limiting resource and the fiber line with room to spare.
That is the closest I am willing to come to a survival claim: this should hold for the kind of spike a personal blog actually gets, with margin. If a much larger property links here, things will get interesting in ways I have not measured.
What this didn’t fix, and what I’d do next
A few honest gaps.
Brief Service-routing window during scale events. When KEDA scales 7 → 10 and then 10 → 7 again, the iptables/IPVS sync inside the cluster lags pod readiness by a fraction of a second. Across the night I saw 0.00% – 0.04% errors during scale-down transitions, almost certainly traffic landing on a pod that just exited the endpoint set. A pre-stop hook with a short sleep would close this further; I haven’t bothered yet because 0.04% is acceptable for tonight.
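For completeness, the hook is three lines of YAML whenever I get around to it: give kube-proxy a beat to drop the pod from the endpoint set before varnishd sees SIGTERM.

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]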
The 60s rate window in KEDA’s query. rate(varnish_main_client_req[1m]) smooths the request rate over the last minute, which means a sharp spike takes ~30 seconds to be visible to the HPA, and the HPA reacts ~15 seconds after that on top of its polling interval. Total spike-to-scale latency is closer to a minute than I’d like. Shortening the window to [30s] would help, at the cost of more flap risk. Worth experimenting with under a real load pattern, not a synthetic one.
No CDN. A Cloudflare or Bunny in front of this would absorb basically everything I’ve described, for free or close to it, and would also handle the case where my fiber line drops. I keep not putting one there. The reason is the one in the opening paragraph — I want to control everything and serve from my house — and I am aware that this is more a value statement than an architecture decision. Reasonable people would disagree. Cheerfully.
ARM thread pool tuning. The Orange Pi Max can sustain about 600 req/s before its smaller thread pool starts queueing. I haven’t tuned thread_pool_max per-architecture; the same 1000 ceiling is applied across all three nodes. For the spike profile I tested tonight, this didn’t matter, but a more sustained load would benefit from architecture-aware settings.
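If I do get to it, the least invasive shape is probably a second DaemonSet pinned to arm64 with smaller pools, with the main one excluding arm64. Untested, direction only:

# arm64-only variant; pool sizes illustrative, not measured
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
      - name: varnish
        args:
        - |
          /usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,2g \
            -a 0.0.0.0:80 -p thread_pool_max=500 -p thread_pools=2 &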
Closing
The blog now survives a synthetic 1000-VU spike with effectively zero errors, served from a closet in Amsterdam over consumer fiber, behind a cache layer that scales itself on the metric that actually moves, with warmup gating that hides the cold-pod gap, and with thumbnail images that are 20x smaller than they were yesterday. The biggest single win was a one-line convert command. The second biggest was bumping a number from 256 to 2048. The Kubernetes part was mostly there to make those two things matter at the right moment.
Now I just need an excuse to find out if anyone actually wants to read this.
Postscript — what actually happens when you generate the load from outside your own network
After publishing the above, I ran a follow-up test: real external traffic from three fresh Hetzner cpx22 instances in Nuremberg, Falkenstein, and Helsinki. The whole experiment cost €0.008 and ran in about thirty minutes.
The headline:
| Source DC | requests | p50 | p95 | p99 | errors |
|---|---|---|---|---|---|
| Nuremberg | 8,461 | 26 ms | 31 ms | 34 ms | 0 |
| Falkenstein | 8,219 | 34 ms | 41 ms | 48 ms | 0 |
| Helsinki | 7,481 | 67 ms | 75 ms | 88 ms | 0 |
| Aggregate (~345 req/s sustained, real external) | 24,161 | — | < 100 ms anywhere in Europe | — | 0 |
That is the honest “what a visitor in Helsinki experiences” number. Sub-100ms p95 from a different country, zero errors, at five times Hacker-News-peak sustained traffic, while the cluster’s KEDA HPA was scaling and the warmup gating was kicking in. The cover is doing what the cover promised.
But then I pushed harder, because that’s what you do. 1500 VUs split across the same three DCs — 91% timeout rate. The cluster wasn’t to blame: the server-side request-rate metric still climbed to 351 req/s and the HPA scaled the burst Deployment 2 → 6 as designed. The connections never got that far. What broke first wasn’t Varnish, wasn’t Hugo, wasn’t even the Odido upload bandwidth (we measured 758 Mbps sustained, plenty). It was, in order of likelihood:
- ~~Mikrotik NAT conntrack.~~ Ruled out. I checked the next morning. The router is set to max-entries: 1048576 — one million conntrack slots, configured years ago when I had a different problem. So nope, the router has room. Initial hypothesis nuked by one ssh-into-the-Mikrotik. Worth keeping the strike-through for the next person who reaches for the same explanation.
- nginx-ingress TLS handshake CPU. Three ingress pods doing 1500 simultaneous TLS handshakes saturate before Varnish sees a thing. KEDA could do the same trick on the ingress that it does on Varnish; that would be the next layer of this same story.
- Odido edge shaping. Theoretically possible at the ISP, though our outbound bandwidth tests suggest they’re not the bottleneck in either direction.
The lesson, written down so I can find it next time: the cluster was never the bottleneck I should have been measuring against. Once a real external load is in play, the bottleneck moves to whatever bit of plumbing is between the visitor and the cluster — the home router, the TLS layer, the ISP. In retrospect this is obvious; I just hadn’t drawn the picture far enough to the left.
What got me to the right picture: noticing the cluster-side HPA was happily scaling while the external load was almost entirely failing. That asymmetry only shows up when you measure both sides at the same time. If I had only watched the k6 client, I would have concluded the cluster was broken. If I had only watched Varnish, I would have concluded the test was broken. Both were lying, neither alone.
So: the article above stands. The cluster does what it says on the tin. The router has its own date with KEDA, on a different night.
References
- Meten is Weten: how the DaemonSet got built — the previous chapter
- KEDA — Kubernetes Event-driven Autoscaling, the right tool for non-CPU-bound workloads
- Varnish Cache — still the best HTTP accelerator nobody talks about
- k6 — the load tester I keep reaching for
- jonnenauha/prometheus_varnish_exporter — the exporter we forked for multi-arch
- varnish-exporter (multi-arch fork) — open-source, fork-friendly for anyone else stuck on arm64
- Hokusai, The Great Wave off Kanagawa — the cover-after for this post


