Zero to observability in a Java app: OpenTelemetry agent, Prometheus, and Grafana Tempo
September 2025 — adding real observability to a Java service without touching a line of application code
Logs tell you something happened. Metrics tell you how often. Traces tell you exactly what happened, in what order, for how long, across which services.
Most Java applications in the wild have the first. Some have the second. Almost none have all three wired together properly. This post is about setting up the full stack from scratch — OTel Java agent, Prometheus, Grafana Tempo — using a reference Spring Boot application as the subject.
The interesting part: you don’t have to modify the application code at all.
What we’re instrumenting
The demo application is a Spring Boot REST API — a deliberately simple “task manager” (subtitled “USELESS task manager” in the UI, in the spirit of honest naming). It has a load generator that hits the endpoints continuously, and a PostgreSQL backend.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Load Generator │────▶│ Spring Boot │◀────│ PostgreSQL │
│ │ │ Application │ │ │
└────────┬────────┘ └────────┬────────┘ └─────────────────┘
│ │
└──────────────────────▶│
↓
┌─────────────────────┐
│ OpenTelemetry │
│ Collector │
└───┬───────┬─────────┘
│ │
↓ ↓
Prometheus Grafana Tempo
│ │
└──────┬───────┘
↓
Grafana
The OTel Collector is the central hub: it receives spans and metrics from the instrumented application, processes them, and exports to the right backends.
The agent: zero code change instrumentation
The OpenTelemetry Java agent is a JAR that attaches to the JVM at startup via -javaagent. It instruments popular frameworks (Spring Boot, JDBC, HTTP clients) automatically, emitting spans for every incoming request, outgoing database query, and HTTP call.
# Dockerfile
FROM eclipse-temurin:17-jre-alpine
COPY app.jar /app/app.jar
COPY opentelemetry-javaagent.jar /app/otel-agent.jar
ENV JAVA_OPTS="-javaagent:/app/otel-agent.jar"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
ENV OTEL_SERVICE_NAME="task-manager"
ENV OTEL_METRICS_EXPORTER="otlp"
ENV OTEL_TRACES_EXPORTER="otlp"
CMD java $JAVA_OPTS -jar /app/app.jar
A handful of environment variables and a JAR attached via -javaagent. No changes to pom.xml, no Spring dependencies, no @Trace annotations. The agent does the rest.
What you get automatically:
- A span for every HTTP request (GET /api/tasks, POST /api/tasks/{id})
- Child spans for every JDBC query (SELECT * FROM tasks WHERE id=?)
- HTTP client spans if the app makes outbound calls
- JVM metrics: heap usage, GC pause time, thread count
- Spring Actuator metrics if the endpoint is exposed
The Collector configuration
The OTel Collector is what separates concerns. The application sends everything to one place (the Collector). The Collector decides what goes where.
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
  spanmetrics:
    metrics_exporter: prometheus

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, spanmetrics]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
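One thing the Collector config alone doesn't cover: the prometheus exporter only exposes a /metrics endpoint on :8889, so Prometheus still needs a scrape job pointing at it. A minimal sketch (the job name and target assume the Docker Compose service name used later):

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ['otel-collector:8889']
```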
The spanmetrics processor is the interesting piece. It derives Prometheus metrics directly from span data: traces_spanmetrics_latency_bucket, traces_spanmetrics_calls_total. This gives you latency histograms (p50, p95, p99) for every endpoint without any additional instrumentation. (In recent Collector releases, spanmetrics ships as a connector rather than a processor, so the config shape differs slightly; the idea is the same.)
In Grafana, that means you can build a latency heatmap over all your endpoints and alert on it, without writing a single custom metric.
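As an example, a per-endpoint p95 over the span metrics might look like this in PromQL (the metric and label names depend on your spanmetrics version; these match the processor-style config above, where the span name lands in the operation label):

```promql
histogram_quantile(0.95,
  sum(rate(traces_spanmetrics_latency_bucket[5m])) by (le, operation)
)
```

The same expression works as an alert rule threshold, which is how you get per-endpoint latency alerting without writing a single custom metric in the application.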
Tempo instead of Jaeger
I started with Jaeger. It works fine. But Grafana Tempo integrates more cleanly with Grafana Cloud and the self-hosted Grafana — no separate Jaeger UI to manage, traces show up inline with the metrics dashboard, and TraceQL (Tempo’s query language) is more expressive for finding specific traces.
The switch is a one-line config change in the Collector: otlp/jaeger becomes otlp/tempo with the new endpoint. Tempo uses the same OTLP protocol; nothing about the application or agent changes.
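For a taste of TraceQL, here is the kind of query that would find the slow writes discussed later (attribute names follow the OTel semantic conventions; exact names can vary by agent version):

```traceql
{ resource.service.name = "task-manager" && span.http.method = "POST" && duration > 200ms }
```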
# Tempo minimal config
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks
For production, swap local for S3/GCS/Azure. For a dev environment or learning setup, local storage is fine.
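The swap is confined to the storage block. A sketch for S3 (bucket name, endpoint, and region are placeholders, not from the demo):

```yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces                  # placeholder bucket name
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
```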
The Grafana dashboard setup
With metrics in Prometheus and traces in Tempo, the Grafana setup is:
- Add a Prometheus datasource: http://prometheus:9090
- Add a Tempo datasource: http://tempo:3200
- In the Tempo datasource config, link to Prometheus for span metrics: this enables the “Go to metrics” button from a trace view
Once linked, the workflow is:
- See a latency spike in the Prometheus metric → click the spike → Grafana finds traces from that time window → click a trace → see the exact DB query that was slow
That chain — metric anomaly → trace → root cause — is what proper observability means. Logs can tell you the query was slow after the fact. Traces show you which specific request was slow, what called it, and what it called downstream.
What I found during the demo
Running the load generator against the instrumented app produced an immediately visible pattern in the span metrics: POST /api/tasks was consistently at p99 > 200ms while GET /api/tasks was under 20ms. Obvious in hindsight — writes go through an ORM, reads are a simple SELECT — but without the trace data you’d be guessing.
The trace view showed the write path:
- Spring MVC dispatch: 2ms
- Hibernate session open: 1ms
- JDBC INSERT INTO tasks: 190ms
- Hibernate session close: 1ms
The 190ms is the actual database write. The rest is framework overhead under 5ms. If that were a production performance problem, you’d know exactly where to look: the write path, and specifically the DB insert. Not the ORM. Not the HTTP layer. Not something to rewrite in a faster language.
Reproducing this yourself
The full stack is in docker-compose.yml in the repository. One command:
docker compose up -d
Services:
- app — Spring Boot task manager (port 8080)
- load-generator — hits the app every 100ms
- otel-collector — receives, processes, exports (ports 4317/4318/8889)
- prometheus — scrapes the Collector (port 9090)
- tempo — receives traces (port 3200)
- grafana — dashboards (port 3000)
Grafana at http://localhost:3000, default credentials admin/admin. The latency dashboard and trace explorer are pre-provisioned.
The honest caveat
Collector configuration has a learning curve. The YAML is verbose and the error messages when you misconfigure a pipeline are not always clear. Start with the minimal config above and add complexity incrementally.
The agent auto-instrumentation covers the most common cases but not everything. If you have custom business logic that spans multiple services — a pricing calculation that calls three microservices, for example — you’ll need to add manual spans for those. The OTel Java SDK makes this straightforward, but it does require code changes.
For a single service, the agent is sufficient. For a distributed system, plan for a mix of auto-instrumentation and manual spans at the boundaries that matter most.
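A sketch of what a manual span looks like with the OTel API (the class, tracer name, and attributes here are illustrative, not from the demo app):

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

import java.math.BigDecimal;

public class PricingService {
    // The tracer name identifies the instrumentation scope, not the service
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("com.example.pricing");

    BigDecimal calculatePrice(long taskId) {
        Span span = tracer.spanBuilder("calculatePrice").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // While this span is current, spans from downstream calls
            // (including auto-instrumented HTTP clients) become its children.
            span.setAttribute("task.id", taskId);
            return callPricingBackends(taskId);
        } catch (RuntimeException e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
        }
    }

    private BigDecimal callPricingBackends(long taskId) {
        // Hypothetical: calls the three pricing microservices
        return BigDecimal.ZERO;
    }
}
```

Because the agent is already on the JVM, GlobalOpenTelemetry picks up the agent's SDK, so the manual span joins the same trace as the auto-instrumented ones.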

