Java Application Performance Monitoring (APM) is the practice of continuously measuring, collecting, and analyzing the behavior of a Java application in production. For Java specifically, APM covers two distinct layers: the application layer (request latency, error rates, throughput, distributed traces) and the JVM layer (heap memory, garbage collection, thread pools, class loading). Both layers need monitoring because JVM problems often manifest as application symptoms. A long GC pause looks like a slow request, and heap exhaustion looks like a timeout.
APM is distinct from general infrastructure monitoring. Infrastructure monitoring tells you a server’s CPU is high. Java APM tells you which method call or GC event caused the CPU spike, which request was slow because of it, and which downstream service that request was waiting on.
Key Takeaways
- Java APM covers two layers: the application layer (latency, errors, throughput, traces) and the JVM layer (heap, GC, threads). Missing either layer creates blind spots
- The Java agent is the primary instrumentation mechanism for APM. It attaches to the JVM at startup via the -javaagent flag and uses bytecode manipulation to inject telemetry without code changes
- The OpenTelemetry Java agent (v2.27.0, May 2026, targeting OTel SDK 1.61.0) is the standard open-source instrumentation option. It supports Java 8+ and auto-instruments hundreds of libraries and frameworks, including Spring Boot, JDBC, gRPC, Kafka, and Hibernate
- Since OTel Java agent v2.0.0, the default export protocol is HTTP/protobuf, not gRPC. This aligns with the OTel specification default
- GC pause time is the most commonly overlooked Java performance signal. A GC pause freezes all application threads simultaneously. A 200ms pause is invisible in most infrastructure dashboards but causes request timeouts in latency-sensitive APIs
- Distributed tracing is the feature that separates modern Java APM from JVM monitoring. Traces follow a single request across services, databases, and message queues, and show exactly where time is spent
What Java APM Monitors
Java APM data falls into three categories, each answering different questions.
Application-Level Signals
These answer: is the application working correctly and fast enough for users?
| Signal | What it measures | Why it matters |
| Request latency | Time from request received to response sent | The most direct measure of user experience. Alert on p95 and p99, not just the average |
| Throughput | Requests per second | Baseline for capacity planning and anomaly detection |
| Error rate | Percentage of requests returning 5xx or exceptions | A rising error rate on a specific endpoint pinpoints a regression |
| Distributed traces | End-to-end request path across services and databases | Shows exactly where time is spent in a slow request |
| Database query time | Time spent in JDBC, JPA, Hibernate, or R2DBC calls | Database queries are the most common cause of Java service latency spikes |
| External HTTP call duration | Time spent calling downstream services | A slow dependency shows up here before it shows up in your own latency metrics |
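The advice to alert on p95 and p99 rather than the average is easier to see with numbers. The following is a minimal sketch, not how any particular APM backend computes percentiles: it uses the simple nearest-rank method over an in-memory array, and the class name `LatencyPercentiles` and the sample values are made up for illustration. Production systems usually rely on histograms rather than sorting raw samples.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over a window of request latencies (illustrative only). */
final class LatencyPercentiles {

    /** Returns the p-th percentile (0 < p <= 100) using the nearest-rank method. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical window: 18 fast requests and 2 slow ones.
        long[] samples = new long[20];
        Arrays.fill(samples, 10);
        samples[18] = 900;
        samples[19] = 1200;

        double mean = Arrays.stream(samples).average().orElse(0);
        System.out.println("mean=" + mean
                + " p50=" + percentile(samples, 50)
                + " p95=" + percentile(samples, 95)
                + " p99=" + percentile(samples, 99));
    }
}
```

In this made-up window the median stays at the fast-path value while p95 and p99 land on the slow requests, which is the behavior an average tends to blur.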
JVM-Level Signals
These answer: is the runtime environment healthy?
| Signal | What it measures | Why it matters |
| Heap memory used vs max | Current heap usage as a percentage of the configured maximum | Alert before 80%. When the heap is full and GC cannot reclaim enough space, the JVM throws OutOfMemoryError |
| GC pause time | Duration of stop-the-world GC events | Pauses freeze all threads. Even 100ms pauses cause timeouts in real-time APIs |
| GC frequency | Number of GC cycles per minute | High frequency with low recovery indicates a memory leak or undersized heap |
| Live thread count | Number of currently active JVM threads | Unexpected growth indicates a thread leak. A sudden drop may indicate deadlock |
| Thread pool queue depth | Pending tasks in executor thread pools | A growing queue means threads are not keeping up with incoming work |
| Non-heap memory | Memory used for class metadata (Metaspace), JIT-compiled code (code cache), and other JVM-internal structures | Can grow unboundedly in some deployment configurations |
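Most of the JVM-level signals in the table above are exposed by the JDK's own management beans, which is roughly where agents read them from. The sketch below uses only the standard `java.lang.management` API; the class name `JvmSnapshot` is hypothetical. One caveat is stated in the code as well: `getCollectionTime()` is an approximate accumulated collection time, which for concurrent collectors is not the same thing as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Point-in-time read of a few JVM health signals via standard MXBeans (illustrative only). */
final class JvmSnapshot {

    /** Heap used as a percentage of the maximum, or -1 if the JVM reports no maximum. */
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max > 0 ? 100.0 * heap.getUsed() / max : -1;
    }

    /** Bytes currently used outside the heap (Metaspace, code cache, and similar pools). */
    static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Collections reported by all collector beans since JVM start. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in ms; not strictly pause time. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when undefined for this collector
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Live threads, daemon and non-daemon. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used %    : " + heapUsedPercent());
        System.out.println("non-heap bytes : " + nonHeapUsedBytes());
        System.out.println("gc count       : " + totalGcCount());
        System.out.println("gc time (ms)   : " + totalGcTimeMs());
        System.out.println("live threads   : " + liveThreadCount());
    }
}
```

Sampling these values on a schedule and comparing successive readings gives GC frequency and thread growth over time. Thread pool queue depth is not covered here because it lives on the application's own executors rather than on a JVM-wide bean.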
Infrastructure Correlation
Java APM becomes most useful when application and JVM signals are correlated with the infrastructure they run on: CPU utilization, network I/O, and disk I/O. A full GC that coincides with a CPU spike is a different problem from a GC that coincides with a pod being throttled.
How Java APM Agents Work
The most practical way to instrument a Java application for APM is the Java agent. It requires no changes to application code and no modifications to build files.
Bytecode manipulation at class load time. When a Java application starts with a -javaagent flag, the agent registers itself with the JVM’s instrumentation API. When the JVM loads a class, the agent intercepts the loading process and modifies the bytecode before the class is used. This modification injects telemetry collection into method calls such as HTTP handlers, database drivers, and messaging clients, without the application developer doing anything.
What this means in practice:
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=order-service \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4318 \
  -jar order-service.jar
This single line, with no code changes, gives you:
- A span for every incoming HTTP request with method, route, status code, and duration
- A span for every outgoing HTTP call with the target host and status code
- A span for every database query with the SQL statement and duration
- A span for every Kafka producer and consumer operation
- JVM metrics: heap usage, GC pause time, thread counts, class loading
- W3C TraceContext propagation on all outgoing HTTP calls
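To make the class-load hook described above concrete, here is a deliberately tiny agent built on the JDK's `java.lang.instrument` API. It is a sketch of the mechanism only and is not how the OTel agent is implemented: the transformer just counts classes and returns `null`, which tells the JVM to keep the original bytecode. The names `ClassLoadCountingAgent` and `CountingTransformer` are hypothetical. A real agent JAR also needs a `Premain-Class` entry in its manifest pointing at the class that declares `premain`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal agent skeleton (illustrative only). Kept package-private so the sketch
 * compiles in any file; a packaged agent would normally be a public class in its own file.
 */
final class ClassLoadCountingAgent {

    /** Number of classes offered to the transformer so far. */
    static final AtomicLong SEEN = new AtomicLong();

    /** Observes each class as it is loaded but leaves its bytecode untouched. */
    static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            SEEN.incrementAndGet();
            // An APM agent would rewrite classfileBuffer at this point (for example,
            // wrapping an HTTP handler method in span start/end calls) and return the
            // modified bytes. Returning null means "no change".
            return null;
        }
    }

    /** Invoked by the JVM before the application's main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }
}
```

Everything an APM agent does on top of this, such as matching specific classes and generating the injected bytecode, happens inside `transform`, typically with a bytecode library rather than by hand.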
The OpenTelemetry Java Agent
The OTel Java agent (opentelemetry-javaagent.jar) is the official open-source instrumentation agent maintained by the OpenTelemetry project. It is the standard starting point for Java APM in 2026 for teams that are not using a commercial APM vendor.
Current version: v2.27.0 (May 2026), targeting OTel SDK 1.61.0. Requires Java 8 or above.
Download:
curl -L -o opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
Key facts about the agent:
- Single JAR file containing the agent and all auto-instrumentation libraries (typically 50-60MB)
- Default export protocol is HTTP/protobuf to http://localhost:4318. Changed from gRPC to HTTP/protobuf in v2.0.0 to align with the OTel specification
- Supports hundreds of libraries and frameworks out of the box
- Configuration is via -D system properties or environment variables. When both are set for the same option, the system property takes precedence
- Declarative YAML-based configuration is supported from v2.26.0 onward via -Dotel.config.file=/path/to/otel-config.yaml
Supported frameworks and libraries (selection):
| Category | Supported |
| Web frameworks | Spring MVC, Spring WebFlux, Jakarta EE Servlets, Quarkus, Micronaut, Vert.x |
| HTTP clients | Apache HttpClient, OkHttp, java.net.HttpURLConnection, Jetty client |
| Databases | JDBC (all drivers), Hibernate, Spring Data, R2DBC, MongoDB, Redis (Jedis, Lettuce) |
| Messaging | Kafka, RabbitMQ, ActiveMQ, JMS |
| RPC | gRPC, Thrift |
| Caching | Ehcache, Hazelcast |
| Logging | Log4j 2, Logback, java.util.logging (trace ID injection into log records) |
JVM Garbage Collection: The Most Overlooked APM Signal
GC pauses deserve specific attention because they are the most common source of Java performance problems that are invisible to standard infrastructure monitoring.
Stop-the-world events. When the JVM enters a stop-the-world GC phase, it pauses all application threads simultaneously. During that pause, no requests are processed and no responses are sent. From a user’s perspective, the application freezes. From an infrastructure monitor’s perspective, nothing unusual happened. CPU usage may have been high during the GC, but the server was not down.
GC pause impact on request latency. A 300ms GC pause adds up to 300ms to the response time of every request that was in flight during the pause, even if the request itself needs only 5ms of work. This shows up as a latency spike in APM traces but is completely invisible in CPU or memory dashboards.
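The arithmetic behind that latency claim can be written down as a small model. This is a simplification for illustration, assuming a single stop-the-world pause and a request whose own work takes a fixed time; the class and method names are hypothetical. A request that is already running when the pause starts absorbs the whole pause, and a request that arrives mid-pause waits for the remainder.

```java
/** Simplified model of how one stop-the-world pause inflates request latency (illustrative only). */
final class GcPauseImpact {

    /**
     * Observed latency in ms for a request that needs serviceTimeMs of work,
     * given one pause starting at pauseStartMs and lasting pauseDurationMs.
     * All timestamps are on the same clock.
     */
    static long observedLatencyMs(long requestStartMs, long serviceTimeMs,
                                  long pauseStartMs, long pauseDurationMs) {
        long pauseEndMs = pauseStartMs + pauseDurationMs;
        long endWithoutPauseMs = requestStartMs + serviceTimeMs;

        // Case 1: the pause begins while the request is in flight, so the full pause is added.
        if (pauseStartMs >= requestStartMs && pauseStartMs < endWithoutPauseMs) {
            return serviceTimeMs + pauseDurationMs;
        }
        // Case 2: the request arrives during the pause and waits for the rest of it.
        if (requestStartMs > pauseStartMs && requestStartMs < pauseEndMs) {
            return serviceTimeMs + (pauseEndMs - requestStartMs);
        }
        // Case 3: the request and the pause do not overlap.
        return serviceTimeMs;
    }

    public static void main(String[] args) {
        // A 5 ms request hit by a 300 ms pause 2 ms after it starts.
        System.out.println(observedLatencyMs(1_000, 5, 1_002, 300));
        // The same request with the pause long after it finished.
        System.out.println(observedLatencyMs(1_000, 5, 2_000, 300));
    }
}
```

Under this model a 5ms request caught by a 300ms pause is observed at 305ms, while CPU and memory graphs for the same interval can look unremarkable.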
Modern GC collectors and their trade-offs:
| Collector | JVM flag | Best for | Pause behavior |
| G1GC (default since Java 9) | -XX:+UseG1GC | General-purpose workloads | Predictable, configurable pause targets. Pauses in the tens to hundreds of milliseconds range |
| ZGC | -XX:+UseZGC | Latency-sensitive services. Production-ready since Java 15. Generational ZGC (recommended mode) available since Java 21 | Sub-millisecond pauses regardless of heap size. Requires 15-30% more memory than G1GC |
| Shenandoah | -XX:+UseShenandoahGC | Low-latency with large heaps | Sub-millisecond pauses. Available in OpenJDK distributions |
| Parallel GC | -XX:+UseParallelGC | Batch processing, throughput-focused | Longer stop-the-world pauses acceptable in exchange for higher throughput |
For latency-sensitive Java services on Java 21 or above, Generational ZGC is the recommended collector. On Java 21 and 22 it must be enabled explicitly with -XX:+UseZGC -XX:+ZGenerational; from Java 23 onward, -XX:+UseZGC alone selects the generational mode. It delivers consistent sub-millisecond pause times regardless of heap size, which removes GC pauses as a significant source of request latency spikes. The trade-off is 15 to 30% higher memory usage and 8 to 20% additional CPU overhead from concurrent GC threads.
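After changing a GC flag it is worth confirming which collector the JVM actually selected, since container defaults and launcher scripts can override it. The sketch below reads the collector bean names from the standard `GarbageCollectorMXBean` API and maps them to a collector family. The prefix matching is an assumption based on the names HotSpot-based JDKs commonly report (for example "G1 Young Generation" or "PS Scavenge"); other JVMs or future releases may name them differently, so treat an "unknown" result as a prompt to look at the raw names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Best-effort detection of the active collector from MXBean names (illustrative only). */
final class ActiveCollector {

    /** Names of all garbage collector beans registered in this JVM. */
    static List<String> collectorBeanNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Maps bean names to a collector family; the prefixes are HotSpot conventions, not a spec. */
    static String guessCollector(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("ZGC")) {
                return "ZGC";
            }
            if (name.startsWith("Shenandoah")) {
                return "Shenandoah";
            }
            if (name.startsWith("G1")) {
                return "G1";
            }
            if (name.startsWith("PS ")) {
                return "Parallel";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<String> names = collectorBeanNames();
        System.out.println("collector beans: " + names);
        System.out.println("collector      : " + guessCollector(names));
    }
}
```

Running this once at startup and logging the result is a cheap way to catch a service that silently fell back to a different collector than the one intended.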
APM vs Logging vs Infrastructure Monitoring
Java teams often have logging (via Logback or Log4j 2) and infrastructure monitoring (via Prometheus node exporter or cloud provider metrics) already in place. APM adds a third layer that neither of the others can replace.
| What you need to know | Logging | Infrastructure monitoring | APM |
| This request took 800ms, where? | No (logs show events, not spans) | No (infra shows aggregate CPU/memory) | Yes (distributed trace shows breakdown) |
| Error rate is rising on /checkout | Possible (if errors are logged with URL) | No | Yes (per-endpoint error rate) |
| Memory is growing, is it a leak? | No | Partially (heap total) | Yes (heap breakdown with GC correlation) |
| GC pause caused this latency spike | No | No | Yes (GC pause timeline overlaid on traces) |
| Which SQL query is slow? | No (unless explicitly logged) | No | Yes (JDBC span with SQL text and duration) |
| Downstream service is slow | No (unless you log it) | No | Yes (outbound HTTP span with target and latency) |
OpenTelemetry vs Commercial Java APM Agents
The OTel Java agent and commercial APM agents (Datadog, Dynatrace, New Relic, AppDynamics) instrument Java applications using the same underlying mechanism: bytecode manipulation at class load time. The instrumentation approach is identical. What differs is where the data goes and what the backend does with it.
| | OTel Java agent | Commercial APM agent |
| Vendor lock-in | None. Data goes to any OTLP-compatible backend | Proprietary format. Data goes to that vendor’s platform |
| Backend cost | Your choice. Open-source (Jaeger, Tempo) or commercial | Included in vendor pricing, often per-host or per-user |
| Library coverage | Hundreds of libraries, community-maintained | Comparable coverage, vendor-maintained |
| Configuration | Environment variables or -D properties | Vendor-specific config files |
| Custom instrumentation | OTel API (stable, vendor-neutral) | Vendor-specific SDK |
| Data portability | Full. Switch backends without re-instrumenting | None. Switching requires re-instrumentation |
The standard recommendation in 2026 for new Java projects is to instrument with the OTel agent and choose a backend separately. This decouples the instrumentation decision from the vendor decision and preserves the ability to switch backends without touching application code.
How Java APM Works in Practice
A complete Java APM setup has four parts working together.
1. Instrumentation: The OTel Java agent attaches at startup and emits OTLP telemetry.
2. Collection: The OTel Collector receives OTLP, applies sampling and filtering, and routes to backends.
3. Storage: Traces go to Jaeger or Tempo, metrics go to Prometheus, logs go to Loki or Elasticsearch.
4. Analysis: Grafana queries all backends via their native query languages, correlating signals from the same request using the shared trace ID.
The OTel trace ID is the linking mechanism. When the application writes a log record during a traced request, the Java agent adds the active trace ID to the log entry. When Grafana displays a slow trace span, it can use that trace ID to fetch the logs from that same request. This is the practical value of unified OTel instrumentation: the same context ID ties together the trace and the log lines from that request, and the JVM metrics from that moment can be lined up against them by timestamp.
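To show what that shared ID looks like on the wire, the sketch below parses a W3C `traceparent` header (version 00: a 32-hex-character trace ID, a 16-hex-character parent span ID, and 2-hex-character flags) and prefixes a log message with the trace ID. It is an illustration of the correlation idea only. The agent does this for you through the logging framework's context rather than through string concatenation, and the `TraceParent` class and the `trace_id=` log prefix here are hypothetical choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses a version-00 W3C traceparent header to extract the trace ID (illustrative only). */
final class TraceParent {

    // Layout: version "00" - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase.
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    /** Returns the trace ID, or null if the header is missing or malformed. */
    static String traceIdOf(String traceparent) {
        if (traceparent == null) {
            return null;
        }
        Matcher m = FORMAT.matcher(traceparent);
        if (!m.matches()) {
            return null;
        }
        // All-zero trace IDs and parent IDs are invalid per the specification.
        if (m.group(1).matches("0+") || m.group(2).matches("0+")) {
            return null;
        }
        return m.group(1);
    }

    /** Prefixes a log message with the trace ID so logs can be looked up from a trace. */
    static String logLine(String traceparent, String message) {
        String traceId = traceIdOf(traceparent);
        return traceId == null ? message : "trace_id=" + traceId + " " + message;
    }

    public static void main(String[] args) {
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        System.out.println(logLine(header, "order placed"));
    }
}
```

Because every service in the call chain forwards the same trace ID in this header, a log query filtered on that one value returns the lines from all services that handled the request.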
Correlating JVM Internals with Request Traces: Where CubeAPM Fits
A GC pause, a thread pool queue backup, or a memory pressure event in JVM metrics tells you something is wrong at the runtime level. It does not tell you which in-flight requests were affected, which endpoints were most impacted, or whether the slowdown was isolated to one service or cascaded across a distributed call chain.
CubeAPM is purpose-built for Java teams and auto-instruments Spring Boot, Hibernate, Tomcat, and Kafka via the OTel Java agent with no additional configuration. It continuously tracks JVM internals, including heap usage, GC pause duration, and thread activity, and correlates them directly with distributed request traces. When an elevated error rate appears, CubeAPM links it to the specific SQL query, GC pause, or downstream call responsible. Its smart sampling preserves slow, error-prone, and unusual traces while cutting ingestion volume by up to 80%, which keeps costs manageable at scale. It runs self-hosted inside your own infrastructure at $0.15/GB ingestion with no per-user fees.
Summary
Java APM monitors two distinct layers simultaneously: application behavior (latency, errors, throughput, distributed traces) and JVM health (heap memory, GC pauses, thread activity). The OpenTelemetry Java agent is the standard open-source instrumentation mechanism, attaching to any Java 8+ application via the -javaagent flag with no code changes.
GC pause time is the most commonly missed signal in Java monitoring. It causes real user-facing latency but is invisible to infrastructure dashboards. Distributed tracing is the signal that ties everything together, showing exactly where time is spent across a request’s journey through services, databases, and message queues.
| Layer | What to monitor | Why it matters |
| Application | Request latency (p95, p99), error rate, throughput | Direct measure of user experience |
| Distributed traces | Span breakdown per request, database query times, external call durations | Pinpoints where time is spent in a slow request |
| JVM: heap | Used vs max, allocation rate | High usage causes GC pressure and eventual OutOfMemoryError |
| JVM: GC | Pause duration, pause frequency, GC throughput | Pauses freeze all threads and spike user-facing latency |
| JVM: threads | Live count, thread pool queue depth | Thread leaks and pool saturation cause request queuing |
| JVM: non-heap | Metaspace, code cache | Can grow unboundedly in some container configurations |
Disclaimer: OTel Java agent version (v2.27.0), supported frameworks, and JVM GC details are verified against the OpenTelemetry Java instrumentation GitHub repository (github.com/open-telemetry/opentelemetry-java-instrumentation/releases), OpenTelemetry official documentation (opentelemetry.io/docs/languages/java, last modified May 20, 2026), Java platform release information (Java 26 current, Java 25 LTS), and CubeAPM Java APM documentation (cubeapm.com/blog/top-apm-tools-for-java) as of May 2026.





