What Is Java Application Performance Monitoring (APM)?

Java Application Performance Monitoring (APM) is the practice of continuously measuring, collecting, and analyzing the behavior of a Java application in production. For Java specifically, APM covers two distinct layers: the application layer (request latency, error rates, throughput, distributed traces) and the JVM layer (heap memory, garbage collection, thread pools, class loading). Both layers need monitoring because JVM problems often manifest as application symptoms. A long GC pause looks like a slow request, and heap exhaustion looks like a timeout.

APM is distinct from general infrastructure monitoring. Infrastructure monitoring tells you a server’s CPU is high. Java APM tells you which method call or GC event caused the CPU spike, which request was slow because of it, and which downstream service that request was waiting on.

Key Takeaways

Java APM covers two layers: the application layer (latency, errors, throughput, traces) and the JVM layer (heap, GC, threads). Missing either layer creates blind spots
The Java agent is the primary instrumentation mechanism for APM. It attaches to the JVM at startup via the -javaagent flag and uses bytecode manipulation to inject telemetry without code changes
The OpenTelemetry Java agent (v2.27.0, May 2026, targeting OTel SDK 1.61.0) is the standard open-source instrumentation option. It supports Java 8+ and auto-instruments hundreds of libraries and frameworks, including Spring Boot, JDBC, gRPC, Kafka, and Hibernate
Since OTel Java agent v2.0.0, the default export protocol is HTTP/protobuf, not gRPC. This aligns with the OTel specification default
GC pause time is the most commonly overlooked Java performance signal. A stop-the-world GC pause freezes all application threads simultaneously. A 200ms pause is invisible in most infrastructure dashboards but can cause request timeouts in latency-sensitive APIs
Distributed tracing is the feature that separates modern Java APM from JVM monitoring. Traces follow a single request across services, databases, and message queues, and show exactly where time is spent

What Java APM Monitors

Java APM data falls into three categories, each answering different questions.

Application-Level Signals

What these answer: is the application working correctly and fast enough for users?

Signal	What it measures	Why it matters
Request latency	Time from request received to response sent	The most direct measure of user experience. Alert on p95 and p99, not just the average
Throughput	Requests per second	Baseline for capacity planning and anomaly detection
Error rate	Percentage of requests returning 5xx or exceptions	A rising error rate on a specific endpoint pinpoints a regression
Distributed traces	End-to-end request path across services and databases	Shows exactly where time is spent in a slow request
Database query time	Time spent in JDBC, JPA, Hibernate, or R2DBC calls	Database queries are the most common cause of Java service latency spikes
External HTTP call duration	Time spent calling downstream services	A slow dependency shows up here before it shows up in your own latency metrics
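
As a rough illustration of why the table recommends alerting on p95 and p99 rather than the average, the sketch below computes nearest-rank percentiles and a 5xx error rate from raw samples. The class name EndpointStats and the sample data are hypothetical. An APM backend would normally derive these values from histograms rather than raw arrays.

```java
import java.util.Arrays;

public final class EndpointStats {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    public static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static double mean(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(0.0);
    }

    /** Share of responses with a 5xx status, as a percentage. */
    public static double errorRatePercent(int[] statusCodes) {
        if (statusCodes.length == 0) {
            return 0.0;
        }
        long errors = Arrays.stream(statusCodes).filter(s -> s >= 500).count();
        return 100.0 * errors / statusCodes.length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 94 fast requests (20 ms) and 6 slow ones (1000 ms).
        long[] latencies = new long[100];
        Arrays.fill(latencies, 20);
        for (int i = 94; i < 100; i++) {
            latencies[i] = 1000;
        }
        System.out.println("mean ms: " + mean(latencies));
        System.out.println("p50 ms:  " + percentile(latencies, 50));
        System.out.println("p95 ms:  " + percentile(latencies, 95));
    }
}
```

In this sample the mean stays under 80ms and the median is 20ms, while p95 is a full second. An alert on the average alone would miss the six users who waited that long.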

JVM-Level Signals

What these answer: is the runtime environment healthy?

Signal	What it measures	Why it matters
Heap memory used vs max	Current heap usage as a percentage of the configured maximum	Alert when sustained usage approaches 80%. When the heap is full and GC cannot reclaim enough space, the JVM throws OutOfMemoryError
GC pause time	Duration of stop-the-world GC events	Pauses freeze all threads. Even 100ms pauses can cause timeouts in real-time APIs
GC frequency	Number of GC cycles per minute	High frequency with low recovery indicates a memory leak or undersized heap
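Live thread count	Number of currently active JVM threads	Unexpected growth indicates a thread leak. A sudden drop can mean worker threads are dying on uncaught exceptions. Deadlocked threads stay alive but stuck, so check thread states as well as the count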
Thread pool queue depth	Pending tasks in executor thread pools	A growing queue means threads are not keeping up with incoming work
Non-heap memory	Memory used for class metadata, JIT-compiled code, string interning	Can grow unboundedly in some deployment configurations
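
Most of the signals in this table can be read from inside the JVM through the standard java.lang.management API. The sketch below is a minimal illustration of that API, not how any particular agent collects its metrics. The class name JvmSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public final class JvmSnapshot {

    /** Heap usage as a percentage of the maximum, or -1 if no maximum is defined. */
    public static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max > 0 ? 100.0 * heap.getUsed() / max : -1;
    }

    /** Non-heap memory in use (class metadata, JIT-compiled code), in bytes. */
    public static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /**
     * Approximate cumulative collection time across all collectors, in ms since JVM start.
     * For concurrent collectors, not all of this time is stop-the-world pause time.
     */
    public static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionTime(), 0); // -1 means "undefined"
        }
        return total;
    }

    public static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Number of threads currently involved in a deadlock cycle. */
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedPercent());
        System.out.printf("non-heap used: %d bytes%n", nonHeapUsedBytes());
        System.out.printf("total GC time: %d ms%n", totalGcTimeMs());
        System.out.printf("live threads: %d%n", liveThreadCount());
        System.out.printf("deadlocked threads: %d%n", deadlockedThreadCount());
    }
}
```

A one-off snapshot like this is only useful for spot checks. The value of APM comes from sampling these numbers continuously and lining them up against request traces.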

Infrastructure Correlation

Java APM becomes most useful when application and JVM signals are correlated with the infrastructure they run on: CPU utilization, network I/O, and disk I/O. A full GC that coincides with a CPU spike is a different problem from a GC that coincides with a pod being throttled.

How Java APM Agents Work

The most practical way to instrument a Java application for APM is the Java agent. It requires no changes to application code and no modifications to build files.

Bytecode manipulation at class load time. When a Java application starts with a -javaagent flag, the agent registers itself with the JVM’s instrumentation API. When the JVM loads a class, the agent intercepts the loading process and modifies the bytecode before the class is used. This modification injects telemetry collection into method calls such as HTTP handlers, database drivers, and messaging clients, without the application developer doing anything.
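
To make the mechanism concrete, here is a minimal sketch of a Java agent built on the standard java.lang.instrument API. It only counts the classes offered to it and leaves their bytecode untouched. The class name ClassCountingAgent is hypothetical, and this is not the OpenTelemetry agent's implementation, which rewrites bytecode with a dedicated library. To be loaded with -javaagent, the agent JAR's manifest must name this class in a Premain-Class entry.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

public final class ClassCountingAgent {

    public static final AtomicLong CLASSES_SEEN = new AtomicLong();

    /** Called by the JVM before the application's main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }

    /** Invoked for each class the JVM loads after the transformer is registered. */
    public static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            CLASSES_SEEN.incrementAndGet();
            // An APM agent would inspect className here and, for classes it knows
            // (HTTP handlers, JDBC drivers, messaging clients), return rewritten
            // bytecode that starts and ends spans around the relevant methods.
            return null; // null tells the JVM to keep the original bytecode
        }
    }
}
```

Returning null from transform is the documented way to signal that no change was made, which is why an agent adds little overhead for the many classes it does not instrument.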

What this means in practice:

java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=order-service \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4318 \
  -jar order-service.jar

This single command, with no code changes, gives you:

A span for every incoming HTTP request with method, route, status code, and duration
A span for every outgoing HTTP call with the target host and status code
A span for every database query with the SQL statement (sanitized by default) and duration
A span for every Kafka producer and consumer operation
JVM metrics: heap usage, GC pause time, thread counts, class loading
W3C TraceContext propagation on all outgoing HTTP calls
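
The last item is what lets a downstream service join the same trace. W3C Trace Context carries the trace in a traceparent HTTP header of the form version-traceid-parentid-flags. The sketch below parses and rebuilds a version 00 header to show what travels on the wire. The class name TraceParent is hypothetical, and the agent handles all of this internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TraceParent {

    // Version 00: 32 hex chars of trace ID, 16 hex chars of parent span ID, 2 hex chars of flags.
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    public final String traceId;
    public final String spanId;
    public final boolean sampled;

    private TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    public static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a version 00 traceparent: " + header);
        }
        // The specification treats all-zero IDs as invalid.
        if (m.group(1).matches("0+") || m.group(2).matches("0+")) {
            throw new IllegalArgumentException("all-zero trace or span ID");
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceParent(m.group(1), m.group(2), sampled);
    }

    /** Header value to send on an outgoing call; only the sampled flag is preserved. */
    public String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        TraceParent tp = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println("trace ID: " + tp.traceId + ", sampled: " + tp.sampled);
    }
}
```

Every service on the request path keeps the same trace ID and substitutes its own span ID as the parent, which is how the backend reassembles one end-to-end trace.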

The OpenTelemetry Java Agent

The OTel Java agent (opentelemetry-javaagent.jar) is the official open-source instrumentation agent maintained by the OpenTelemetry project. It is the standard starting point for Java APM in 2026 for teams that are not using a commercial APM vendor.

Current version: v2.27.0 (May 2026), targeting OTel SDK 1.61.0. Requires Java 8 or above.

Download:

curl -L -o opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

Key facts about the agent:

Single JAR file containing the agent and all auto-instrumentation libraries (tens of megabytes)
Default export protocol is HTTP/protobuf to http://localhost:4318. Changed from gRPC to HTTP/protobuf in v2.0.0 to align with the OTel specification
Supports hundreds of libraries and frameworks out of the box
Configuration is via -D system properties or environment variables. When both are set, system properties take precedence
Declarative YAML-based configuration is supported from v2.26.0 onward via -Dotel.config.file=/path/to/otel-config.yaml

Supported frameworks and libraries (selection):

Category	Supported
Web frameworks	Spring MVC, Spring WebFlux, Jakarta EE Servlets, Quarkus, Micronaut, Vert.x
HTTP clients	Apache HttpClient, OkHttp, java.net.HttpURLConnection, Jetty client
Databases	JDBC (all drivers), Hibernate, Spring Data, R2DBC, MongoDB, Redis (Jedis, Lettuce)
Messaging	Kafka, RabbitMQ, ActiveMQ, JMS
RPC	gRPC, Thrift
Caching	Ehcache, Hazelcast
Logging	Log4j 2, Logback, java.util.logging (trace ID injection into log records)

JVM Garbage Collection: The Most Overlooked APM Signal

GC pauses deserve specific attention because they are the most common source of Java performance problems that are invisible to standard infrastructure monitoring.

Stop-the-world events. During a stop-the-world GC phase, the JVM pauses all application threads simultaneously. During that pause, no requests are processed and no responses are sent. From a user’s perspective, the application freezes. From an infrastructure monitor’s perspective, nothing unusual happened. CPU usage may have been high during the GC, but the server was not down.

GC pause impact on request latency. A 300ms GC pause adds up to 300ms to the response time of every request that was in flight during the pause, even if the request itself needs only 5ms of work. This shows up as a latency spike in APM traces but is completely invisible in CPU or memory dashboards.
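
The arithmetic can be sketched with a deliberately simplified model: a single stop-the-world pause, and a request that either is already running when the pause starts or arrives while it is in progress. The class and method names are hypothetical, and real services add queuing effects on top of this.

```java
public final class PauseImpact {

    /**
     * Observed latency of a request that starts at startMs and needs workMs of processing,
     * when one stop-the-world pause covers [pauseStartMs, pauseStartMs + pauseMs).
     */
    public static long observedLatencyMs(long startMs, long workMs,
                                         long pauseStartMs, long pauseMs) {
        long pauseEndMs = pauseStartMs + pauseMs;
        if (startMs < pauseStartMs && startMs + workMs > pauseStartMs) {
            // Already running when the pause begins: stalled for the whole pause.
            return workMs + pauseMs;
        }
        if (startMs >= pauseStartMs && startMs < pauseEndMs) {
            // Arrives mid-pause: waits for the remainder, then runs.
            return (pauseEndMs - startMs) + workMs;
        }
        return workMs; // No overlap with the pause.
    }

    public static void main(String[] args) {
        // A 5 ms request caught by a 300 ms pause that begins 2 ms after it started.
        System.out.println(observedLatencyMs(0, 5, 2, 300));
        // The same request arriving 98 ms into the pause.
        System.out.println(observedLatencyMs(100, 5, 2, 300));
        // The same request arriving after the pause has ended.
        System.out.println(observedLatencyMs(400, 5, 2, 300));
    }
}
```

Under this model the three calls yield 305ms, 207ms, and 5ms. The application code did the same 5ms of work each time, which is why the slowdown cannot be found by profiling the request handler alone.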

Modern GC collectors and their trade-offs:

Collector	JVM flag	Best for	Pause behavior
G1GC (default since Java 9)	-XX:+UseG1GC	General-purpose workloads	Predictable, configurable pause targets. Pauses in the tens to hundreds of milliseconds range
ZGC	-XX:+UseZGC (add -XX:+ZGenerational on Java 21 and 22)	Latency-sensitive services. Production-ready since Java 15. Generational ZGC (recommended mode) available since Java 21 and the default ZGC mode since Java 23	Sub-millisecond pauses regardless of heap size. Requires 15-30% more memory than G1GC
Shenandoah	-XX:+UseShenandoahGC	Low-latency with large heaps	Very short pauses, largely independent of heap size. Available in most OpenJDK distributions, but not in Oracle JDK builds
Parallel GC	-XX:+UseParallelGC	Batch processing, throughput-focused	Longer stop-the-world pauses acceptable in exchange for higher throughput

For latency-sensitive Java services on Java 21 or above, Generational ZGC is the recommended collector. On Java 21 and 22, enable it with -XX:+UseZGC -XX:+ZGenerational. From Java 23 onward, -XX:+UseZGC alone selects generational mode. It delivers consistent sub-millisecond pause times regardless of heap size, which largely eliminates GC pauses as a source of request latency spikes. The trade-off is 15 to 30% higher memory usage and 8 to 20% additional CPU overhead from concurrent GC threads.
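
After changing collector flags, it is worth confirming which collector the JVM actually selected. One way, sketched below with the standard java.lang.management API, is to list the registered garbage collector beans. The class name ActiveCollectors is hypothetical, and the bean names are implementation-specific, so treat them as informational rather than a stable contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class ActiveCollectors {

    /** Names of the collector beans registered by the running JVM. */
    public static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with the same GC flags as the service to see what they select.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running this once with the production flags is a cheap sanity check before comparing pause metrics between collectors.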

APM vs Logging vs Infrastructure Monitoring

Java teams often have logging (via Logback or Log4j 2) and infrastructure monitoring (via Prometheus node exporter or cloud provider metrics) already in place. APM adds a third layer that neither of the others can replace.

What you need to know	Logging	Infrastructure monitoring	APM
This request took 800ms, where?	No (logs show events, not spans)	No (infra shows aggregate CPU/memory)	Yes (distributed trace shows breakdown)
Error rate is rising on /checkout	Possible (if errors are logged with URL)	No	Yes (per-endpoint error rate)
Memory is growing, is it a leak?	No	Partially (heap total)	Yes (heap breakdown with GC correlation)
GC pause caused this latency spike	No	No	Yes (GC pause timeline overlaid on traces)
Which SQL query is slow?	No (unless explicitly logged)	No	Yes (JDBC span with SQL text and duration)
Downstream service is slow	No (unless you log it)	No	Yes (outbound HTTP span with target and latency)

OpenTelemetry vs Commercial Java APM Agents

The OTel Java agent and commercial APM agents (Datadog, Dynatrace, New Relic, AppDynamics) instrument Java applications using the same underlying mechanism: bytecode manipulation at class load time. What differs is where the data goes and what the backend does with it.

	OTel Java agent	Commercial APM agent
Vendor lock-in	None. Data goes to any OTLP-compatible backend	Proprietary format. Data goes to that vendor’s platform
Backend cost	Your choice. Open-source (Jaeger, Tempo) or commercial	Included in vendor pricing, often per-host or per-user
Library coverage	Hundreds of libraries, community-maintained	Comparable coverage, vendor-maintained
Configuration	Environment variables or -D properties	Vendor-specific config files
Custom instrumentation	OTel API (stable, vendor-neutral)	Vendor-specific SDK
Data portability	Full. Switch backends without re-instrumenting	None. Switching requires re-instrumentation

The standard recommendation in 2026 for new Java projects is to instrument with the OTel agent and choose a backend separately. This decouples the instrumentation decision from the vendor decision and preserves the ability to switch backends without touching application code.

How Java APM Works in Practice

A complete Java APM setup has four parts working together.

1. Instrumentation: The OTel Java agent attaches at startup and emits OTLP telemetry.

2. Collection: The OTel Collector receives OTLP, applies sampling and filtering, and routes to backends.

3. Storage: Traces go to Jaeger or Tempo, metrics go to Prometheus, logs go to Loki or Elasticsearch.

4. Analysis: Grafana queries all backends via their native query languages, correlating signals from the same request using the shared trace ID.

The OTel trace ID is the linking mechanism. When the application writes a log record during a traced request, the Java agent adds the active trace ID to that log entry's context. When Grafana displays a slow trace span, it can use that trace ID to fetch the logs from that same request. This is the practical value of unified OTel instrumentation: the same trace ID ties together the trace and the log lines from that request, and their timestamps line them up with the JVM metrics recorded at that moment.
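
The idea behind log correlation can be sketched in plain Java: keep the active trace ID in thread-scoped context and stamp it onto every log line written while that context is active. This is an illustration only, with hypothetical names. The OTel agent does the equivalent by populating the logging framework's own context (for example the Logback or Log4j 2 MDC) with keys such as trace_id and span_id.

```java
public final class TraceLogContext {

    // Trace ID of the request currently handled on this thread, if any.
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Runs work with traceId as the active trace, restoring the previous value afterwards. */
    public static void runInTrace(String traceId, Runnable work) {
        String previous = TRACE_ID.get();
        TRACE_ID.set(traceId);
        try {
            work.run();
        } finally {
            if (previous == null) {
                TRACE_ID.remove();
            } else {
                TRACE_ID.set(previous);
            }
        }
    }

    /** Formats a log line, prefixed with the active trace ID when there is one. */
    public static String format(String level, String message) {
        String traceId = TRACE_ID.get();
        String prefix = traceId == null ? "" : "trace_id=" + traceId + " ";
        return prefix + level + " " + message;
    }

    public static void main(String[] args) {
        runInTrace("4bf92f3577b34da6a3ce929d0e0e4736",
                () -> System.out.println(format("WARN", "payment retry")));
        System.out.println(format("INFO", "idle"));
    }
}
```

With the trace ID present as a field in every log line, a backend can jump from a slow span to exactly the log entries written during that request.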

Correlating JVM Internals with Request Traces: Where CubeAPM Fits

A GC pause, a thread pool queue backup, or a memory pressure event in JVM metrics tells you something is wrong at the runtime level. It does not tell you which in-flight requests were affected, which endpoints were most impacted, or whether the slowdown was isolated to one service or cascaded across a distributed call chain.

CubeAPM is purpose-built for Java teams and auto-instruments Spring Boot, Hibernate, Tomcat, and Kafka via the OTel Java agent with no additional configuration. It continuously tracks JVM internals, including heap usage, GC pause duration, and thread activity, and correlates them directly with distributed request traces. When an elevated error rate appears, CubeAPM links it to the specific SQL query, GC pause, or downstream call responsible. Its smart sampling preserves slow, error-prone, and unusual traces while cutting ingestion volume by up to 80%, which keeps costs manageable at scale. It runs self-hosted inside your own infrastructure at $0.15/GB ingestion with no per-user fees.

Summary

Java APM monitors two distinct layers simultaneously: application behavior (latency, errors, throughput, distributed traces) and JVM health (heap memory, GC pauses, thread activity). The OpenTelemetry Java agent is the standard open-source instrumentation mechanism, attaching to any Java 8+ application via the -javaagent flag with no code changes.

GC pause time is the most commonly missed signal in Java monitoring. It causes real user-facing latency but is invisible to infrastructure dashboards. Distributed tracing is the signal that ties everything together, showing exactly where time is spent across a request’s journey through services, databases, and message queues.

Layer	What to monitor	Why it matters
Application	Request latency (p95, p99), error rate, throughput	Direct measure of user experience
Distributed traces	Span breakdown per request, database query times, external call durations	Pinpoints where time is spent in a slow request
JVM: heap	Used vs max, allocation rate	High usage causes GC pressure and eventual OutOfMemoryError
JVM: GC	Pause duration, pause frequency, GC throughput	Pauses freeze all threads and spike user-facing latency
JVM: threads	Live count, thread pool queue depth	Thread leaks and pool saturation cause request queuing
JVM: non-heap	Metaspace, code cache	Can grow unboundedly in some container configurations

Disclaimer: OTel Java agent version (v2.27.0), supported frameworks, and JVM GC details are verified against the OpenTelemetry Java instrumentation GitHub repository (github.com/open-telemetry/opentelemetry-java-instrumentation/releases), OpenTelemetry official documentation (opentelemetry.io/docs/languages/java, last modified May 20, 2026), Java platform release information (Java 26 current, Java 25 LTS), and CubeAPM Java APM documentation (cubeapm.com/blog/top-apm-tools-for-java) as of May 2026.
