Elasticsearch monitoring helps teams track cluster health, JVM performance, query latency, and shard allocation to keep search workloads reliable and fast. Around 65% of IT leaders plan to increase observability spending this year, yet many still struggle with alert fatigue, high-cardinality metrics, and costly, fragmented tools that miss critical Elasticsearch issues.
CubeAPM is the best solution for monitoring Elasticsearch. It unifies metrics, logs, and error tracing in one OpenTelemetry-native platform, automatically detects clusters, and correlates traces with latency spikes to give teams full visibility. With smart sampling, real-time dashboards, and efficient data handling, CubeAPM delivers deep Elasticsearch insights at a fraction of the cost of legacy tools.
In this article, we’ll cover what Elasticsearch monitoring is, why it matters, key metrics to track, and how CubeAPM simplifies monitoring through unified observability and OTel-powered automation.
What Do You Mean by Elasticsearch Monitoring?
Elasticsearch is a distributed search and analytics engine designed to handle massive volumes of structured and unstructured data in real time. It powers everything from application search bars and log analytics to observability platforms and recommendation systems.
Elasticsearch monitoring refers to the continuous tracking of its internal components (clusters, nodes, indices, queries, and JVM processes) to ensure stability, performance, and availability across your infrastructure.
In modern environments, where Elasticsearch forms the backbone of log and metrics pipelines, effective monitoring helps businesses detect bottlenecks before they impact end users. It offers visibility into how efficiently data is indexed, queried, and replicated across nodes, ensuring both high availability and optimal query response times. The key advantages include:
- Proactive performance management: Detect slow queries, unbalanced shards, or heap memory issues before they affect uptime.
- Resource optimization: Right-size clusters by analyzing CPU, JVM, and disk usage trends.
- Faster troubleshooting: Correlate logs, metrics, and traces to identify root causes quickly.
- Business reliability: Keep customer-facing search or analytics workloads fast and consistent, even during heavy data ingestion or scaling events.
Example: Monitoring Search Latency in an eCommerce Platform
Imagine an e-commerce company using Elasticsearch to power product search. During a holiday sale, the search response time suddenly spikes from 100 ms to 1.5 s. With Elasticsearch monitoring in place, engineers can quickly trace the issue — high JVM heap usage and overloaded query threads caused by new indexing jobs.
By analyzing node metrics and trace data in CubeAPM, they detect the imbalance, reallocate shards, and restore latency under 200 ms — preventing revenue loss and poor user experience.
Why Monitoring Elasticsearch is Critical
JVM Heap Pressure & GC Pauses
Elasticsearch runs on the JVM, so heap saturation directly affects cluster stability and search latency. High old-generation memory or long garbage collection (GC) pauses can freeze queries or trigger full-cluster slowdowns. Monitoring heap usage, GC duration, and circuit breaker trips helps catch memory leaks before they escalate into out-of-memory errors. Elastic recommends maintaining heap utilization below 75–80% for consistent performance. Without active monitoring, even minor GC inefficiencies can escalate into node crashes.
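For a quick spot check outside any monitoring tool, these numbers are available directly from Elasticsearch's node stats API; the host and credentials below are placeholders for your own cluster.
# Heap utilization and old-gen GC time per node (host/credentials are placeholders)
curl -s -u elastic:<password> "http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors.old.collection_time_in_millis&pretty"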
Shard Allocation & Cluster State
Elasticsearch splits data into shards, distributing them across nodes for scalability and fault tolerance. But unassigned, relocating, or oversized shards can push the cluster into a yellow or red state, delaying queries and index writes. Keeping track of unassigned shards, pending tasks, and allocation failures prevents these issues. Elastic’s 2025 reliability guidelines emphasize continuous shard rebalancing to avoid red-state clusters during upgrades or ingestion spikes.
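You can confirm the cluster picture on demand with the cluster health and allocation-explain APIs (the endpoint below assumes a locally reachable cluster):
# Overall status, unassigned shards, and pending tasks
curl -s "http://localhost:9200/_cluster/health?pretty"
# Explains why the first unassigned shard it finds has not been allocated
curl -s "http://localhost:9200/_cluster/allocation/explain?pretty"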
Thread Pool Saturation & Rejections
Each Elasticsearch node maintains dedicated thread pools for search, bulk indexing, and write tasks. When queues overflow, requests start getting rejected, leading to timeouts and user-facing failures. Monitoring queue depth, active threads, and rejection counts helps pinpoint capacity saturation before it cascades.
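The _cat thread pool API gives a per-node snapshot of exactly these three numbers; the columns shown are standard, the host is a placeholder.
# Active threads, queue depth, and rejection counts for the search and write pools
curl -s "http://localhost:9200/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected"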
Cache & Fielddata Evictions
Elasticsearch relies on query caches and fielddata caches to improve response times. When these caches fill up or evict frequently, searches and aggregations slow significantly. Tracking cache hit/miss ratios, eviction rates, and fielddata memory ensures consistent query performance. Elastic engineers note that misusing fielddata on text fields can quickly consume heap and trigger circuit breakers—one of the most common heap-related failures in production clusters.
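Cache pressure is easy to inspect straight from node stats; the calls below (against a placeholder host) return hit/miss counts, evictions, and fielddata memory per node.
# Query cache and fielddata stats per node
curl -s "http://localhost:9200/_nodes/stats/indices/query_cache,fielddata?pretty"
# Fielddata memory usage per node
curl -s "http://localhost:9200/_cat/fielddata?v"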
Disk I/O & Fsync Latency
Elasticsearch constantly writes index segments, merges, and flushes data to disk. Slow disks, high I/O wait, or fsync delays directly stall indexing and query response. Monitoring I/O throughput, latency, and filesystem wait times across nodes helps detect failing SSDs or overloaded disks early. In 2024, Statista reported that 41% of major cloud outages stemmed from storage and I/O issues, highlighting why disk performance visibility is mission-critical.
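Elasticsearch exposes filesystem totals and, on Linux, low-level I/O counters through node stats; pair it with an OS tool such as iostat when you suspect a failing disk (the host below is a placeholder).
# Disk totals and io_stats per node
curl -s "http://localhost:9200/_nodes/stats/fs?pretty"
# OS-level device utilization and await times, refreshed every 5 seconds
iostat -x 5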
Business Risk: Downtime & Latency Impact
When Elasticsearch slows, every dependent service — from log analytics to e-commerce search — feels it. Zenduty’s 2024 data shows an average downtime cost of $14,000 per minute, reinforcing that search latency and indexing failures quickly translate into real revenue loss. Effective monitoring keeps Elasticsearch healthy, user queries fast, and your bottom line protected.
Key Elasticsearch Metrics to Monitor
Monitoring Elasticsearch effectively requires visibility across multiple layers — from JVM and cluster health to queries, indexing, and disk I/O. These metrics form the foundation for diagnosing performance bottlenecks, forecasting capacity, and ensuring cluster stability.
Cluster Health Metrics
These metrics indicate the overall status of your Elasticsearch environment and help detect system-wide issues before they cause downtime.
- Cluster Status (green/yellow/red): Reflects the health of the entire cluster. A green status means all shards are active and assigned; yellow indicates some replicas are unassigned; red means data loss or critical node failure. Threshold: Maintain a consistent green state; investigate immediately if the cluster turns yellow or red.
- Active Shards: Shows how many primary and replica shards are active across nodes. A drop in active shards can signal failed nodes or unbalanced data. Threshold: Keep 100% of primary shards active; replica lag under 5%.
- Unassigned Shards: Tracks shards that haven’t been allocated to any node. High counts can indicate insufficient resources or allocation issues. Threshold: Should ideally remain zero; if rising, check disk and node availability.
- Pending Tasks: Indicates cluster operations waiting to be executed. Growing numbers point to overloaded master nodes or cluster instability. Threshold: Fewer than 50 pending tasks under normal load is recommended.
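If you scrape these signals through the prometheus-community elasticsearch_exporter (as in the setup later in this article), they map to expressions like the following; metric names can vary slightly between exporter versions, so treat these as a starting sketch.
# Cluster is yellow or red
elasticsearch_cluster_health_status{color!="green"} == 1
# Any shard without a home
elasticsearch_cluster_health_unassigned_shards > 0
# Backlog of cluster-level tasks
elasticsearch_cluster_health_number_of_pending_tasks > 50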
Node Metrics
Node-level metrics reflect the performance and resource usage of individual nodes, crucial for identifying uneven workloads or failing hardware.
- CPU Utilization: Measures how intensively Elasticsearch threads use CPU resources. High sustained CPU usage slows searches and indexing. Threshold: Keep under 75% average utilization across nodes.
- JVM Heap Usage: Represents the memory consumed by the JVM heap. Excessive usage leads to GC pressure and node unresponsiveness. Threshold: Maintain below 80% heap usage; optimize fielddata and cache size if exceeded.
- Garbage Collection Time: Monitors how often and how long GC events pause the JVM. Frequent long pauses indicate memory pressure or leaks. Threshold: GC pause duration should stay under 100 ms (minor) and 500 ms (major).
- Thread Pool Queue Size: Tracks the size of queues for search, write, and bulk operations. A growing queue means the node can’t process incoming requests fast enough. Threshold: Avoid more than 50 queued tasks per thread pool.
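With the same exporter, node-level pressure can be expressed roughly as follows (again, exact metric names depend on the exporter version):
# Average CPU per node as reported by the Elasticsearch process
avg by (name)(elasticsearch_process_cpu_percent)
# Seconds spent in garbage collection per second of wall-clock time, per node and collector
rate(elasticsearch_jvm_gc_collection_seconds_sum[5m])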
Indexing & Query Metrics
These metrics track how efficiently data is ingested and retrieved, directly impacting user experience and performance.
- Indexing Rate: Indicates the number of documents indexed per second. Drops may suggest ingestion delays or throttling. Threshold: Maintain stable ingestion throughput; sudden drops >20% require investigation.
- Indexing Latency: Measures the time taken to index documents into Elasticsearch. Spikes can result from slow merges or overloaded I/O. Threshold: Keep p95 latency under 50–100 ms for large clusters.
- Search Query Rate: Monitors how many search queries are handled per second. Sudden spikes may stress the cluster or cache. Threshold: Observe trends rather than absolute numbers; correlate spikes with CPU and cache usage.
- Search Latency: Reflects the time Elasticsearch takes to execute search requests. High latency impacts end-user responsiveness. Threshold: Maintain p95 latency below 200 ms for interactive systems.
- Refresh Time: Indicates how quickly new data becomes searchable. Longer refresh times affect real-time analytics use cases. Threshold: Default 1s is typical; investigate if consistently above 5s.
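The exporter publishes cumulative counters for indexing and search, so rates and average latencies are simple PromQL ratios; note these are averages, and per-request p95 latency is better taken from traces or slow logs.
# Index operations per second, per node
rate(elasticsearch_indices_indexing_index_total[5m])
# Average search latency over 5 minutes (seconds per query)
rate(elasticsearch_indices_search_query_time_seconds[5m]) / rate(elasticsearch_indices_search_query_total[5m])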
Cache & Memory Metrics
Caches improve query speed by reducing disk reads. Monitoring them ensures efficient memory use and helps avoid heap exhaustion.
- Query Cache Hit Ratio: Measures how often search results are served from cache instead of recalculating. A low ratio reduces performance efficiency. Threshold: Maintain >80% hit ratio in stable workloads.
- Fielddata Memory Usage: Shows heap memory used for fielddata (mainly for aggregations and sorting). Overuse can trigger circuit breakers. Threshold: Keep fielddata under 40% of heap; disable on text fields if unnecessary.
- Cache Evictions: Counts the number of times cached data is removed to free memory. Frequent evictions indicate insufficient heap or poorly tuned caching. Threshold: Eviction rate should stay below 5% of cache size per minute.
Disk I/O & Storage Metrics
Since Elasticsearch relies heavily on disk operations, monitoring storage performance is critical to maintaining stability and throughput.
- Disk Usage per Node: Tracks available disk space. Low free space can cause shard reallocation or index blocking. Threshold: Keep at least 15–20% free disk space on every data node.
- I/O Wait Time: Measures how long threads wait for disk I/O operations. High wait times signal hardware limits or saturation. Threshold: Keep I/O wait under 10% on average.
- Segment Merge Time: Indicates the time Elasticsearch spends merging Lucene segments. Long merges increase indexing latency. Threshold: Keep merge times under 1s per operation under normal conditions.
- Fsync Latency: Reflects how fast Elasticsearch writes data to disk. Prolonged fsync times indicate disk or file system contention. Threshold: Maintain fsync latency under 5 ms for SSD-backed nodes.
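A quick exporter-based expression for the free-space threshold above (metric names assume the prometheus-community exporter):
# Fraction of data-path disk still free on each node; alert when it drops below 0.15–0.20
elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes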
Cluster Network & Node Communication Metrics
Elasticsearch nodes communicate frequently to coordinate cluster state, shard replication, and search distribution. Monitoring these metrics ensures nodes stay synchronized.
- Network Latency Between Nodes: Monitors the time it takes for nodes to exchange data. Latency increases replication delay and cluster instability. Threshold: Keep intra-cluster latency under 5 ms for optimal performance.
- Dropped Packets / Connection Errors: Indicates failed or unstable communication between nodes. Persistent errors can cause master node elections and splits. Threshold: Should remain zero under normal conditions.
- Replication Queue Size: Tracks pending replication tasks. Growth suggests overloaded replica nodes or network congestion. Threshold: Maintain less than 50 queued replication tasks under steady load.
How to Perform Elasticsearch Monitoring with CubeAPM
Setting up Elasticsearch monitoring in CubeAPM involves connecting metrics, logs, and traces through OpenTelemetry pipelines. The following step-by-step process explains exactly how developers and SREs can deploy, configure, and validate full observability for Elasticsearch clusters using CubeAPM.
Step 1: Install CubeAPM where your Elasticsearch runs
Deploy CubeAPM in the same network plane as your Elasticsearch cluster (Kubernetes is most common). On Kubernetes, add the Helm repo, pull values.yaml, tune it (storage, resources, base URL), and install or upgrade with Helm. This gets the backend ready to ingest metrics/logs/traces from your Elasticsearch estate.
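A minimal Helm sketch of that flow looks like this; the repo URL and chart name are placeholders, so copy the exact commands from the CubeAPM install docs.
# Placeholder repo URL and chart name; see the CubeAPM docs for the real values
helm repo add cubeapm <cubeapm-helm-repo-url>
helm repo update
helm show values cubeapm/cubeapm > values.yaml   # tune storage, resources, base-url here
helm upgrade --install cubeapm cubeapm/cubeapm -n cubeapm --create-namespace -f values.yaml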
Step 2: Set essential CubeAPM config (token, base URL, auth)
Before you ingest anything, set the required keys: token and auth.key.session. Then review important parameters like base-url, auth.sys-admins, optional DBs, and timezone. You can set these via CLI args, env vars (prefixed CUBE_), or config file—CubeAPM documents precedence and provides a full reference list you can copy into automation.
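As an illustration only, environment-variable configuration might look like the sketch below; the variable names are guesses derived from the documented CUBE_ prefix, so confirm the key-to-variable mapping in CubeAPM's configuration reference.
# Hypothetical variable names based on the CUBE_ prefix convention
export CUBE_TOKEN="<your-ingestion-token>"
export CUBE_AUTH_KEY_SESSION="<random-session-key>"
export CUBE_BASE_URL="https://cubeapm.example.internal"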
Step 3: Expose Elasticsearch metrics via a Prometheus exporter
Export Elasticsearch cluster/node/index metrics using the Prometheus Elasticsearch Exporter (K8s Helm chart or container). The exporter surfaces /metrics (default port 9114 since v1.1.0) which the Collector can scrape. Run one exporter per cluster (or node) and point it at your Elasticsearch HTTP endpoint with auth/CA as needed.
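For example, on Kubernetes or plain Docker (the connection string and release name are placeholders):
# Helm chart from the prometheus-community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install es-exporter prometheus-community/prometheus-elasticsearch-exporter \
  --set es.uri="https://elastic:<password>@elasticsearch:9200"
# Or run the exporter as a standalone container
docker run -d -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest \
  --es.uri="https://elastic:<password>@elasticsearch:9200" --es.all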
Step 4: Scrape the exporter with OpenTelemetry Collector and forward to CubeAPM
Install the OpenTelemetry Collector and enable the Prometheus receiver to scrape your exporter targets (job_name: elasticsearch, targets: ["<exporter-host>:9114"]). Then add an OTLP/HTTP metrics exporter pointing to CubeAPM’s OTLP metrics endpoint so dashboards and alerts can use the data. (CubeAPM stores Prometheus metrics via the Collector pattern shown below.)
Example (abridged):
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: elasticsearch
          static_configs:
            - targets: ["<exporter-host>:9114"]
exporters:
  otlphttp:
    endpoint: http://<cubeapm>:3130/api/metrics/v1/save/otlp
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlphttp]
(See docs.cubeapm.com for the full Collector configuration.)
Step 5: Ship Elasticsearch logs (and system logs) into CubeAPM
Forward Elasticsearch logs for correlation and incident forensics. Use your existing shippers and just switch their “Elasticsearch output” to CubeAPM’s logs ingestion endpoint. For example:
- Logstash →
output {
  elasticsearch {
    hosts => ["http://<cubeapm>:3130/api/logs/insert/elasticsearch/"]
    parameters => {
      "_msg_field" => "message"
      "_time_field" => "@timestamp"
      "_stream_fields" => "host.name,process.name"
    }
  }
}
- Filebeat →
output.elasticsearch:
  hosts: ["http://<cubeapm>:3130/api/logs/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,process.name"
CubeAPM documents the ingestion endpoints and reserved fields (_msg, _time, stream fields) in detail.
Step 6: Add traces from services that call Elasticsearch (OpenTelemetry)
Instrument your applications (Java, .NET, Node.js, Python, Go, PHP, Ruby, etc.) with OpenTelemetry so you can link slow Elasticsearch queries to upstream requests and users. CubeAPM is OTLP-native, and the language guides show the exact environment variables, e.g.,
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://<cubeapm>:4318/v1/traces and OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<cubeapm>:3130/api/metrics/v1/save/otlp.
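For a Python service, for instance, zero-code instrumentation with the standard OpenTelemetry distro looks roughly like this (the service name and app entry point are placeholders):
# Install the OTel distro and auto-instrumentations, then run the app through the agent
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
export OTEL_SERVICE_NAME="search-api"
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://<cubeapm>:4318/v1/traces"
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://<cubeapm>:3130/api/metrics/v1/save/otlp"
opentelemetry-instrument python app.py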
This gives you MELT: metrics (from the exporter), logs (from shippers), and traces (from your apps).
Step 7: Create Elasticsearch-focused dashboards & SLO alerts
Now build dashboards around the metrics that actually break clusters: old-gen heap/GC, unassigned shards, thread-pool rejections, query latency, cache hit ratio, merge/refresh/fsync latencies, and disk I/O wait.
Wire up alerting (start with email) so on-call sees issues as they form: configure smtp.url and smtp.from in CubeAPM, then create alerts tied to those metrics (e.g., cluster status ≠ green, heap >80%, p95 search latency > SLO). (docs.cubeapm.com)
Step 8: Validate end-to-end and harden
Confirm exporter metrics are arriving (look for your job_name), logs are searchable with _time and _msg, and traces show Elasticsearch spans in request waterfalls. If you run on Kubernetes or VMs, add broader Infra Monitoring to correlate node CPU/memory/disk with Elasticsearch behavior; CubeAPM’s infra section covers K8s, bare-metal/VMs, CloudWatch, and Prometheus pipelines. Tighten retention and sampling once the views are stable.
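A couple of quick checks, assuming default ports and placeholder resource names:
# The exporter is publishing cluster metrics
curl -s http://<exporter-host>:9114/metrics | grep elasticsearch_cluster_health_status
# The Collector is scraping the "elasticsearch" job (deployment name/namespace are placeholders)
kubectl logs deploy/otel-collector -n observability | grep -i prometheus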
Real-World Example: Elasticsearch Monitoring with CubeAPM
Challenge
A global e-commerce company relied on Elasticsearch to power real-time product search, pricing analytics, and customer behavior tracking. During high-traffic events like holiday sales, search response times spiked from 120 ms to over 2 seconds, and the team observed frequent cluster instability.
The internal monitoring stack based on Kibana and Prometheus couldn’t pinpoint whether the bottleneck came from JVM heap pressure, unbalanced shards, or disk latency. Every Black Friday, engineers had to manually scale clusters and restart nodes — often after customers already experienced slow search results.
Solution
The SRE team deployed CubeAPM to unify Elasticsearch metrics, logs, and traces in a single OpenTelemetry-native dashboard. They used the Elasticsearch Prometheus Exporter to expose node-level metrics and configured the OpenTelemetry Collector to push those into CubeAPM. The team also integrated Filebeat to ship Elasticsearch logs and instrumented upstream APIs with OpenTelemetry SDKs to trace end-to-end latency.
This setup allowed them to correlate search latency spikes with GC pauses, shard relocations, and thread-pool rejections across nodes — something not visible before.
Fixes Implemented
Using CubeAPM’s Elasticsearch dashboards, engineers identified that old-generation heap usage consistently exceeded 80% during peak indexing, triggering long garbage collection pauses. They optimized heap sizing, limited fielddata caching on text fields, and adjusted shard allocation to evenly distribute write-heavy indices. Alerts were configured for:
- JVM heap > 80%
- Cluster status ≠ green
- Search latency > 200 ms
- Pending tasks > 50
Within a week, CubeAPM’s smart sampling and Slack alert integration helped the team proactively mitigate performance degradation before it reached customers.
Result
After implementation, the company achieved a 68% reduction in average search latency and eliminated cluster red-state incidents during peak hours. Query throughput improved by 40%, and incident response time dropped from 45 minutes to under 5 minutes thanks to CubeAPM’s real-time alerting.
With predictable $0.15/GB ingestion cost and unified MELT observability, the organization now manages its Elasticsearch performance seamlessly — without the manual firefighting that once defined their peak seasons.
Verification Checklist for Elasticsearch Monitoring with CubeAPM
Once metrics, logs, and traces are wired into CubeAPM, validate end-to-end coverage and attach alerts to the Elasticsearch signals that actually break clusters. Use this checklist to confirm visibility, then enable a small set of high-signal alerts.
- Metrics pipeline healthy: Ensure the Prometheus Elasticsearch Exporter and OTel Collector are up and sending data to CubeAPM; check for fresh timestamps in dashboards.
- Cluster state visible: Confirm cluster status, unassigned shards, and pending tasks update in real time when performing maintenance or restarts.
- JVM telemetry complete: Verify heap usage, GC duration, and circuit breaker trips are being captured to catch memory pressure early.
- Index and search metrics populated: Check that indexing rate, search latency, and refresh times display correct trends during test queries.
- Cache and disk metrics active: Confirm query cache hit ratio, fielddata memory, disk free%, and I/O wait appear across all nodes.
- Logs & traces connected: Make sure Elasticsearch logs are searchable and traces from upstream services display Elasticsearch spans for correlation.
- Alert validation: Trigger a sample condition by temporarily lowering a threshold (e.g., heap > 10%) and confirm alert notifications reach your configured email or channel.
Example Alert Rules for Elasticsearch Monitoring with CubeAPM
1. Cluster status not green (degraded availability)
If the cluster drops to yellow/red (e.g., unassigned replicas or worse), page the on-call immediately.
# PromQL (via elasticsearch_exporter)
elasticsearch_cluster_health_status{color!="green"} > 0
FOR: 2m
LABELS: severity="critical", service="elasticsearch"
ANNOTATIONS: summary="Elasticsearch cluster status is {{ $labels.color }}", runbook="SRE: check unassigned shards & pending tasks"2. JVM heap pressure (risk of long GC pauses / OOM)
Alert when average heap usage per node stays high; this is a leading indicator for latency spikes and breaker trips.
# PromQL example (heap used / heap max)
avg by (name)(elasticsearch_jvm_memory_used_bytes{area="heap"}
/ ignoring(area) elasticsearch_jvm_memory_max_bytes{area="heap"}) > 0.80
FOR: 5m
LABELS: severity="warning", service="elasticsearch"
ANNOTATIONS: summary="High heap usage >80% on {{ $labels.node }}", tip="Check fielddata/query cache size; review GC and hot indices"3. Search thread-pool saturation (user-visible timeouts likely)
Fire when search queue builds up or rejections start, signalling capacity saturation.
# PromQL examples (choose one or use both)
max by (name)(elasticsearch_thread_pool_queue_count{type="search"}) > 50
OR
rate(elasticsearch_thread_pool_rejected_count{type="search"}[5m]) > 0
FOR: 3m
LABELS: severity="critical", service="elasticsearch"
ANNOTATIONS: summary="Search thread-pool saturation on {{ $labels.node }}", tip="Scale out, reduce shards per node, or throttle expensive queries"(Tune thresholds to your baseline and SLOs—e.g., set p95 search latency >200 ms, pending cluster tasks >50, or I/O wait >10%—and attach runbook links so responders know the first actions to take.)
Conclusion
Monitoring Elasticsearch is essential for ensuring performance, reliability, and real-time visibility across distributed clusters. As search and analytics workloads scale, unmonitored issues like JVM pressure, unassigned shards, and disk I/O bottlenecks can quickly escalate into downtime and user-impacting latency.
CubeAPM delivers a unified, OpenTelemetry-native platform to monitor Elasticsearch metrics, logs, and traces with complete context. Its smart sampling, flat-rate pricing, and real-time dashboards empower SREs to detect issues early and correlate root causes without complexity or vendor lock-in.
Start simplifying your Elasticsearch monitoring today: deploy CubeAPM and experience transparent observability, faster debugging, and scalable performance visibility across your entire search infrastructure.