CubeAPM
CubeAPM CubeAPM

Elasticsearch/OpenSearch Monitoring: Shard Sizing, Heap Pressure, and Cluster Cost Optimization

Elasticsearch/OpenSearch Monitoring: Shard Sizing, Heap Pressure, and Cluster Cost Optimization

Table of Contents

Elasticsearch and OpenSearch clusters fail in predictable ways. A shard sized at 80 GB instead of 30 GB can turn a 2-minute recovery into a 40-minute outage. Heap usage above 75% triggers cascading garbage collection pauses that lock writes across the entire cluster. And on AWS OpenSearch Service, the default 5 primary shards per index means a 20-node cluster can hit the 1,000-shard-per-node limit before you finish indexing a year of log data.

According to AWS, clusters with more than 20-25 shards per GiB of JVM heap become unstable. That constraint alone determines how you design retention, shard count, and node sizing before your first query runs. This guide covers how to monitor shard health, detect heap pressure before it cascades, and optimize cluster cost without degrading search performance.

What Is Elasticsearch and OpenSearch Cluster Monitoring

Elasticsearch and OpenSearch are distributed search and analytics engines built on Apache Lucene. Both store data in indices, which are split into primary shards distributed across nodes. Each shard is a self-contained Lucene index. Monitoring these clusters means tracking shard distribution, JVM heap utilization, query latency, indexing throughput, node CPU and disk pressure, and circuit breaker trips that prevent the cluster from accepting new requests.

The goal is not passive observation. Monitoring surfaces the moment a shard grows too large, heap usage climbs toward the danger zone, or a query pattern begins degrading cluster performance. Without it, you discover problems only when users report slow search results or the cluster refuses new writes.

Both Elasticsearch and OpenSearch expose metrics via REST APIs, including cluster health, node stats, index stats, and shard allocation state. Tools like Prometheus exporters, Metricbeat, or native integrations pull this telemetry and make it queryable. The challenge is not getting the data but understanding which signals predict failure and which are noise.

How Shard Sizing Affects Cluster Stability and Recovery Time

Elasticsearch and OpenSearch replicate data by copying entire shards. If a node fails, the cluster must copy the missing primary shards from replicas on surviving nodes. A 10 GB shard replicates in seconds. A 100 GB shard can take 10 minutes or longer depending on network speed and disk IO.

AWS recommends keeping shard size between 10-30 GB for search-heavy workloads and 30-50 GB for write-heavy use cases like log aggregation. The reason: smaller shards distribute IO across more Lucene segments, reducing the likelihood that a single slow query locks an entire shard. Smaller shards also mean faster recovery when nodes fail.

But shard count is not free. Every shard consumes memory for metadata, file handles, and indexing buffers. Elasticsearch 7.x and OpenSearch up to 2.15 have a default limit of 1,000 shards per node. OpenSearch 2.17 and later allow up to 4,000 shards per node, but only if you have sufficient heap. The rule: no more than 20-25 shards per GiB of JVM heap.

The calculation to estimate shard count:

(Source data + expected growth) * (1 + indexing overhead) / target shard size = primary shard count

Indexing overhead accounts for replicas and Lucene segment metadata. A 10% overhead factor is typical, meaning 1.1 in the formula. If you have 200 GB of data today, expect it to grow to 600 GB over the next year, and want 30 GB shards:

(200 + 400) * 1.1 / 30 = 22 primary shards

But 22 shards for 200 GB of current data means each shard is only 9 GB today. That might be too small and waste resources. A middle ground approach: start with 8 primary shards, giving 25 GB per shard now and 75 GB per shard at full scale. You can reindex later if shards grow past 50 GB.

The cost of getting this wrong: a Reddit user on r/elasticsearch documented a cluster that used 5 primary shards for every small index. The cluster had hundreds of indices, most under 1 GB. Total shard count exceeded 10,000. Heap usage sat at 85% constantly. Query latency doubled. The fix required reindexing everything into fewer, larger indices.

Monitoring Shard Size in Production

Track shard size per index using the _cat/shards API. This returns shard size in bytes, which node hosts each shard, and whether the shard is primary or replica. Export this to a time-series database and alert when any shard exceeds 50 GB for search workloads or 80 GB for log ingest.

You also need to monitor shard count per node. The _cat/allocation API shows shard distribution. If shard count is unbalanced across nodes, the cluster will not scale evenly. Some nodes will hit resource limits while others sit idle.

For OpenSearch on AWS, you cannot easily change primary shard count after index creation. The only path is reindexing. Plan shard count before the first document is indexed.

Heap Pressure: Why JVM Memory Matters More Than Disk Space

Elasticsearch and OpenSearch run on the JVM. Heap memory stores index metadata, query caches, fielddata for aggregations, and in-flight requests. When heap usage climbs above 75%, the JVM spends more time in garbage collection than executing queries. Above 85%, write operations slow to a crawl. Above 92%, the cluster may stop accepting new data entirely.

AWS OpenSearch Service sets a hard JVM heap limit of 32 GB per node. Beyond that, the JVM’s memory model becomes inefficient. Larger nodes do not get more heap, they get more CPU and disk. That 32 GB heap limit determines how many shards you can safely run on a single node.

The heap pressure problem compounds with shard count. Each shard maintains its own segment metadata, file handles, and indexing buffers in heap. More shards mean more overhead. A cluster with 2,000 shards spread across 50 nodes uses heap differently than one with 200 shards on the same 50 nodes. The first cluster will show higher baseline heap usage even with identical data volume.

How to Monitor Heap Usage Safely

Track heap usage via the _nodes/stats API. The response includes jvm.mem.heap_used_percent for each node. Export this metric and alert at 70% sustained usage. Do not wait until 85%. By that point, garbage collection pauses are already degrading performance.

Also monitor garbage collection frequency and duration. The jvm.gc.collectors.old.collection_time_in_millis metric shows how much time the JVM spends in full garbage collection. If this number climbs steadily, heap pressure is building. The cluster will eventually lock up.

Circuit breaker trips are another leading indicator. Elasticsearch and OpenSearch use circuit breakers to prevent out-of-memory errors. When a query or aggregation tries to allocate more memory than available, the circuit breaker trips and rejects the request. Track the breakers.parent.tripped metric. If this fires frequently, heap is undersized for your query patterns.

A common mistake: teams scale horizontally by adding more nodes but do not reduce shard count per node. Heap pressure persists because each node still hosts too many shards. The fix is not more nodes but fewer, larger shards.

Common Causes of Heap Pressure in Elasticsearch and OpenSearch

Heap exhaustion happens for predictable reasons. The most common:

Too many shards per node. As covered earlier, each shard consumes heap. If shard count per node exceeds 20-25 per GiB of heap, baseline memory usage will be high even with no active queries.

Large aggregations. Aggregations that compute unique values across millions of documents load data into heap. A terms aggregation with size set to 10,000 means the JVM must track up to 10,000 unique buckets in memory. If that aggregation runs across a 100 GB index, heap usage spikes. The solution: reduce aggregation size or use composite aggregations to paginate results.

Fielddata for sorting or aggregating on text fields. By default, text fields are analyzed and not loaded into memory. But if you enable fielddata on a text field for sorting, Elasticsearch loads the entire inverted index into heap. This can consume gigabytes of memory instantly. Use keyword fields instead for sorting and aggregations.

Too many concurrent queries. Each query allocates memory for intermediate results, query caches, and filter contexts. If 500 queries hit the cluster simultaneously, heap usage spikes. The fix: rate-limit queries at the application layer or increase node count to distribute load.

Index refresh interval set too low. Elasticsearch and OpenSearch refresh indices every second by default, making new documents searchable. Each refresh creates a new segment and consumes heap. If you set refresh interval to 100ms for near real-time search, segment creation happens 10 times per second. Heap pressure increases. For bulk ingest workloads, set refresh interval to 30 seconds or disable it entirely during indexing.

An engineer on r/devops documented heap pressure caused by overly aggressive refresh intervals. Heap usage dropped from 82% to 55% after changing refresh interval from 1 second to 10 seconds. Query latency did not degrade because most queries were for historical data, not the most recent 10 seconds.

How to Right-Size Shards Based on Data Growth

Shard sizing is not static. Data grows. Retention policies change. Query patterns evolve. The shard count that worked for 100 GB of logs will not work for 1 TB.

Start with the formula:

(Current data + expected growth) * 1.1 / target shard size = primary shard count

But adjust for these realities:

If data grows predictably, plan for the end state now. If you expect 1 TB of data in 12 months and you want 40 GB shards, create 27 primary shards today. Yes, each shard will start small, around 3-4 GB. That is acceptable because you avoid reindexing later.

If data grows unpredictably, start with fewer shards and reindex when needed. For example, start with 5 primary shards. When shards reach 50 GB, reindex into a new index with 10 primary shards. This requires downtime or an index alias swap, but it prevents over-sharding early.

For time-series data, use index templates with rollover. Instead of sizing shards for the total dataset, size them for the rollover interval. If you roll over daily and expect 50 GB per day, create 2 primary shards per daily index. Each shard will be 25 GB, which is optimal for search and recovery.

For append-only workloads like logs, prefer larger shards. Writes are sequential. Recovery is rare. A 50 GB shard is fine. For workloads with frequent updates and deletes, prefer smaller shards around 20-30 GB because segment merges happen more often.

AWS OpenSearch Service does not let you change primary shard count after index creation. You must reindex into a new index with the correct shard count. That process is slow and requires twice the disk space temporarily. Plan shard count carefully before the first document is indexed.

Index Lifecycle Management and Shard Optimization

Index Lifecycle Management (ILM) in Elasticsearch and Index State Management (ISM) in OpenSearch automate index transitions through hot, warm, cold, and delete phases. This is critical for cost and shard optimization.

The hot phase stores recent, frequently searched data on fast SSD-backed nodes with high CPU. The warm phase stores older, less frequently searched data on slower nodes with more disk. The cold phase stores rarely accessed data on the cheapest storage tier. Shards in the cold phase can be larger because search performance is less critical.

Force merge is a key optimization during phase transitions. When an index moves from hot to warm, force merge reduces segment count to 1. This frees heap because fewer segments mean less metadata overhead. But force merge is CPU and IO intensive. Run it only on indices that are no longer being written to.

A typical ILM policy for logs:

{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

This policy rolls over daily or when an index reaches 50 GB, moves indices to warm after 7 days and reduces shard count to 1, freezes indices after 30 days, and deletes them after 90 days. The shrink action reduces primary shard count, which lowers heap usage.

Shrink works only on read-only indices. It creates a new index with fewer shards and copies data from the old index. Disk space temporarily doubles during shrink. But the result is lower heap usage and faster queries because fewer shards need to be searched.

Cluster Cost Optimization: Balancing Node Size, Shard Count, and Retention

Elasticsearch and OpenSearch cluster cost breaks into three components: compute (node instance cost), storage (disk attached to nodes), and data transfer (ingest and query egress). Most teams overspend on compute because they over-provision for peak load or run too many small nodes instead of fewer large nodes.

The first lever is node size. Larger nodes are more cost efficient per GiB of heap and per CPU core. An AWS r6g.2xlarge instance costs $0.504/hour and provides 64 GB RAM with 32 GB heap. An r6g.xlarge costs $0.252/hour with 32 GB RAM and 16 GB heap. You need two xlarge nodes to match one 2xlarge node in heap, but two xlarge nodes cost the same and add operational complexity. Use fewer, larger nodes unless you need granular horizontal scaling.

The second lever is shard count. More shards mean higher baseline resource usage. A cluster with 500 primary shards and 500 replicas consumes more heap and CPU than a cluster with 100 primary shards and 100 replicas holding the same data. Right-sizing shards reduces node count.

The third lever is retention. Every day of retention adds storage cost and increases shard count if you use daily indices. If you retain 90 days of logs and roll over daily, you have 90 indices. If each index has 5 primary shards, that is 450 primary shards. Reducing retention to 30 days cuts shard count to 150.

For workloads where long retention is required, use tiered storage. Move old indices to cold nodes with high-capacity HDD instead of SSD. AWS OpenSearch Service UltraWarm uses S3-backed storage at a fraction of SSD cost. But queries against UltraWarm are slower. Do not use it for indices queried frequently.

A real example from a SaaS platform running OpenSearch: they started with 30 m5.large nodes (2 vCPU, 8 GB RAM each) costing $3,600/month. Heap usage was 70%. They migrated to 12 r6g.xlarge nodes (4 vCPU, 32 GB RAM each) costing $2,880/month. Heap usage dropped to 50% because shard count per node decreased. Query latency improved. They saved $720/month by running fewer, larger nodes.

OpenSearch vs Elasticsearch: Does Monitoring Differ?

OpenSearch forked from Elasticsearch 7.10.2 in 2021. The core architecture is identical. Sharding, heap management, indexing, and search work the same way. Monitoring strategies for Elasticsearch apply to OpenSearch with minimal changes.

The main differences:

Shard limits. Elasticsearch 7.x has a 1,000-shard-per-node limit by default. OpenSearch 2.17 and later allow up to 4,000 shards per node if heap is sufficient. The 20-25 shards per GiB heap rule still applies.

REST API endpoints. Most APIs are identical. The _cat, _cluster, and _nodes endpoints work the same. But Elasticsearch 8.x introduced some changes to security and query DSL that are not present in OpenSearch. If you are monitoring Elasticsearch 8.x, check the official docs for endpoint differences.

AWS OpenSearch Service specifics. AWS manages the cluster but limits access to some APIs. You cannot SSH into nodes. You cannot change JVM flags. You cannot disable the _all field. These restrictions affect how you monitor and tune the cluster.

If you run self-hosted Elasticsearch or OpenSearch, monitoring setup is identical. If you use AWS OpenSearch Service, you must rely on CloudWatch metrics for node-level data and use the OpenSearch REST APIs for shard and index stats.

Tools for Monitoring Elasticsearch and OpenSearch Clusters

Most teams monitor Elasticsearch and OpenSearch using one of these approaches:

Elastic Stack (Elasticsearch + Kibana + Metricbeat). Metricbeat collects cluster metrics via the Elasticsearch APIs and indexes them into a dedicated monitoring index. Kibana visualizes the data. This approach is free but requires running a separate Elasticsearch cluster for monitoring data. If the production cluster fails, you lose monitoring.

Prometheus + Grafana + Elasticsearch Exporter. The Elasticsearch exporter scrapes metrics from the _nodes/stats and _cluster/health APIs and exposes them in Prometheus format. Prometheus stores the time-series data. Grafana visualizes it. This is the most common approach for teams already using Prometheus. But you must build dashboards and alerts yourself.

AWS CloudWatch (for AWS OpenSearch Service). CloudWatch automatically collects cluster-level metrics like CPU, heap usage, and JVM pressure. But it does not surface shard-level details or per-index stats. You need to supplement CloudWatch with custom queries to the OpenSearch REST APIs.

CubeAPM. CubeAPM provides unified observability for Elasticsearch and OpenSearch clusters alongside application traces, logs, and infrastructure metrics. It integrates with OpenTelemetry, Prometheus, and Elastic agents to collect cluster telemetry without requiring a separate monitoring stack. Teams running CubeAPM can query shard health, heap pressure, and indexing throughput in the same interface they use for application performance monitoring. For self-hosted Elasticsearch or OpenSearch, CubeAPM eliminates the need to maintain a second monitoring cluster. Pricing is predictable at $0.15/GB of ingested telemetry data, covering all signal types: metrics, logs, traces, and cluster stats.

Setting Up Monitoring for Elasticsearch and OpenSearch

A production-ready monitoring setup tracks these signals:

Cluster health. The _cluster/health API returns red, yellow, or green status. Red means at least one primary shard is missing. Yellow means replicas are missing. Green means all shards are allocated. Alert on yellow status for more than 5 minutes and red status immediately.

Heap usage per node. The _nodes/stats API returns jvm.mem.heap_used_percent. Alert at 70% sustained usage for 10 minutes. Investigate immediately at 85%.

Shard count per node. The _cat/allocation API shows shard count by node. Alert if any node exceeds the 20-25 shards per GiB heap guideline.

Shard size per index. The _cat/shards API shows shard size in bytes. Alert when any shard exceeds 50 GB for search workloads or 80 GB for log aggregation.

Circuit breaker trips. The _nodes/stats API includes breakers.parent.tripped. Alert on any increase. Each trip means a query or aggregation was rejected due to insufficient heap.

Query latency. Track search latency via application-side instrumentation or by parsing Elasticsearch slow logs. Alert when p95 latency exceeds your SLA.

Indexing throughput. The _nodes/stats API includes indices.indexing.index_total and indices.indexing.index_time_in_millis. Calculate docs per second and indexing time per doc. Alert on sudden drops in throughput.

Disk usage per node. The _nodes/stats API includes fs.total.available_in_bytes. Alert at 85% disk usage. Elasticsearch stops allocating shards when disk usage exceeds 90%.

JVM garbage collection time. The _nodes/stats API includes jvm.gc.collectors.old.collection_time_in_millis. Alert when GC time per minute exceeds 1 second sustained for 10 minutes. This indicates heap pressure.

Thread pool rejections. The _nodes/stats API includes thread_pool.search.rejected and thread_pool.write.rejected. Rejections mean the cluster is overloaded. Alert on any increase.

For alerting, use Prometheus Alertmanager, Grafana alerts, or CloudWatch alarms depending on your stack. Route critical alerts to PagerDuty or Slack. Route warnings to email or a dedicated ops channel.

Best Practices for Elasticsearch and OpenSearch Monitoring

Monitor shard health continuously, not just cluster health. Cluster health can be green even if individual shards are undersized or poorly distributed. Track shard count per node and shard size per index.

Set heap alerts at 70%, not 85%. By the time heap usage reaches 85%, the cluster is already degrading. Alert early and scale proactively.

Use composite aggregations for large cardinality queries. If you need to aggregate millions of unique values, do not set size to 10,000. Use composite aggregations to paginate results and avoid heap spikes.

Run force merge only on read-only indices. Force merge on actively written indices causes IO storms and degrades write performance. Use ILM or ISM to force merge after an index rolls over.

Disable replicas during bulk indexing. Replicas double write load. If you are reindexing a large dataset, set number_of_replicas to 0 during indexing and re-enable replicas after.

Monitor disk usage per node, not just cluster-wide. One node filling up can trigger shard relocation storms that degrade the entire cluster. Alert per-node at 85% disk usage.

Track circuit breaker trips as a leading indicator of heap exhaustion. If breakers trip frequently, heap is undersized or query patterns are inefficient. Investigate before the cluster locks up.

Use dedicated master nodes for clusters with more than 10 data nodes. Master nodes handle cluster state and shard allocation. Separating masters from data nodes prevents shard allocation delays during high query load.

Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.

Frequently Asked Questions

What is the optimal shard size for OpenSearch?

AWS recommends 10-30 GB per shard for search-heavy workloads and 30-50 GB for write-heavy workloads like log aggregation. Smaller shards recover faster after node failures but increase metadata overhead.

Does Elasticsearch support sharding?

Yes, Elasticsearch and OpenSearch both use sharding to distribute data across nodes. Each index is split into primary shards, and each primary can have one or more replicas for redundancy.

What is cost optimization in OpenSearch?

Cost optimization means running the smallest cluster that meets performance and availability requirements. This involves right-sizing nodes, minimizing shard count, using tiered storage for old data, and reducing retention where possible.

How many shards per node is safe in Elasticsearch?

No more than 20-25 shards per GiB of JVM heap. For a node with 32 GB heap, that means 640-800 shards maximum. Exceeding this limit increases baseline heap usage and garbage collection frequency.

What causes heap pressure in Elasticsearch?

Too many shards per node, large aggregations, fielddata on text fields, too many concurrent queries, and aggressive index refresh intervals all increase heap usage.

Can I change shard count after creating an index?

No, primary shard count is fixed at index creation in both Elasticsearch and OpenSearch. You must reindex into a new index with the correct shard count.

Should I use one large node or multiple small nodes?

Fewer, larger nodes are more cost efficient per GiB of heap and per CPU core. Use multiple small nodes only if you need granular horizontal scaling or failover isolation.

×
×