ScyllaDB’s shard-per-core architecture and C++ runtime eliminate the garbage collection pauses and JVM overhead that cause latency spikes in Cassandra. But ScyllaDB has its own failure modes, and they are not always loud. Compaction can quietly saturate disk I/O while reads slow down gradually. Tombstone accumulation from frequent deletes increases read amplification without triggering any error. A single overloaded shard can spike tail latency on a node that looks healthy from the outside. Background workers can exhaust their scheduling group while your median latency stays flat.
This guide covers the monitoring areas that matter most for ScyllaDB in production: coordinator and replica-level latency, compaction health, tombstone pressure, and reactor utilization. It also covers alerting, so that these problems surface before they become incidents.
What Makes ScyllaDB Monitoring Different from Cassandra
ScyllaDB introduces three monitoring concepts that have no direct equivalent in Cassandra.
- Shard-per-core architecture: ScyllaDB assigns each CPU core its own memory, network queues, and storage I/O. A query is routed to a specific shard and processed without cross-core locking. This means a single overloaded shard can spike latency on a node even when aggregate CPU utilization looks normal. Monitoring must track per-shard reactor utilization, not just node-level CPU.
- Two latency surfaces: ScyllaDB exposes latency at two layers. Coordinator latency (scylla_storage_proxy_coordinator_*) reflects the total time from when the client request arrives at the coordinator node to when the response is sent back, including replication. Replica latency (scylla_storage_proxy_replica_*) reflects the time spent on each replica processing the request. Both must be monitored to distinguish coordinator-side bottlenecks from replica-side ones.
- Compaction backlog as a first-class metric: ScyllaDB exposes a compaction backlog metric that normalizes pending compaction work against available shard memory. A growing normalized backlog is an early warning of write amplification before disk saturation occurs. Cassandra exposes pending tasks but not a normalized backlog signal.
Step 1: Verify ScyllaDB’s Prometheus Metrics Endpoint
ScyllaDB exposes all metrics in Prometheus format on port 9180 by default. Verify it is reachable:
curl http://<scylla-node-ip>:9180/metrics | head -50
You should see metric families with descriptions. The full metrics reference is available at http://<scylla-node-ip>:9180/metrics on any running node and in the official ScyllaDB documentation at docs.scylladb.com/manual/stable/reference/metrics.html.
ScyllaDB also ships a pre-built monitoring stack (Prometheus + Grafana) via the ScyllaDB Monitoring Stack project. The Prometheus queries in this guide work with both the official monitoring stack and any external Prometheus-compatible backend, including CubeAPM.
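For a scripted version of the check above, the following is a minimal Python sketch that counts sample lines per metric name in Prometheus text exposition output. The `SAMPLE` text, its label sets, and its values are made up for illustration and will not match a real node's output; `fetch_metrics` is a hypothetical helper that assumes the endpoint is reachable over plain HTTP on port 9180.

```python
from collections import Counter
from urllib.request import urlopen

# Illustrative sample only: real ScyllaDB output has more labels and many more metrics.
SAMPLE = """\
# HELP scylla_reactor_utilization CPU utilization
# TYPE scylla_reactor_utilization gauge
scylla_reactor_utilization{shard="0"} 0.42
scylla_reactor_utilization{shard="1"} 0.37
# HELP scylla_compaction_manager_pending_compactions Pending compactions
# TYPE scylla_compaction_manager_pending_compactions gauge
scylla_compaction_manager_pending_compactions{shard="0"} 3
"""


def count_samples_by_metric(text: str) -> Counter:
    """Count sample lines per metric name, skipping comments and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or the first space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


def fetch_metrics(host: str, port: int = 9180, timeout: float = 5.0) -> str:
    """Hypothetical helper: download the raw metrics text from one node."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name, n in sorted(count_samples_by_metric(SAMPLE).items()):
        print(f"{name}: {n} samples")
```

Against a live node, `count_samples_by_metric(fetch_metrics("<scylla-node-ip>"))` would give a quick inventory of which metric families the node is exporting.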
Step 2: Monitor Coordinator Latency
Coordinator latency is the primary latency signal your application experiences. It covers the full round trip from request arrival at the coordinator to response sent to the client, including replication across all configured replicas.
Key Prometheus metrics:
| Metric | Type | Description |
| scylla_storage_proxy_coordinator_read_latency | histogram | Read request latency at the coordinator, in microseconds |
| scylla_storage_proxy_coordinator_write_latency | histogram | Write request latency at the coordinator, in microseconds |
| scylla_storage_proxy_coordinator_range_latency | histogram | Range scan latency at the coordinator, in microseconds |
To compute p99 read latency across all shards on a node (PromQL):
histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_read_latency_bucket[60s]))
  by (instance, le)
)
To compute p99 write latency:
histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, le)
)
To break down latency by shard (useful when a single shard is causing tail latency):
histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, shard, le)
)
Latency thresholds to alert on (all values in microseconds, adjust to your SLO):
| Percentile | Warning | Critical |
| p50 read latency | > 1,000 µs | > 5,000 µs |
| p99 read latency | > 10,000 µs | > 50,000 µs |
| p50 write latency | > 500 µs | > 2,000 µs |
| p99 write latency | > 5,000 µs | > 20,000 µs |
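To make the percentile queries and thresholds above more concrete, here is a rough Python sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation, which is the general approach Prometheus documents for `histogram_quantile` on classic histograms. It is a simplification, not Prometheus's actual implementation. The bucket boundaries and counts in `READ_BUCKETS_US` are invented for illustration; the p99 read limits come from the table above.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    The list must include the (math.inf, total) bucket. Values are assumed
    to be spread uniformly inside each bucket (linear interpolation).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets:
        return math.nan
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


def classify(value_us, warning_us, critical_us):
    """Map a latency value in microseconds to a severity label."""
    if value_us > critical_us:
        return "critical"
    if value_us > warning_us:
        return "warning"
    return "ok"


# Hypothetical cumulative read-latency buckets, upper bounds in microseconds.
READ_BUCKETS_US = [(500, 700), (1000, 900), (5000, 980), (10000, 1000), (math.inf, 1000)]
READ_P99_LIMITS_US = (10_000, 50_000)  # warning, critical from the table above

if __name__ == "__main__":
    p99 = histogram_quantile(0.99, READ_BUCKETS_US)
    print(f"p99 read latency ~ {p99:.0f} us -> {classify(p99, *READ_P99_LIMITS_US)}")
```

In PromQL the inputs are per-second bucket rates rather than raw counts, but the interpolation works the same way. Coarse buckets limit how precise the estimate can be.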
Step 3: Check Coordinator Latency from the Command Line
nodetool proxyhistograms provides coordinator latency percentiles in real time directly from the command line, without requiring Prometheus:
nodetool proxyhistograms
Example output:
proxy histograms
Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                   353.50           1214.50           4103.00
75%                   972.50           2969.25           5073.25
95%                  4832.85          15394.80          14981.50
99%                  8356.63          21873.00          31843.79
Min                    22.00            207.00            499.00
Max                  8365.00          21873.00         208960.00
This is the fastest way to spot a latency regression during an active incident. If p99 read latency has spiked, follow up with per-table diagnostics using nodetool cfhistograms.
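If you want to feed this output into a script during an incident, the sketch below parses the three-column layout shown in the example above. It assumes each data row is a label followed by read, write, and range values in that order; other ScyllaDB versions may print additional columns, in which case only the first three numeric columns are read. `run_proxyhistograms` is a hypothetical wrapper that assumes `nodetool` is on the PATH.

```python
import subprocess

# Example output copied from the listing above.
SAMPLE_OUTPUT = """\
proxy histograms
Percentile Read Latency Write Latency Range Latency
(micros) (micros) (micros)
50% 353.50 1214.50 4103.00
75% 972.50 2969.25 5073.25
95% 4832.85 15394.80 14981.50
99% 8356.63 21873.00 31843.79
Min 22.00 207.00 499.00
Max 8365.00 21873.00 208960.00
"""


def parse_proxyhistograms(output: str) -> dict:
    """Return {row_label: {"read": us, "write": us, "range": us}}."""
    rows = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have a label (e.g. "99%", "Min") plus at least three numbers.
        if len(parts) < 4:
            continue
        try:
            read, write, rng = (float(p) for p in parts[1:4])
        except ValueError:
            continue  # header rows such as "Percentile Read Latency ..."
        rows[parts[0]] = {"read": read, "write": write, "range": rng}
    return rows


def run_proxyhistograms() -> str:
    """Hypothetical wrapper: run nodetool locally and return its stdout."""
    result = subprocess.run(
        ["nodetool", "proxyhistograms"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    p99 = parse_proxyhistograms(SAMPLE_OUTPUT)["99%"]
    print(f"p99 read={p99['read']} us, write={p99['write']} us, range={p99['range']} us")
```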
Step 4: Check Per-Table Latency
nodetool cfhistograms (also callable as nodetool tablehistograms) provides per-table latency histograms including read/write latency, SSTable count, partition size, and column count, covering all operations since the last time the command was run:
nodetool cfhistograms <keyspace> <table_name>
nodetool tablestats (previously nodetool cfstats) provides a summary of read and write latency averages, SSTable count, bloom filter false positive ratio, and tombstone counts per table:
nodetool tablestats <keyspace>/<table_name>
Key fields to read in tablestats output:
| Field | What to watch for |
| Read Latency | Average read latency in ms; rising average with low p50 suggests outliers |
| SSTable count | High count (> 20 per level for LCS, > 4 for STCS) indicates compaction is not keeping up |
| Tombstone fields | High live tombstone counts indicate accumulation that will slow reads |
| Bloom filter false positives | High false positive rate means extra disk reads on every query |
Step 5: Monitor Compaction Health
Compaction in ScyllaDB merges SSTables and reclaims space from deleted data. When compaction falls behind, SSTable count grows, read amplification increases, and latency climbs. ScyllaDB exposes compaction health via Prometheus metrics and the nodetool compactionstats command.
Key Prometheus compaction metrics:
| Metric | Type | Description |
| scylla_compaction_manager_compactions | gauge | Currently active compactions |
| scylla_compaction_manager_pending_compactions | gauge | Compaction tasks waiting to run |
| scylla_compaction_manager_completed_compactions | counter | Total completed compaction tasks |
| scylla_compaction_manager_failed_compactions | counter | Total failed compaction tasks |
| scylla_compaction_manager_postponed_compactions | gauge | Tables with postponed compaction |
| scylla_compaction_manager_backlog | gauge | Sum of compaction backlog across all tables |
| scylla_compaction_manager_normalized_backlog | gauge | Backlog normalized by shard available memory |
The normalized_backlog metric is the most important early warning signal. It divides compaction backlog by the shard’s available memory, producing a unitless ratio. When it exceeds 1.0 across multiple shards, compaction is under sustained pressure.
To check active compactions from the command line:
nodetool compactionstats
To view compaction history:
nodetool compactionhistory
Compaction alert thresholds:
| Signal | Warning | Critical |
| pending_compactions | > 100 | > 500 |
| normalized_backlog | > 0.5 | > 1.0 |
| failed_compactions rate | Any increase | Rapid increase |
| postponed_compactions | > 5 tables | > 20 tables |
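The threshold table above can be expressed as a small classifier. This is only a sketch: the input dictionary keys are shortened metric names chosen for illustration, and because the table does not quantify a "rapid increase" in failures, the `rapid_failed_delta` cutoff of 5 new failures per evaluation window is an arbitrary assumption you would tune.

```python
# (warning, critical) limits taken from the table above.
THRESHOLDS = {
    "pending_compactions": (100, 500),
    "normalized_backlog": (0.5, 1.0),
    "postponed_compactions": (5, 20),
}


def severity(value, warning, critical):
    """Return 'critical', 'warning', or 'ok' for a value and its two limits."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def classify_compaction(sample, failed_delta=0, rapid_failed_delta=5):
    """Classify one shard's compaction signals.

    sample: dict with the keys in THRESHOLDS.
    failed_delta: increase in failed compactions since the last check.
    rapid_failed_delta: assumed cutoff for a 'rapid increase' (hypothetical).
    """
    result = {name: severity(sample[name], *limits) for name, limits in THRESHOLDS.items()}
    if failed_delta >= rapid_failed_delta:
        result["failed_compactions"] = "critical"
    elif failed_delta > 0:
        result["failed_compactions"] = "warning"
    else:
        result["failed_compactions"] = "ok"
    return result


if __name__ == "__main__":
    shard = {"pending_compactions": 120, "normalized_backlog": 1.2, "postponed_compactions": 0}
    for signal, level in classify_compaction(shard, failed_delta=1).items():
        print(f"{signal}: {level}")
```

Evaluating this per shard rather than per node keeps a single struggling shard from being averaged away.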
When compaction is consuming too much CPU and impacting read latency, throttle it using compaction_static_shares in scylla.yaml. A value of 100 is a conservative starting point:
compaction_static_shares: 100
Increase the value if read latency worsens because compaction is too slow; decrease it if compaction is saturating the CPU. The valid range is 50 to 1000.
Step 6: Monitor Reactor Utilization Per Shard
Reactor utilization measures how busy each Seastar reactor (one per CPU core/shard) is. A reactor at 100% utilization cannot process new requests on that shard without queuing them, which causes latency spikes even when aggregate CPU looks normal.
Key Prometheus metric:
| Metric | Type | Description |
| scylla_reactor_utilization | gauge | Fraction of time the reactor was busy (0.0 to 1.0), per shard |
To alert on reactor utilization above 80% on any shard (PromQL):
scylla_reactor_utilization > 0.8
To find the five most loaded shards across the cluster:
topk(5, scylla_reactor_utilization)
Reactor utilization thresholds:
| Utilization | Status |
| < 0.5 | Healthy |
| 0.5 to 0.8 | Monitor for upward trend |
| > 0.8 | Warning: latency tail risk |
| > 0.95 | Critical: shard likely queueing requests |
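The following sketch illustrates why per-shard tracking matters: one saturated shard can hide behind a healthy-looking node average. The utilization values are invented, and the status labels simply mirror the table above.

```python
from statistics import mean


def shard_status(utilization: float) -> str:
    """Map a reactor utilization fraction (0.0 to 1.0) to the table's status bands."""
    if utilization > 0.95:
        return "critical"
    if utilization > 0.8:
        return "warning"
    if utilization >= 0.5:
        return "monitor"
    return "healthy"


def summarize_node(per_shard: dict) -> dict:
    """per_shard maps shard id -> utilization fraction for one node."""
    hottest = max(per_shard, key=per_shard.get)
    return {
        "node_mean": mean(per_shard.values()),
        "hottest_shard": hottest,
        "hottest_status": shard_status(per_shard[hottest]),
    }


# Hypothetical 8-shard node: shard 2 is saturated, the rest are lightly loaded.
NODE = {0: 0.31, 1: 0.28, 2: 0.97, 3: 0.30, 4: 0.33, 5: 0.29, 6: 0.27, 7: 0.32}

if __name__ == "__main__":
    summary = summarize_node(NODE)
    print(
        f"node mean {summary['node_mean']:.2f}, "
        f"shard {summary['hottest_shard']} is {summary['hottest_status']}"
    )
```

Here the node mean stays below 0.5, which would read as healthy, while shard 2 sits in the critical band.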
A persistently overloaded shard is usually caused by a hot partition routing disproportionate traffic to one shard. Use nodetool toppartitions to identify which partitions are generating the most reads and writes:
nodetool toppartitions <keyspace> <table> <duration_ms>
Step 7: Monitor Tombstone Pressure
Tombstones are markers ScyllaDB writes to record deletions. They accumulate until compaction removes them. A high tombstone count on a partition increases read amplification because every read must scan and filter tombstones before returning live data.
ScyllaDB uses two separate configuration parameters to control tombstone behavior. tombstone_warn_threshold controls when a warning is logged during a read (default is 0, meaning warnings are disabled). query_tombstone_page_limit sets the hard limit at which ScyllaDB aborts the query, defaulting to 10,000 tombstones per page. When this limit is breached, ScyllaDB throws the error "Tombstones processed by unpaged query exceeds limit of 10000 (configured via query_tombstone_page_limit)".
To enable tombstone warnings before the abort threshold is reached, set tombstone_warn_threshold in scylla.yaml:
tombstone_warn_threshold: 1000
query_tombstone_page_limit: 10000
Monitor tombstone counts per table via nodetool tablestats:
nodetool tablestats <keyspace>/<table_name>
Look for the Tombstone section in the output. A high live tombstone count with slow growth means compaction is keeping up. Rapidly growing tombstone counts alongside a rising SSTable count means compaction is behind and deletions are accumulating faster than they are being reclaimed.
Tombstone alert thresholds:
| Signal | Warning | Critical |
| Live tombstone ratio (tombstones / total cells) | > 10% | > 30% |
| tombstone_warn_threshold breaches in ScyllaDB logs | Any occurrence | Repeated occurrence |
| query_tombstone_page_limit abort errors in logs | Any occurrence | Investigate immediately |
| SSTable count growth with rising tombstones | Monitor trend | Investigate immediately |
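For the log-based signals in the table, a simple scan for the abort error text quoted earlier might look like the sketch below. Only the error message itself comes from this guide; the timestamps and log prefixes in `SAMPLE_LOG` are made up, and the format of tombstone_warn_threshold warnings is not covered here because it is not shown in this guide.

```python
import re

# Matches the abort error quoted earlier and captures the configured limit.
ABORT_RE = re.compile(r"Tombstones processed by unpaged query exceeds limit of (\d+)")

# Hypothetical log lines: the prefixes are illustrative, not real ScyllaDB log format.
SAMPLE_LOG = [
    "2026-05-01T10:00:01 INFO compaction - finished compacting 4 sstables",
    "2026-05-01T10:00:07 ERROR query - Tombstones processed by unpaged query exceeds "
    "limit of 10000 (configured via query_tombstone_page_limit)",
    "2026-05-01T10:00:09 INFO cql_server - new connection",
]


def find_tombstone_aborts(lines):
    """Return the configured limit (as int) for every abort error found."""
    limits = []
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            limits.append(int(match.group(1)))
    return limits


if __name__ == "__main__":
    hits = find_tombstone_aborts(SAMPLE_LOG)
    print(f"{len(hits)} tombstone abort error(s), limits seen: {sorted(set(hits))}")
```

The same regular expression could serve as the pattern for a log-based alert, with any single match treated as worth investigating.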
Step 8: Set Up Alerts with CubeAPM

Running these checks manually is a start, but production ScyllaDB clusters need automated alerting that fires before latency degrades or compaction spirals. CubeAPM connects to ScyllaDB’s Prometheus endpoint on port 9180, scrapes all metrics above at configurable intervals, and lets you define alert thresholds with per-shard granularity across every node in the cluster.
Because CubeAPM runs inside your own infrastructure, ScyllaDB metrics never leave your cloud. This matters for clusters handling sensitive data where telemetry egress to a third-party SaaS is not acceptable.
What CubeAPM monitors for ScyllaDB:
- Coordinator read/write/range latency (p50, p95, p99) per node and per shard
- Compaction pending tasks, active tasks, failed tasks, and normalized backlog
- Reactor utilization per shard with per-core drill-down
- Cache hit ratios (scylla_cache_hits, scylla_cache_misses)
- CQL operation rates (scylla_cql_reads, scylla_cql_updates)
- Node-level metrics: memory usage, disk utilization, network throughput
- Log-based tombstone warning detection
Key alerts to configure for ScyllaDB in CubeAPM:
| Alert | Condition | Severity |
| High p99 read latency | coordinator_read_latency p99 > 10,000 µs | Warning |
| High p99 write latency | coordinator_write_latency p99 > 5,000 µs | Warning |
| Shard overload | reactor_utilization > 0.95 on any shard | Critical |
| Compaction backlog growing | normalized_backlog > 0.5 sustained > 5 min | Warning |
| Compaction failures | rate(failed_compactions[5m]) > 0 | Critical |
| High pending compactions | pending_compactions > 100 | Warning |
| Tombstone abort errors | Log pattern match on query_tombstone_page_limit | Critical |
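Conditions such as "normalized_backlog > 0.5 sustained > 5 min" depend on how long a value stays above its threshold, not on a single sample. The sketch below shows one way that check could be evaluated over scraped samples, loosely in the spirit of a Prometheus `for:` clause. It illustrates the idea only and does not describe how CubeAPM evaluates alerts internally; the sample data is invented.

```python
def sustained_above(samples, threshold, duration_s):
    """Return True if the latest unbroken run above threshold spans duration_s.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # a new run above the threshold begins here
        else:
            run_start = None  # any dip at or below the threshold resets the run
    return run_start is not None and samples[-1][0] - run_start >= duration_s


# Hypothetical normalized_backlog samples scraped every 60 seconds.
STEADY = [(0, 0.3), (60, 0.6), (120, 0.7), (180, 0.65), (240, 0.8), (300, 0.9), (360, 0.85)]
DIPPED = [(0, 0.6), (60, 0.7), (120, 0.4), (180, 0.6), (240, 0.7), (300, 0.8), (360, 0.9)]

if __name__ == "__main__":
    print("steady pressure fires:", sustained_above(STEADY, 0.5, 300))
    print("brief dip resets:", sustained_above(DIPPED, 0.5, 300))
```

Requiring a sustained breach trades a few minutes of detection delay for far fewer alerts from short compaction bursts.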
Read the docs at https://docs.cubeapm.com/ to set up Prometheus metric collection and infrastructure monitoring for ScyllaDB.
Summary
ScyllaDB’s shard-per-core design means that node-level metrics alone are not enough. A single overloaded shard or a stalled compaction worker can degrade tail latency while everything else looks normal. The table below maps each failure mode to the right monitoring source.
| Monitoring area | Primary source | Key signal |
| Coordinator latency | scylla_storage_proxy_coordinator_*_latency | p99 read/write latency per instance and shard |
| Real-time latency check | nodetool proxyhistograms | p95 and p99 percentiles across all operations |
| Per-table latency | nodetool cfhistograms, nodetool tablestats | Read/write latency, SSTable count, tombstone count |
| Compaction health | scylla_compaction_manager_* | Pending tasks, failed tasks, normalized backlog |
| Reactor utilization | scylla_reactor_utilization | Per-shard busy fraction (alert above 0.8) |
| Tombstone pressure | nodetool tablestats, ScyllaDB logs | Live tombstone ratio, query_tombstone_page_limit abort errors |
Disclaimer: All metric names are sourced from the official ScyllaDB metrics reference at docs.scylladb.com/manual/stable/reference/metrics.html, verified against ScyllaDB 2026.1 (latest patch: 2026.1.4, released May 2026). tombstone_warn_threshold defaults to 0 (disabled) in current ScyllaDB. query_tombstone_page_limit defaults to 10,000. compaction_static_shares valid range is 50 to 1000. Prometheus query syntax follows Prometheus 2.x. Adjust all latency and utilization thresholds to match your own SLOs before deploying alerts.