How to Monitor ScyllaDB Latency and Compaction Metrics

ScyllaDB’s shard-per-core architecture and C++ runtime eliminate the garbage collection pauses and JVM overhead that cause latency spikes in Cassandra. But ScyllaDB has its own failure modes, and they are not always loud. Compaction can quietly saturate disk I/O while reads slow down gradually. Tombstone accumulation from frequent deletes increases read amplification without triggering any error. A single overloaded shard can spike tail latency on a node that looks healthy from the outside. Background workers can exhaust their scheduling group while your median latency stays flat.

This guide covers the monitoring areas that matter most for ScyllaDB in production: coordinator and replica-level latency, compaction health, tombstone pressure, reactor utilization, and setting up alerts so these problems surface before they become incidents.

What Makes ScyllaDB Monitoring Different from Cassandra

ScyllaDB introduces three monitoring concepts that have no direct equivalent in Cassandra.

Shard-per-core architecture: ScyllaDB assigns each CPU core its own memory, network queues, and storage I/O. A query is routed to a specific shard and processed without cross-core locking. This means a single overloaded shard can spike latency on a node even when aggregate CPU utilization looks normal. Monitoring must track per-shard reactor utilization, not just node-level CPU.
Two latency surfaces: ScyllaDB exposes latency at two layers. Coordinator latency (scylla_storage_proxy_coordinator_*) reflects the total time from when the client request arrives at the coordinator node to when the response is sent back, including replication. Replica latency (scylla_storage_proxy_replica_*) reflects the time spent on each replica processing the request. Both must be monitored to distinguish coordinator-side bottlenecks from replica-side ones.
Compaction backlog as a first-class metric: ScyllaDB exposes a compaction backlog metric that normalizes pending compaction work against available shard memory. A growing normalized backlog is an early warning of write amplification before disk saturation occurs. Cassandra exposes pending tasks but not a normalized backlog signal.

Step 1: Verify ScyllaDB’s Prometheus Metrics Endpoint

ScyllaDB exposes all metrics in Prometheus format on port 9180 by default. Verify it is reachable:

curl http://<scylla-node-ip>:9180/metrics | head -50

You should see metric families, each with a description. The full list of metrics is served at http://<scylla-node-ip>:9180/metrics on any running node, and the metrics reference is in the official ScyllaDB documentation at docs.scylladb.com/manual/stable/reference/metrics.html.
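
To go beyond eyeballing the curl output, the response can be summarized programmatically. The following is a minimal Python sketch, assuming the endpoint returns the standard Prometheus text exposition format with `# TYPE` lines; the function name and the sample text (including its values) are illustrative and not taken from a real node.

```python
from collections import Counter

def count_metric_families(exposition_text: str) -> Counter:
    """Count metric families per two-part prefix, based on '# TYPE' lines."""
    counts = Counter()
    for line in exposition_text.splitlines():
        parts = line.split()
        # '# TYPE <name> <type>' declares exactly one metric family.
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            prefix = "_".join(parts[2].split("_")[:2])  # e.g. 'scylla_reactor'
            counts[prefix] += 1
    return counts

# Hypothetical sample; real output is far longer and values will differ.
SAMPLE = """\
# HELP scylla_reactor_utilization CPU utilization
# TYPE scylla_reactor_utilization gauge
scylla_reactor_utilization{shard="0"} 0.42
# HELP scylla_compaction_manager_pending_compactions pending tasks
# TYPE scylla_compaction_manager_pending_compactions gauge
scylla_compaction_manager_pending_compactions{shard="0"} 3
"""

print(count_metric_families(SAMPLE))
```

Against a live node, the same function could be fed the body fetched from http://<scylla-node-ip>:9180/metrics, for example with urllib.request.urlopen.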

ScyllaDB also ships a pre-built monitoring stack (Prometheus + Grafana) via the ScyllaDB Monitoring Stack project. The Prometheus queries in this guide work with both the official monitoring stack and any external Prometheus-compatible backend, including CubeAPM.

Step 2: Monitor Coordinator Latency

Coordinator latency is the latency your application experiences most directly. It covers the full round trip from request arrival at the coordinator to the response sent to the client, including the time the coordinator spends waiting for replicas to respond.

Key Prometheus metrics:

Metric	Type	Description
scylla_storage_proxy_coordinator_read_latency	histogram	Read request latency at the coordinator, in microseconds
scylla_storage_proxy_coordinator_write_latency	histogram	Write request latency at the coordinator, in microseconds
scylla_storage_proxy_coordinator_range_latency	histogram	Range scan latency at the coordinator, in microseconds

To compute p99 read latency across all shards on a node (PromQL):

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_read_latency_bucket[60s]))
  by (instance, le)
)

To compute p99 write latency:

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, le)
)

To break down p99 write latency by shard (useful when a single shard is causing tail latency):

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, shard, le)
)
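
It helps to know what histogram_quantile does with the `le` buckets, because the result is an estimate and not an exact percentile. The Python sketch below is a simplified version of the linear interpolation Prometheus documents for classic histograms; the bucket bounds and counts are hypothetical, and real PromQL operates on per-second rates, not raw counts.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    The bucket list must end with a math.inf bucket, as Prometheus
    histograms do. The lowest bucket is assumed to start at 0.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan

# Hypothetical cumulative counts; bounds in microseconds.
EXAMPLE = [(1_000, 900), (5_000, 980), (10_000, 995), (math.inf, 1_000)]
print(histogram_quantile(0.99, EXAMPLE))
```

With these example counts the p99 lands about two thirds of the way into the 5,000 to 10,000 µs bucket, roughly 8,333 µs. The practical consequence is that p99 precision is bounded by bucket width, so small movements inside one bucket should not be over-interpreted.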

Latency thresholds to alert on (all values in microseconds, adjust to your SLO):

Percentile	Warning	Critical
p50 read latency	> 1,000 µs	> 5,000 µs
p99 read latency	> 10,000 µs	> 50,000 µs
p50 write latency	> 500 µs	> 2,000 µs
p99 write latency	> 5,000 µs	> 20,000 µs
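
The table above can be encoded directly so that a script or a test applies the same thresholds everywhere. This is a minimal Python sketch; the dictionary and function names are hypothetical, and the numbers are the starting values from the table, not recommendations for any particular SLO.

```python
# (operation, percentile) -> (warning, critical), all in microseconds.
THRESHOLDS_US = {
    ("read", "p50"): (1_000, 5_000),
    ("read", "p99"): (10_000, 50_000),
    ("write", "p50"): (500, 2_000),
    ("write", "p99"): (5_000, 20_000),
}

def classify_latency(operation: str, percentile: str, value_us: float) -> str:
    """Return 'ok', 'warning' or 'critical' for one latency reading."""
    warning, critical = THRESHOLDS_US[(operation, percentile)]
    if value_us > critical:
        return "critical"
    if value_us > warning:
        return "warning"
    return "ok"

print(classify_latency("read", "p99", 12_500.0))
```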

Step 3: Check Coordinator Latency from the Command Line

nodetool proxyhistograms provides coordinator latency percentiles in real time directly from the command line, without requiring Prometheus:

nodetool proxyhistograms

Example output:

proxy histograms
Percentile  Read Latency   Write Latency  Range Latency
            (micros)       (micros)       (micros)
50%         353.50         1214.50        4103.00
75%         972.50         2969.25        5073.25
95%         4832.85        15394.80       14981.50
99%         8356.63        21873.00       31843.79
Min         22.00          207.00         499.00
Max         8365.00        21873.00       208960.00

This is the fastest way to spot a latency regression during an active incident. If p99 read latency has spiked, follow up with per-table diagnostics using nodetool cfhistograms.
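
During an incident it can be useful to capture this output in a script and compare it against thresholds. The Python sketch below parses the layout shown in the example output above; it assumes that exact column layout (a label followed by three numeric columns), and the function name is hypothetical.

```python
def parse_proxyhistograms(output: str) -> dict:
    """Parse 'nodetool proxyhistograms' text into {label: {op: micros}}."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # title and header rows have a different field count
        label, *values = fields
        try:
            read, write, range_ = (float(v) for v in values)
        except ValueError:
            continue  # not a data row
        result[label] = {"read": read, "write": write, "range": range_}
    return result

# The example output from this guide.
SAMPLE_OUTPUT = """\
proxy histograms
Percentile  Read Latency   Write Latency  Range Latency
            (micros)       (micros)       (micros)
50%         353.50         1214.50        4103.00
75%         972.50         2969.25        5073.25
95%         4832.85        15394.80       14981.50
99%         8356.63        21873.00       31843.79
Min         22.00          207.00         499.00
Max         8365.00        21873.00       208960.00
"""

parsed = parse_proxyhistograms(SAMPLE_OUTPUT)
print(parsed["99%"])
```

On a node, the input string could come from subprocess.run(["nodetool", "proxyhistograms"], capture_output=True, text=True).stdout.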

Step 4: Check Per-Table Latency

nodetool cfhistograms (also callable as nodetool tablehistograms) provides per-table latency histograms including read/write latency, SSTable count, partition size, and column count, covering all operations since the last time the command was run:

nodetool cfhistograms <keyspace> <table_name>

nodetool tablestats (previously nodetool cfstats) provides a summary of read and write latency averages, SSTable count, bloom filter false positive ratio, and tombstone counts per table:

nodetool tablestats <keyspace>/<table_name>

Key fields to read in tablestats output:

Field	What to watch for
Read Latency	Average read latency in ms; rising average with low p50 suggests outliers
SSTable count	High count (> 20 per level for LCS, > 4 for STCS) indicates compaction is not keeping up
Tombstone fields	High live tombstone counts indicate accumulation that will slow reads
Bloom filter false positives	High false positive rate means extra disk reads on every query

Step 5: Monitor Compaction Health

Compaction in ScyllaDB merges SSTables and reclaims space from deleted data. When compaction falls behind, SSTable count grows, read amplification increases, and latency climbs. ScyllaDB exposes compaction health via Prometheus metrics and the nodetool compactionstats command.

Key Prometheus compaction metrics:

Metric	Type	Description
scylla_compaction_manager_compactions	gauge	Currently active compactions
scylla_compaction_manager_pending_compactions	gauge	Compaction tasks waiting to run
scylla_compaction_manager_completed_compactions	counter	Total completed compaction tasks
scylla_compaction_manager_failed_compactions	counter	Total failed compaction tasks
scylla_compaction_manager_postponed_compactions	gauge	Tables with postponed compaction
scylla_compaction_manager_backlog	gauge	Sum of compaction backlog across all tables
scylla_compaction_manager_normalized_backlog	gauge	Backlog normalized by shard available memory

The normalized_backlog metric is the most important early warning signal. It divides compaction backlog by the shard’s available memory, producing a unitless ratio. When it exceeds 1.0 across multiple shards, compaction is under sustained pressure.
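
The "exceeds 1.0 across multiple shards" condition can be checked with a few lines of code once the per-shard gauge values are available. This Python sketch assumes a mapping of shard id to the latest normalized_backlog value; reading "multiple" as two or more shards is an assumption, and the sample values are hypothetical.

```python
def shards_over_backlog(backlog_by_shard: dict, threshold: float = 1.0) -> list:
    """Return the sorted shard ids whose normalized backlog exceeds threshold."""
    return sorted(s for s, value in backlog_by_shard.items() if value > threshold)

def under_sustained_pressure(backlog_by_shard: dict,
                             threshold: float = 1.0,
                             min_shards: int = 2) -> bool:
    """True when at least min_shards shards are over the threshold.

    min_shards=2 is an assumed reading of 'multiple shards'.
    """
    return len(shards_over_backlog(backlog_by_shard, threshold)) >= min_shards

# Hypothetical per-shard readings of scylla_compaction_manager_normalized_backlog.
READINGS = {0: 0.4, 1: 1.3, 2: 1.1, 3: 0.7}
print(shards_over_backlog(READINGS), under_sustained_pressure(READINGS))
```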

To check active compactions from the command line:

nodetool compactionstats

To view compaction history:

nodetool compactionhistory

Compaction alert thresholds:

Signal	Warning	Critical
pending_compactions	> 100	> 500
normalized_backlog	> 0.5	> 1.0
failed_compactions rate	Any increase	Rapid increase
postponed_compactions	> 5 tables	> 20 tables
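
The "any increase" condition on failed_compactions applies to a counter, which only ever grows until the process restarts. The Python sketch below shows one way to compute the increase over a window of raw samples while tolerating a restart; it is a simplification of what PromQL's increase() does (no extrapolation to the window edges), and the sample series are hypothetical.

```python
def counter_increase(samples: list) -> float:
    """Total increase of a monotonic counter across consecutive samples.

    A drop between two samples is treated as a counter reset (for example
    a node restart), in which case the new value counts as the increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total

# Hypothetical scrapes of scylla_compaction_manager_failed_compactions.
print(counter_increase([10, 10, 12]))     # steady node
print(counter_increase([10, 12, 1, 3]))   # node restarted between scrapes
```

A non-zero result for the window maps to the Warning column of the table; how fast it has to grow to count as "rapid" is a judgment call for each cluster.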

When compaction is consuming too much CPU and impacting read latency, throttle it using compaction_static_shares in scylla.yaml. A value of 100 is a conservative starting point:

compaction_static_shares: 100

Increase the value if read latency worsens because compaction is too slow; decrease it if compaction is saturating the CPU. The valid range is 50 to 1000.

Step 6: Monitor Reactor Utilization Per Shard

Reactor utilization measures how busy each Seastar reactor (one per CPU core/shard) is. A reactor at 100% utilization cannot process new requests on that shard without queuing them, which causes latency spikes even when aggregate CPU looks normal.

Key Prometheus metric:

Metric	Type	Description
scylla_reactor_utilization	gauge	Fraction of time the reactor was busy (0.0 to 1.0), per shard

To alert on reactor utilization above 80% on any shard (PromQL):

scylla_reactor_utilization > 0.8

To find the five most loaded shards across the cluster:

topk(5, scylla_reactor_utilization)

Reactor utilization thresholds:

Utilization	Status
< 0.5	Healthy
0.5 to 0.8	Monitor for upward trend
> 0.8	Warning: latency tail risk
> 0.95	Critical: shard likely queueing requests

A persistently overloaded shard is usually caused by a hot partition routing disproportionate traffic to one shard. Use nodetool toppartitions to identify which partitions are generating the most reads and writes:

nodetool toppartitions <keyspace> <table> <duration_ms>
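
To see why node-level averages hide a hot shard, consider the Python sketch below. It applies the status bands from the reactor utilization table in this step to a set of per-shard readings; the function names are hypothetical, the treatment of the exact boundary values is an assumption, and the sample readings are invented for illustration.

```python
from statistics import mean

def reactor_status(utilization: float) -> str:
    """Map one shard's busy fraction to the status bands from the table."""
    if utilization > 0.95:
        return "critical"
    if utilization > 0.8:
        return "warning"
    if utilization >= 0.5:
        return "monitor"
    return "healthy"

def summarize_node(utilization_by_shard: dict) -> dict:
    """Compare the node-wide mean with the single hottest shard."""
    hottest = max(utilization_by_shard, key=utilization_by_shard.get)
    node_mean = mean(utilization_by_shard.values())
    return {
        "node_mean": node_mean,
        "node_status": reactor_status(node_mean),
        "hottest_shard": hottest,
        "hottest_status": reactor_status(utilization_by_shard[hottest]),
    }

# Hypothetical readings of scylla_reactor_utilization on a 4-shard node.
NODE = {0: 0.30, 1: 0.35, 2: 0.97, 3: 0.28}
print(summarize_node(NODE))
```

In this example the node mean is below 0.5 and would be classed as healthy, while shard 2 is in the critical band. That gap is the reason to alert on the per-shard series and not on an average.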

Step 7: Monitor Tombstone Pressure

Tombstones are markers ScyllaDB writes to record deletions. They accumulate until compaction removes them. A high tombstone count on a partition increases read amplification because every read must scan and filter tombstones before returning live data.

ScyllaDB uses two separate configuration parameters to control tombstone behavior. tombstone_warn_threshold controls when a warning is logged during a read (default is 0, meaning warnings are disabled). query_tombstone_page_limit sets the hard limit at which ScyllaDB aborts the query, defaulting to 10,000 tombstones per page. When this limit is breached, ScyllaDB throws the error: Tombstones processed by unpaged query exceeds limit of 10000 (configured via query_tombstone_page_limit).

To enable tombstone warnings before the abort threshold is reached, set tombstone_warn_threshold in scylla.yaml:

tombstone_warn_threshold: 1000
query_tombstone_page_limit: 10000

Monitor tombstone counts per table via nodetool tablestats:

nodetool tablestats <keyspace>/<table_name>

Look for the Tombstone section in the output. A high live tombstone count with slow growth means compaction is keeping up. Rapidly growing tombstone counts alongside a rising SSTable count means compaction is behind and deletions are accumulating faster than they are being reclaimed.

Tombstone alert thresholds:

Signal	Warning	Critical
Live tombstone ratio (tombstones / total cells)	> 10%	> 30%
tombstone_warn_threshold breaches in ScyllaDB logs	Any occurrence	Repeated occurrence
query_tombstone_page_limit abort errors in logs	Any occurrence	Investigate immediately
SSTable count growth with rising tombstones	Monitor trend	Investigate immediately
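
The log-based signals in the table can be detected with a simple pattern match. The Python sketch below counts occurrences of the abort message quoted earlier in this step; it assumes the message text appears in the log exactly as quoted, and the surrounding log lines in the sample are hypothetical.

```python
import re

# Matches the abort message quoted in this guide and captures the limit.
ABORT_PATTERN = re.compile(
    r"exceeds limit of (\d+) \(configured via query_tombstone_page_limit\)"
)

def count_tombstone_aborts(log_lines) -> int:
    """Count log lines that report a query_tombstone_page_limit abort."""
    return sum(1 for line in log_lines if ABORT_PATTERN.search(line))

# Hypothetical log excerpt; only the message text comes from this guide.
LOG = [
    "Tombstones processed by unpaged query exceeds limit of 10000 "
    "(configured via query_tombstone_page_limit)",
    "some unrelated log line",
    "Tombstones processed by unpaged query exceeds limit of 10000 "
    "(configured via query_tombstone_page_limit)",
]
print(count_tombstone_aborts(LOG))
```

Per the table, a single match already warrants attention, so a script like this would typically alert on any non-zero count.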

Step 8: Set Up Alerts with CubeAPM

Running these checks manually is a start, but production ScyllaDB clusters need automated alerting that fires before latency degrades or compaction spirals. CubeAPM connects to ScyllaDB’s Prometheus endpoint on port 9180, scrapes all metrics above at configurable intervals, and lets you define alert thresholds with per-shard granularity across every node in the cluster.

Because CubeAPM runs inside your own infrastructure, ScyllaDB metrics never leave your cloud. This matters for clusters handling sensitive data where telemetry egress to a third-party SaaS is not acceptable.

What CubeAPM monitors for ScyllaDB:

Coordinator read/write/range latency (p50, p95, p99) per node and per shard
Compaction pending tasks, active tasks, failed tasks, and normalized backlog
Reactor utilization per shard with per-core drill-down
Cache hit ratios (scylla_cache_hits, scylla_cache_misses)
CQL operation rates (scylla_cql_reads, scylla_cql_updates)
Node-level metrics: memory usage, disk utilization, network throughput
Log-based tombstone warning detection

Key alerts to configure for ScyllaDB in CubeAPM:

Alert	Condition	Severity
High p99 read latency	coordinator_read_latency p99 > 10,000 µs	Warning
High p99 write latency	coordinator_write_latency p99 > 5,000 µs	Warning
Shard overload	reactor_utilization > 0.9 on any shard	Critical
Compaction backlog growing	normalized_backlog > 0.5 sustained > 5 min	Warning
Compaction failures	rate(failed_compactions[5m]) > 0	Critical
High pending compactions	pending_compactions > 200	Warning
Tombstone abort errors	Log pattern match on query_tombstone_page_limit	Critical
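
The "sustained > 5 min" condition on the compaction backlog alert is worth spelling out, since a single spike should not fire it. The Python sketch below shows one possible interpretation: the value must stay above the threshold across consecutive samples spanning at least the required duration. It is not CubeAPM's implementation, and the function name and sample series are hypothetical.

```python
def sustained_above(samples, threshold: float, min_duration_s: float) -> bool:
    """True if the value stays above threshold for at least min_duration_s.

    samples: iterable of (timestamp_seconds, value), sorted by time.
    Any sample at or below the threshold restarts the clock.
    """
    start = None
    for timestamp, value in samples:
        if value > threshold:
            if start is None:
                start = timestamp
            if timestamp - start >= min_duration_s:
                return True
        else:
            start = None
    return False

# Hypothetical normalized_backlog samples, one per minute.
STEADY = [(t, 0.6) for t in range(0, 361, 60)]
DIPPED = [(0, 0.6), (60, 0.6), (120, 0.6), (180, 0.4),
          (240, 0.6), (300, 0.6), (360, 0.6)]
print(sustained_above(STEADY, 0.5, 300), sustained_above(DIPPED, 0.5, 300))
```

The second series never fires because the dip at 180 seconds resets the window, which is the behavior that keeps short bursts of compaction from paging anyone.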

Read the docs at https://docs.cubeapm.com/ to set up Prometheus metric collection and infrastructure monitoring for ScyllaDB.

Summary

ScyllaDB’s shard-per-core design means that node-level metrics alone are not enough. A single overloaded shard or a stalled compaction worker can degrade tail latency while everything else looks normal. The table below maps each failure mode to the right monitoring source.

Monitoring area	Primary source	Key signal
Coordinator latency	scylla_storage_proxy_coordinator_*_latency	p99 read/write latency per instance and shard
Real-time latency check	nodetool proxyhistograms	p95 and p99 percentiles across all operations
Per-table latency	nodetool cfhistograms, nodetool tablestats	Read/write latency, SSTable count, tombstone count
Compaction health	scylla_compaction_manager_*	Pending tasks, failed tasks, normalized backlog
Reactor utilization	scylla_reactor_utilization	Per-shard busy fraction (alert above 0.8)
Tombstone pressure	nodetool tablestats, ScyllaDB logs	Live tombstone ratio, query_tombstone_page_limit abort errors

Disclaimer: All metric names are sourced from the official ScyllaDB metrics reference at docs.scylladb.com/manual/stable/reference/metrics.html, verified against ScyllaDB 2026.1 (latest patch: 2026.1.4, released May 2026). tombstone_warn_threshold defaults to 0 (disabled) in current ScyllaDB. query_tombstone_page_limit defaults to 10,000. compaction_static_shares valid range is 50 to 1000. Prometheus query syntax follows Prometheus 2.x. Adjust all latency and utilization thresholds to match your own SLOs before deploying alerts.
