
How to Monitor ScyllaDB Latency and Compaction Metrics


ScyllaDB’s shard-per-core architecture and C++ runtime eliminate the garbage collection pauses and JVM overhead that cause latency spikes in Cassandra. But ScyllaDB has its own failure modes, and they are not always loud. Compaction can quietly saturate disk I/O while reads slow down gradually. Tombstone accumulation from frequent deletes increases read amplification without triggering any error. A single overloaded shard can spike tail latency on a node that looks healthy from the outside. Background workers can exhaust their scheduling group while your median latency stays flat.

This guide covers the monitoring areas that matter most for ScyllaDB in production: coordinator and replica-level latency, compaction health, tombstone pressure, reactor utilization, and setting up alerts so these problems surface before they become incidents.

What Makes ScyllaDB Monitoring Different from Cassandra

ScyllaDB introduces three monitoring concepts that have no direct equivalent in Cassandra.

  • Shard-per-core architecture: ScyllaDB assigns each CPU core its own memory, network queues, and storage I/O. A query is routed to a specific shard and processed without cross-core locking. This means a single overloaded shard can spike latency on a node even when aggregate CPU utilization looks normal. Monitoring must track per-shard reactor utilization, not just node-level CPU.
  • Two latency surfaces: ScyllaDB exposes latency at two layers. Coordinator latency (scylla_storage_proxy_coordinator_*) reflects the total time from when the client request arrives at the coordinator node to when the response is sent back, including replication. Replica latency (scylla_storage_proxy_replica_*) reflects the time spent on each replica processing the request. Both must be monitored to distinguish coordinator-side bottlenecks from replica-side ones.
  • Compaction backlog as a first-class metric: ScyllaDB exposes a compaction backlog metric that normalizes pending compaction work against available shard memory. A growing normalized backlog is an early warning of write amplification before disk saturation occurs. Cassandra exposes pending tasks but not a normalized backlog signal.

Step 1: Verify ScyllaDB’s Prometheus Metrics Endpoint

ScyllaDB exposes all metrics in Prometheus format on port 9180 by default. Verify it is reachable:

curl http://<scylla-node-ip>:9180/metrics | head -50

You should see metric families with descriptions. The full metrics reference is available at http://<scylla-node-ip>:9180/metrics on any running node and in the official ScyllaDB documentation at docs.scylladb.com/manual/stable/reference/metrics.html.
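
To make the endpoint check repeatable, the sketch below lists the ScyllaDB metric families announced in a scrape along with their types. It relies only on the `# TYPE` lines of the Prometheus text format; the function name `scylla_metric_families` and the sample payload are illustrative assumptions, not part of ScyllaDB or its tooling.

```python
import re

# "# TYPE <family> <type>" lines announce each metric family in the
# Prometheus text exposition format.
TYPE_LINE = re.compile(r"^#\s*TYPE\s+(\S+)\s+(\S+)")


def scylla_metric_families(exposition_text):
    """Map each scylla_* metric family found in the text to its declared type."""
    families = {}
    for line in exposition_text.splitlines():
        match = TYPE_LINE.match(line)
        if match and match.group(1).startswith("scylla_"):
            families[match.group(1)] = match.group(2)
    return families


# Hand-written sample for illustration only; a real scrape is far longer
# and its HELP text and label sets will differ.
SAMPLE_SCRAPE = """\
# HELP scylla_reactor_utilization (illustrative help text)
# TYPE scylla_reactor_utilization gauge
scylla_reactor_utilization{shard="0"} 0.42
# HELP scylla_compaction_manager_pending_compactions (illustrative help text)
# TYPE scylla_compaction_manager_pending_compactions gauge
scylla_compaction_manager_pending_compactions{shard="0"} 3
# HELP scylla_storage_proxy_coordinator_read_latency (illustrative help text)
# TYPE scylla_storage_proxy_coordinator_read_latency histogram
"""

for name, kind in sorted(scylla_metric_families(SAMPLE_SCRAPE).items()):
    print(f"{kind:10s} {name}")
```

Against a live node, the text could be fetched with `urllib.request.urlopen` pointed at the same `/metrics` URL used in the curl command and passed to the function after decoding.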

ScyllaDB also ships a pre-built monitoring stack (Prometheus + Grafana) via the ScyllaDB Monitoring Stack project. The Prometheus queries in this guide work with both the official monitoring stack and any external Prometheus-compatible backend, including CubeAPM.

Step 2: Monitor Coordinator Latency

Coordinator latency is the primary latency signal your application experiences. It covers the full round trip from request arrival at the coordinator to response sent to the client, including replication across all configured replicas.

Key Prometheus metrics:

| Metric | Type | Description |
| --- | --- | --- |
| scylla_storage_proxy_coordinator_read_latency | histogram | Read request latency at the coordinator, in microseconds |
| scylla_storage_proxy_coordinator_write_latency | histogram | Write request latency at the coordinator, in microseconds |
| scylla_storage_proxy_coordinator_range_latency | histogram | Range scan latency at the coordinator, in microseconds |

To compute p99 read latency across all shards on a node (PromQL):

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_read_latency_bucket[60s]))
  by (instance, le)
)

To compute p99 write latency:

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, le)
)

To break down latency by shard (useful when a single shard is causing tail latency):

histogram_quantile(
  0.99,
  sum(rate(scylla_storage_proxy_coordinator_write_latency_bucket[60s]))
  by (instance, shard, le)
)
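
The p99 values these queries return are estimates: histogram_quantile finds the bucket that contains the target rank and interpolates linearly inside it. The Python sketch below reproduces that calculation for classic cumulative buckets so the effect of bucket boundaries is easier to reason about. It is a simplified illustration, not Prometheus's implementation, and the bucket counts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The bucket list must end with an upper bound of math.inf, as
    Prometheus "le" buckets do. Simplified: assumes non-negative bounds.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if len(buckets) < 2 or total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented read-latency buckets (upper bound in microseconds, cumulative count).
SAMPLE_BUCKETS = [
    (1000.0, 900.0),
    (5000.0, 980.0),
    (10000.0, 995.0),
    (math.inf, 1000.0),
]

print(histogram_quantile(0.99, SAMPLE_BUCKETS))
```

With these sample buckets the p99 rank (990 of 1,000 observations) lands in the 5,000 to 10,000 µs bucket, so the estimate is interpolated to roughly 8,333 µs. Coarse buckets around your SLO boundary therefore limit how precisely a threshold can be evaluated.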

Latency thresholds to alert on (all values in microseconds, adjust to your SLO):

| Percentile | Warning | Critical |
| --- | --- | --- |
| p50 read latency | > 1,000 µs | > 5,000 µs |
| p99 read latency | > 10,000 µs | > 50,000 µs |
| p50 write latency | > 500 µs | > 2,000 µs |
| p99 write latency | > 5,000 µs | > 20,000 µs |

Step 3: Check Coordinator Latency from the Command Line

nodetool proxyhistograms provides coordinator latency percentiles in real time directly from the command line, without requiring Prometheus:

nodetool proxyhistograms

Example output:

proxy histograms
Percentile  Read Latency   Write Latency  Range Latency
            (micros)       (micros)       (micros)
50%         353.50         1214.50        4103.00
75%         972.50         2969.25        5073.25
95%         4832.85        15394.80       14981.50
99%         8356.63        21873.00       31843.79
Min         22.00          207.00         499.00
Max         8365.00        21873.00       208960.00

This is the fastest way to spot a latency regression during an active incident. If p99 read latency has spiked, follow up with per-table diagnostics using nodetool cfhistograms.
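
If you want to capture these numbers in a script during an incident, the output is simple to parse. The sketch below assumes the column layout shown in the example output above (a row label followed by three numeric columns); the function name `parse_proxyhistograms` is hypothetical, and the layout may differ between ScyllaDB versions.

```python
def parse_proxyhistograms(output):
    """Return {row_label: {"read": us, "write": us, "range": us}}."""
    rows = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows consist of a label plus exactly three numeric columns.
        if len(parts) != 4:
            continue
        try:
            read, write, range_ = (float(value) for value in parts[1:])
        except ValueError:
            continue  # not a data row
        rows[parts[0]] = {"read": read, "write": write, "range": range_}
    return rows


# The example output from this guide, used as test input.
SAMPLE_OUTPUT = """\
proxy histograms
Percentile  Read Latency   Write Latency  Range Latency
            (micros)       (micros)       (micros)
50%         353.50         1214.50        4103.00
75%         972.50         2969.25        5073.25
95%         4832.85        15394.80       14981.50
99%         8356.63        21873.00       31843.79
Min         22.00          207.00         499.00
Max         8365.00        21873.00       208960.00
"""

latencies = parse_proxyhistograms(SAMPLE_OUTPUT)
print(latencies["99%"]["read"])
```

In practice the text would come from running nodetool proxyhistograms, for example through `subprocess.run(..., capture_output=True, text=True)`.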

Step 4: Check Per-Table Latency

nodetool cfhistograms (also callable as nodetool tablehistograms) provides per-table latency histograms including read/write latency, SSTable count, partition size, and column count, covering all operations since the last time the command was run:

nodetool cfhistograms <keyspace> <table_name>

nodetool tablestats (previously nodetool cfstats) provides a summary of read and write latency averages, SSTable count, bloom filter false positive ratio, and tombstone counts per table:

nodetool tablestats <keyspace>/<table_name>

Key fields to read in tablestats output:

| Field | What to watch for |
| --- | --- |
| Read Latency | Average read latency in ms; rising average with low p50 suggests outliers |
| SSTable count | High count (> 20 per level for LCS, > 4 for STCS) indicates compaction is not keeping up |
| Tombstone fields | High live tombstone counts indicate accumulation that will slow reads |
| Bloom filter false positives | High false positive rate means extra disk reads on every query |

Step 5: Monitor Compaction Health

Compaction in ScyllaDB merges SSTables and reclaims space from deleted data. When compaction falls behind, SSTable count grows, read amplification increases, and latency climbs. ScyllaDB exposes compaction health via Prometheus metrics and the nodetool compactionstats command.

Key Prometheus compaction metrics:

| Metric | Type | Description |
| --- | --- | --- |
| scylla_compaction_manager_compactions | gauge | Currently active compactions |
| scylla_compaction_manager_pending_compactions | gauge | Compaction tasks waiting to run |
| scylla_compaction_manager_completed_compactions | counter | Total completed compaction tasks |
| scylla_compaction_manager_failed_compactions | counter | Total failed compaction tasks |
| scylla_compaction_manager_postponed_compactions | gauge | Tables with postponed compaction |
| scylla_compaction_manager_backlog | gauge | Sum of compaction backlog across all tables |
| scylla_compaction_manager_normalized_backlog | gauge | Backlog normalized by shard available memory |

The normalized_backlog metric is the most important early warning signal. It divides compaction backlog by the shard’s available memory, producing a unitless ratio. When it exceeds 1.0 across multiple shards, compaction is under sustained pressure.

To check active compactions from the command line:

nodetool compactionstats

To view compaction history:

nodetool compactionhistory

Compaction alert thresholds:

| Signal | Warning | Critical |
| --- | --- | --- |
| pending_compactions | > 100 | > 500 |
| normalized_backlog | > 0.5 | > 1.0 |
| failed_compactions rate | Any increase | Rapid increase |
| postponed_compactions | > 5 tables | > 20 tables |
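
As an illustration of how the thresholds above combine, the sketch below grades one snapshot of compaction signals. The function names are hypothetical. The table does not define what counts as a "rapid increase" in failures, so the `rapid_failure_delta` default of 10 new failures per evaluation window is purely an assumption to replace with your own value.

```python
def grade(value, warning, critical):
    """Return "critical", "warning", or "ok" for a value against two limits."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def compaction_status(pending, normalized_backlog, postponed_tables,
                      failed_delta, rapid_failure_delta=10):
    """Grade one snapshot of compaction signals.

    failed_delta is the increase in the failed_compactions counter since the
    previous check. rapid_failure_delta is an ASSUMED cutoff for a "rapid
    increase"; the guide's threshold table does not give a number.
    """
    return {
        "pending_compactions": grade(pending, 100, 500),
        "normalized_backlog": grade(normalized_backlog, 0.5, 1.0),
        "postponed_compactions": grade(postponed_tables, 5, 20),
        "failed_compactions": grade(failed_delta, 0, rapid_failure_delta),
    }


# Illustrative snapshot: values are invented.
print(compaction_status(pending=150, normalized_backlog=0.3,
                        postponed_tables=25, failed_delta=1))
```

Grading each signal separately keeps the output easy to map back to the table; a single failed compaction still surfaces as a warning even when the backlog looks healthy.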

When compaction is consuming too much CPU and impacting read latency, throttle it using compaction_static_shares in scylla.yaml. A value of 100 is a conservative starting point:

compaction_static_shares: 100

Increase the value if read latency worsens because compaction is too slow; decrease it if compaction is saturating the CPU. The valid range is 50 to 1000.

Step 6: Monitor Reactor Utilization Per Shard

Reactor utilization measures how busy each Seastar reactor (one per CPU core/shard) is. A reactor at 100% utilization cannot process new requests on that shard without queuing them, which causes latency spikes even when aggregate CPU looks normal.

Key Prometheus metric:

| Metric | Type | Description |
| --- | --- | --- |
| scylla_reactor_utilization | gauge | Fraction of time the reactor was busy (0.0 to 1.0), per shard |

To alert on reactor utilization above 80% on any shard (PromQL):

scylla_reactor_utilization > 0.8

To find the most loaded shard across the cluster:

topk(5, scylla_reactor_utilization)

Reactor utilization thresholds:

| Utilization | Status |
| --- | --- |
| < 0.5 | Healthy |
| 0.5 to 0.8 | Monitor for upward trend |
| > 0.8 | Warning: latency tail risk |
| > 0.95 | Critical: shard likely queueing requests |
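
The sketch below shows why these thresholds need to be applied per shard rather than to a node average. It takes a mapping of shard ID to utilization (using the 0.0 to 1.0 scale from the table above), grades each shard, and reports the node average alongside the hottest shard. The function names and the sample values are illustrative assumptions.

```python
from statistics import mean


def shard_status(utilization):
    """Grade one shard using the utilization thresholds from this guide."""
    if utilization > 0.95:
        return "critical"
    if utilization > 0.8:
        return "warning"
    if utilization >= 0.5:
        return "monitor"
    return "healthy"


def shard_report(utilization_by_shard):
    """Summarize per-shard reactor utilization for a single node."""
    hottest = max(utilization_by_shard, key=utilization_by_shard.get)
    return {
        "node_average": mean(utilization_by_shard.values()),
        "hottest_shard": hottest,
        "hottest_status": shard_status(utilization_by_shard[hottest]),
        "per_shard": {shard: shard_status(value)
                      for shard, value in utilization_by_shard.items()},
    }


# Invented sample: one hot shard on an otherwise idle four-shard node.
SAMPLE_NODE = {0: 0.30, 1: 0.25, 2: 0.97, 3: 0.28}

print(shard_report(SAMPLE_NODE))
```

In this sample the node average is about 0.45, which falls in the healthy band, while shard 2 is already in the critical band. An alert built on the average would never fire.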

A persistently overloaded shard is usually caused by a hot partition routing disproportionate traffic to one shard. Use nodetool toppartitions to identify which partitions are generating the most reads and writes:

nodetool toppartitions <keyspace> <table> <duration_ms>

Step 7: Monitor Tombstone Pressure

Tombstones are markers ScyllaDB writes to record deletions. They accumulate until compaction removes them. A high tombstone count on a partition increases read amplification because every read must scan and filter tombstones before returning live data.

ScyllaDB uses two separate configuration parameters to control tombstone behavior. tombstone_warn_threshold controls when a warning is logged during a read (the default is 0, which disables warnings). query_tombstone_page_limit sets the hard limit on tombstones processed per page and defaults to 10,000. When an unpaged query exceeds this limit, ScyllaDB aborts it with the error: Tombstones processed by unpaged query exceeds limit of 10000 (configured via query_tombstone_page_limit).

To enable tombstone warnings before the abort threshold is reached, set tombstone_warn_threshold in scylla.yaml:

tombstone_warn_threshold: 1000
query_tombstone_page_limit: 10000

Monitor tombstone counts per table via nodetool tablestats:

nodetool tablestats <keyspace>/<table_name>

Look for the Tombstone section in the output. A high live tombstone count with slow growth means compaction is keeping up. Rapidly growing tombstone counts alongside a rising SSTable count means compaction is behind and deletions are accumulating faster than they are being reclaimed.

Tombstone alert thresholds:

| Signal | Warning | Critical |
| --- | --- | --- |
| Live tombstone ratio (tombstones / total cells) | > 10% | > 30% |
| tombstone_warn_threshold breaches in ScyllaDB logs | Any occurrence | Repeated occurrence |
| query_tombstone_page_limit abort errors in logs | Any occurrence | Investigate immediately |
| SSTable count growth with rising tombstones | Monitor trend | Investigate immediately |
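
For the log-based signals, a pattern match on the abort error text quoted earlier in this step is usually enough. The sketch below scans log lines for that message and extracts the configured limit. The function name and the surrounding log line format (timestamps, logger names) are assumptions for illustration; only the error text itself comes from this guide.

```python
import re

# Matches the abort error text quoted in this guide and captures the limit.
ABORT_PATTERN = re.compile(
    r"Tombstones processed by unpaged query exceeds limit of (\d+) "
    r"\(configured via query_tombstone_page_limit\)"
)


def find_tombstone_aborts(log_lines):
    """Return (line_number, limit) for each line containing the abort error."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = ABORT_PATTERN.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


# Invented log lines; real ScyllaDB log prefixes will look different.
SAMPLE_LOG = [
    "INFO  compaction finished for ks.events",
    "ERROR Tombstones processed by unpaged query exceeds limit of 10000 "
    "(configured via query_tombstone_page_limit)",
    "INFO  unrelated message mentioning tombstones",
]

print(find_tombstone_aborts(SAMPLE_LOG))
```

Counting hits per time window then maps directly onto the "any occurrence" and "investigate immediately" rows of the table.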

Step 8: Set Up Alerts with CubeAPM


Running these checks manually is a start, but production ScyllaDB clusters need automated alerting that fires before latency degrades or compaction spirals. CubeAPM connects to ScyllaDB’s Prometheus endpoint on port 9180, scrapes all metrics above at configurable intervals, and lets you define alert thresholds with per-shard granularity across every node in the cluster.

Because CubeAPM runs inside your own infrastructure, ScyllaDB metrics never leave your cloud. This matters for clusters handling sensitive data where telemetry egress to a third-party SaaS is not acceptable.

What CubeAPM monitors for ScyllaDB:

  • Coordinator read/write/range latency (p50, p95, p99) per node and per shard
  • Compaction pending tasks, active tasks, failed tasks, and normalized backlog
  • Reactor utilization per shard with per-core drill-down
  • Cache hit ratios (scylla_cache_hits, scylla_cache_misses)
  • CQL operation rates (scylla_cql_reads, scylla_cql_updates)
  • Node-level metrics: memory usage, disk utilization, network throughput
  • Log-based tombstone warning detection

Key alerts to configure for ScyllaDB in CubeAPM:

| Alert | Condition | Severity |
| --- | --- | --- |
| High p99 read latency | coordinator_read_latency p99 > 10,000 µs | Warning |
| High p99 write latency | coordinator_write_latency p99 > 5,000 µs | Warning |
| Shard overload | reactor_utilization > 0.9 on any shard | Critical |
| Compaction backlog growing | normalized_backlog > 0.5 sustained > 5 min | Warning |
| Compaction failures | rate(failed_compactions[5m]) > 0 | Critical |
| High pending compactions | pending_compactions > 200 | Warning |
| Tombstone abort errors | Log pattern match on query_tombstone_page_limit | Critical |
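
The "sustained > 5 min" condition on the backlog alert means the value must stay above the threshold for the whole window, not just touch it once. The sketch below shows one way to evaluate that over a list of timestamped samples. It is an illustration of the logic only, with a hypothetical function name; it is not how CubeAPM or Prometheus implement it, although it is similar in spirit to the `for:` clause in Prometheus alerting rules.

```python
def sustained_above(samples, threshold, duration_s):
    """Return True if the latest run of samples has stayed above threshold
    for at least duration_s seconds.

    samples: list of (unix_timestamp, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # a new breach begins here
        else:
            breach_start = None  # any dip below the threshold resets the clock
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= duration_s


# Invented normalized_backlog samples at 60-second intervals.
STEADY = [(0, 0.4), (60, 0.6), (120, 0.7), (180, 0.6),
          (240, 0.8), (300, 0.9), (360, 0.7)]
FLAPPING = [(0, 0.4), (60, 0.6), (120, 0.7), (180, 0.6),
            (240, 0.3), (300, 0.9), (360, 0.7)]

print(sustained_above(STEADY, 0.5, 300))
print(sustained_above(FLAPPING, 0.5, 300))
```

Requiring a sustained breach filters out short bursts, such as a large flush briefly raising the backlog, at the cost of delaying the alert by the length of the window.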

Read the docs at https://docs.cubeapm.com/ to set up Prometheus metric collection and infrastructure monitoring for ScyllaDB.

Summary

ScyllaDB’s shard-per-core design means that node-level metrics alone are not enough. A single overloaded shard or a stalled compaction worker can degrade tail latency while everything else looks normal. The table below maps each failure mode to the right monitoring source.

| Monitoring area | Primary source | Key signal |
| --- | --- | --- |
| Coordinator latency | scylla_storage_proxy_coordinator_*_latency | p99 read/write latency per instance and shard |
| Real-time latency check | nodetool proxyhistograms | p95 and p99 percentiles across all operations |
| Per-table latency | nodetool cfhistograms, nodetool tablestats | Read/write latency, SSTable count, tombstone count |
| Compaction health | scylla_compaction_manager_* | Pending tasks, failed tasks, normalized backlog |
| Reactor utilization | scylla_reactor_utilization | Per-shard busy fraction (alert above 0.8) |
| Tombstone pressure | nodetool tablestats, ScyllaDB logs | Live tombstone ratio, query_tombstone_page_limit abort errors |

Disclaimer: All metric names are sourced from the official ScyllaDB metrics reference at docs.scylladb.com/manual/stable/reference/metrics.html, verified against ScyllaDB 2026.1 (latest patch: 2026.1.4, released May 2026). tombstone_warn_threshold defaults to 0 (disabled) in current ScyllaDB. query_tombstone_page_limit defaults to 10,000. compaction_static_shares valid range is 50 to 1000. Prometheus query syntax follows Prometheus 2.x. Adjust all latency and utilization thresholds to match your own SLOs before deploying alerts.
