How to Monitor TimescaleDB for Slow Queries and Chunk Health

Standard PostgreSQL monitoring catches none of this. Compression policies stall without errors. Chunk counts balloon without alerts. Background jobs exhaust their worker pool and start queueing up, again silently.

This guide covers the six areas that matter most when you monitor TimescaleDB in production: slow query detection, chunk health, compression job status, continuous aggregate freshness, background worker health, and setting up alerts so these problems surface before they become incidents.

What Makes TimescaleDB Monitoring Different from PostgreSQL

TimescaleDB adds three layers of complexity that standard PostgreSQL monitoring does not cover.

Hypertables and chunks: Every hypertable is physically partitioned into chunks, one per time interval. Too many chunks cause query planning overhead. A hypertable with 4,000 chunks can add 400ms or more of planning time to a query that executes in 2ms, because the planner must evaluate constraints for every chunk on every query. Standard PostgreSQL monitoring reports the slow query, but not its cause.
Compression jobs: TimescaleDB compresses older chunks automatically via background jobs. When a compression job fails silently, uncompressed chunks accumulate. Three months of uncompressed data can represent a 10x to 20x disk overage relative to what compression would have produced, without triggering a single log error visible to standard monitoring.
Continuous aggregates: Continuous aggregates are materialized views that refresh automatically as new data arrives. When the refresh job stalls, the aggregate serves stale data with no error. Dashboards built on the aggregate show outdated values, and no alert fires unless you are explicitly monitoring the refresh lag.

Step 1: Enable pg_stat_statements for Slow Query Detection

pg_stat_statements is the foundation of slow query monitoring in TimescaleDB. It is a PostgreSQL contrib extension that aggregates execution statistics for every query run against the database. TimescaleDB inherits it fully.

Add pg_stat_statements to shared_preload_libraries in postgresql.conf:

shared_preload_libraries = 'timescaledb,pg_stat_statements'

pg_stat_statements.track = all

pg_stat_statements.max = 10000

track_io_timing = on

Restart PostgreSQL, then enable the extension:

sql

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

shared_preload_libraries = 'timescaledb,pg_stat_statements'

pg_stat_statements.track = all

pg_stat_statements.max = 10000

track_io_timing = on

Restart PostgreSQL, then enable the extension:

sql

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Step 2: Find Slow Queries

The following query returns the top 10 slowest queries by mean execution time. High mean time with many calls is a tuning priority.

SELECT

  queryid,

  calls,

  ROUND(total_exec_time::numeric / 1000, 2)  AS total_exec_sec,

  ROUND(mean_exec_time::numeric / 1000, 4)   AS mean_exec_sec,

  ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,

  rows,

  shared_blks_hit,

  shared_blks_read,

  LEFT(query, 120)                            AS query_preview

FROM pg_stat_statements

WHERE calls > 10

ORDER BY mean_exec_time DESC

LIMIT 10;

Column reference (PostgreSQL 13+):

SELECT

  queryid,

  calls,

  ROUND(total_exec_time::numeric / 1000, 2)  AS total_exec_sec,

  ROUND(mean_exec_time::numeric / 1000, 4)   AS mean_exec_sec,

  ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,

  rows,

  shared_blks_hit,

  shared_blks_read,

  LEFT(query, 120)                            AS query_preview

FROM pg_stat_statements

WHERE calls > 10

ORDER BY mean_exec_time DESC

LIMIT 10;

Column reference (PostgreSQL 13+):

Column	Description
total_exec_time	Total time spent executing, in milliseconds
mean_exec_time	Average execution time per call, in milliseconds
stddev_exec_time	Standard deviation of execution time (high = inconsistent)
shared_blks_hit	Blocks served from cache
shared_blks_read	Blocks read from disk (high = cache misses)
calls	Number of times the query was executed

To find queries with high variance (occasionally very slow but usually fast), sort by stddev_exec_time:

SELECT

  calls,

  ROUND(mean_exec_time::numeric / 1000, 4)   AS mean_exec_sec,

  ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,

  LEFT(query, 120) AS query_preview

FROM pg_stat_statements

WHERE calls > 10

  AND stddev_exec_time > mean_exec_time

ORDER BY stddev_exec_time DESC

LIMIT 10;

To reset statistics after making changes and starting a clean baseline:

sql

SELECT pg_stat_statements_reset();

SELECT

  calls,

  ROUND(mean_exec_time::numeric / 1000, 4)   AS mean_exec_sec,

  ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,

  LEFT(query, 120) AS query_preview

FROM pg_stat_statements

WHERE calls > 10

  AND stddev_exec_time > mean_exec_time

ORDER BY stddev_exec_time DESC

LIMIT 10;

To reset statistics after making changes and starting a clean baseline:

sql

SELECT pg_stat_statements_reset();

Step 3: Check Active Slow Queries in Real Time

pg_stat_statements tracks historical aggregates. To catch queries that are slow right now, use pg_stat_activity:

SELECT

  pid,

  now() - query_start  AS duration,

  state,

  wait_event_type,

  wait_event,

  LEFT(query, 200)      AS query_preview

FROM pg_stat_activity

WHERE state = 'active'

  AND query_start < now() - INTERVAL '5 seconds'

ORDER BY duration DESC;

SELECT

  pid,

  now() - query_start  AS duration,

  state,

  wait_event_type,

  wait_event,

  LEFT(query, 200)      AS query_preview

FROM pg_stat_activity

WHERE state = 'active'

  AND query_start < now() - INTERVAL '5 seconds'

ORDER BY duration DESC;

This surfaces queries that have been running for more than 5 seconds. Adjust the interval threshold to match your SLO. Common causes in TimescaleDB: chunk decompression mid-query, missing index on a large chunk, or a continuous aggregate refresh blocking a read.

Step 4: Monitor Chunk Health

Chunk count and size are the most important structural health signals for a TimescaleDB hypertable. Run this query to get an overview per hypertable:

SELECT

  hypertable_schema,

  hypertable_name,

  COUNT(*)                                    AS chunk_count,

  pg_size_pretty(SUM(total_bytes))            AS total_size,

  pg_size_pretty(AVG(total_bytes)::bigint)    AS avg_chunk_size,

  COUNT(*) FILTER (WHERE is_compressed)       AS compressed_chunks,

  COUNT(*) FILTER (WHERE NOT is_compressed)   AS uncompressed_chunks

FROM timescaledb_information.chunks

GROUP BY hypertable_schema, hypertable_name

ORDER BY SUM(total_bytes) DESC;

SELECT

  hypertable_schema,

  hypertable_name,

  COUNT(*)                                    AS chunk_count,

  pg_size_pretty(SUM(total_bytes))            AS total_size,

  pg_size_pretty(AVG(total_bytes)::bigint)    AS avg_chunk_size,

  COUNT(*) FILTER (WHERE is_compressed)       AS compressed_chunks,

  COUNT(*) FILTER (WHERE NOT is_compressed)   AS uncompressed_chunks

FROM timescaledb_information.chunks

GROUP BY hypertable_schema, hypertable_name

ORDER BY SUM(total_bytes) DESC;

Thresholds to alert on:

Signal	Warning	Critical
Total chunk count	> 500	> 1,000
Uncompressed chunks older than compression policy	> 10	> 50
Average chunk size	< 1 MB or > 5 GB	Depends on interval

High chunk count is a common cause of slow queries even when indexes are correct. If your total chunk count is above 1,000, consider increasing the chunk time interval using set_chunk_time_interval().

To inspect individual chunks and spot oversized or undersized ones:

SELECT

  chunk_schema,

  chunk_name,

  range_start,

  range_end,

  is_compressed,

  pg_size_pretty(total_bytes)   AS total_size,

  pg_size_pretty(heap_bytes)    AS heap_size,

  pg_size_pretty(index_bytes)   AS index_size

FROM timescaledb_information.chunks

WHERE hypertable_name = 'your_hypertable'

ORDER BY range_start DESC

LIMIT 20;

SELECT

  chunk_schema,

  chunk_name,

  range_start,

  range_end,

  is_compressed,

  pg_size_pretty(total_bytes)   AS total_size,

  pg_size_pretty(heap_bytes)    AS heap_size,

  pg_size_pretty(index_bytes)   AS index_size

FROM timescaledb_information.chunks

WHERE hypertable_name = 'your_hypertable'

ORDER BY range_start DESC

LIMIT 20;

Step 5: Monitor Compression Job Health

A failed compression job leaves chunks uncompressed. The failure is silent unless you query the job stats directly.

SELECT

  job_id,

  application_name,

  scheduled,

  last_run_status,

  last_run_started_at,

  next_start,

  total_failures,

  total_successes

FROM timescaledb_information.jobs

WHERE application_name LIKE 'Compression%'

ORDER BY total_failures DESC;

SELECT

  job_id,

  application_name,

  scheduled,

  last_run_status,

  last_run_started_at,

  next_start,

  total_failures,

  total_successes

FROM timescaledb_information.jobs

WHERE application_name LIKE 'Compression%'

ORDER BY total_failures DESC;

Check compression ratios for a specific hypertable:

SELECT

  chunk_name,

  pg_size_pretty(before_compression_total_bytes) AS before,

  pg_size_pretty(after_compression_total_bytes)  AS after,

  ROUND(

    before_compression_total_bytes::numeric /

    NULLIF(after_compression_total_bytes, 0), 1

  ) AS compression_ratio

FROM chunk_compression_stats('your_hypertable')

ORDER BY chunk_name DESC

LIMIT 20;

What to alert on:

SELECT

  chunk_name,

  pg_size_pretty(before_compression_total_bytes) AS before,

  pg_size_pretty(after_compression_total_bytes)  AS after,

  ROUND(

    before_compression_total_bytes::numeric /

    NULLIF(after_compression_total_bytes, 0), 1

  ) AS compression_ratio

FROM chunk_compression_stats('your_hypertable')

ORDER BY chunk_name DESC

LIMIT 20;

What to alert on:

Signal	Action
last_run_status = ‘Failed’	Alert immediately
total_failures increasing	Investigate compression settings
compression_ratio < 5x	Review segmentby and orderby column config
next_start overdue by > 2x schedule interval	Job may be stalled or worker pool exhausted

When a compression job fails, temporarily pause it while you investigate:

SELECT alter_job(<job_id>, scheduled => false);

Re-enable once resolved:

SELECT alter_job(<job_id>, scheduled => true);

SELECT alter_job(<job_id>, scheduled => false);

Re-enable once resolved:

SELECT alter_job(<job_id>, scheduled => true);

Step 6: Monitor Continuous Aggregate Freshness

A stale continuous aggregate silently serves outdated data. Query timescaledb_information.continuous_aggregate_stats directly, which contains both materialization progress and job status in one place:

SELECT

  view_name,

  completed_threshold,

  invalidation_threshold,

  last_run_started_at,

  last_successful_finish,

  last_run_status,

  next_scheduled_run,

  total_runs,

  total_failures

FROM timescaledb_information.continuous_aggregate_stats

ORDER BY view_name;

SELECT

  view_name,

  completed_threshold,

  invalidation_threshold,

  last_run_started_at,

  last_successful_finish,

  last_run_status,

  next_scheduled_run,

  total_runs,

  total_failures

FROM timescaledb_information.continuous_aggregate_stats

ORDER BY view_name;

Calculate refresh lag explicitly:

SELECT

  view_name,

  completed_threshold,

  now() - completed_threshold::timestamptz AS refresh_lag

FROM timescaledb_information.continuous_aggregate_stats

ORDER BY refresh_lag DESC;

SELECT

  view_name,

  completed_threshold,

  now() - completed_threshold::timestamptz AS refresh_lag

FROM timescaledb_information.continuous_aggregate_stats

ORDER BY refresh_lag DESC;

Alert thresholds:

Refresh interval	Warning lag	Critical lag
1 hour	> 2 hours	> 6 hours
1 day	> 2 days	> 4 days
Custom	> 2x schedule interval	> 4x schedule interval

Step 7: Monitor Background Worker Health

TimescaleDB runs all background jobs (compression, retention, continuous aggregate refresh) using a fixed pool of background workers. If the pool is exhausted, jobs queue up, and nothing runs. Check the pool size:

SHOW timescaledb.max_background_workers;

SHOW timescaledb.max_background_workers;

Check all job statuses at once:

SELECT

  job_id,

  application_name,

  schedule_interval,

  last_run_status,

  last_run_started_at,

  next_start,

  total_runs,

  total_failures,

  total_successes

FROM timescaledb_information.jobs

ORDER BY total_failures DESC, last_run_started_at DESC;

SELECT

  job_id,

  application_name,

  schedule_interval,

  last_run_status,

  last_run_started_at,

  next_start,

  total_runs,

  total_failures,

  total_successes

FROM timescaledb_information.jobs

ORDER BY total_failures DESC, last_run_started_at DESC;

If total_failures is incrementing across multiple jobs simultaneously, the worker pool is likely exhausted. Increase timescaledb.max_background_workers in postgresql.conf and reload:

timescaledb.max_background_workers = 16

timescaledb.max_background_workers = 16

Step 8: Set Up Alerts with CubeAPM

Running these queries manually is a start, but production monitoring requires automated alerts that fire before problems compound. CubeAPM connects to your TimescaleDB instance via its infrastructure monitoring agent, collects PostgreSQL and TimescaleDB metrics, and lets you define alert thresholds across all the signals above.

What CubeAPM monitors for TimescaleDB:

Query execution time from pg_stat_statements (mean, max, p95 execution time per query)
Active query duration from pg_stat_activity
Per-hypertable chunk count and total size
Compression job success/failure rate and last run status
Continuous aggregate refresh lag
Background worker job failure counts
PostgreSQL-level metrics: connections, cache hit ratio, replication lag, WAL write rate

Key alerts to configure for TimescaleDB:

Alert	Condition	Severity
Slow query	Mean execution time > 5s for any query with calls > 50/min	Warning
Long-running query	Active query duration > 30s	Critical
Chunk count spike	Chunk count for any hypertable > 1,000	Warning
Compression job failure	Any compression job last_run_status = ‘Failed’	Critical
Stale continuous aggregate	Refresh lag > 2x schedule interval	Warning
Background worker failure	Any job total_failures incrementing	Warning
Low compression ratio	Compression ratio drops below 5x	Warning

Read the docs at https://docs.cubeapm.com/ to set up PostgreSQL and infrastructure monitoring.

Summary

TimescaleDB’s failure modes are observable well before they cause incidents, but only if you are watching the right signals. Standard PostgreSQL monitoring misses compression job stalls, chunk count bloat, and continuous aggregate staleness entirely.

Monitoring area	Primary source	Key signal
Slow queries (historical)	pg_stat_statements	mean_exec_time, stddev_exec_time
Slow queries (real-time)	pg_stat_activity	Query duration > threshold
Chunk health	timescaledb_information.chunks	Chunk count, uncompressed chunk count
Compression jobs	timescaledb_information.jobs	last_run_status, total_failures
Continuous aggregates	timescaledb_information.continuous_aggregate_stats	completed_threshold, refresh_lag
Background workers	timescaledb_information.jobs	Simultaneous failures across job types

Disclaimer: All SQL in this article uses column names valid for PostgreSQL 13 and later. The total_exec_time and mean_exec_time columns in pg_stat_statements replaced total_time and mean_time in PostgreSQL 13. Current TimescaleDB (2.26 as of March 2026) requires PostgreSQL 15 or 16+. PostgreSQL 15 support is scheduled to be dropped in the next TimescaleDB release (June 2026), so PostgreSQL 16 or higher is strongly recommended for new deployments.

Also read:

What Are the Best New Relic Alternatives with OpenTelemetry Support?

What are the Best New Relic Alternatives for Java Applications?

What Are the Best New Relic Alternatives for Microservices?