Standard PostgreSQL monitoring catches none of this. Compression policies stall without errors. Chunk counts balloon without alerts. Background jobs exhaust their worker pool and start queueing up, again silently.
This guide covers the six areas that matter most when you monitor TimescaleDB in production: slow query detection, chunk health, compression job status, continuous aggregate freshness, background worker health, and setting up alerts so these problems surface before they become incidents.
What Makes TimescaleDB Monitoring Different from PostgreSQL
TimescaleDB adds three layers of complexity that standard PostgreSQL monitoring does not cover.
- Hypertables and chunks: Every hypertable is physically partitioned into chunks, one per time interval. Too many chunks cause query planning overhead. A hypertable with 4,000 chunks can add 400ms or more of planning time to a query that executes in 2ms, because the planner must evaluate constraints for every chunk on every query. Standard PostgreSQL monitoring reports the slow query, but not its cause.
- Compression jobs: TimescaleDB compresses older chunks automatically via background jobs. When a compression job fails silently, uncompressed chunks accumulate. Three months of uncompressed data can represent a 10x to 20x disk overage relative to what compression would have produced, without triggering a single log error visible to standard monitoring.
- Continuous aggregates: Continuous aggregates are materialized views that refresh automatically as new data arrives. When the refresh job stalls, the aggregate serves stale data with no error. Dashboards built on the aggregate show outdated values, and no alert fires unless you are explicitly monitoring the refresh lag.
Step 1: Enable pg_stat_statements for Slow Query Detection
pg_stat_statements is the foundation of slow query monitoring in TimescaleDB. It is a PostgreSQL contrib extension that aggregates execution statistics for every query run against the database. TimescaleDB inherits it fully.
Add pg_stat_statements to shared_preload_libraries in postgresql.conf:
shared_preload_libraries = 'timescaledb,pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
track_io_timing = on
Restart PostgreSQL, then enable the extension:
sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;Step 2: Find Slow Queries
The following query returns the top 10 slowest queries by mean execution time. High mean time with many calls is a tuning priority.
SELECT
queryid,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_exec_sec,
ROUND(mean_exec_time::numeric / 1000, 4) AS mean_exec_sec,
ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,
rows,
shared_blks_hit,
shared_blks_read,
LEFT(query, 120) AS query_preview
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 10;
Column reference (PostgreSQL 13+):| Column | Description |
| total_exec_time | Total time spent executing, in milliseconds |
| mean_exec_time | Average execution time per call, in milliseconds |
| stddev_exec_time | Standard deviation of execution time (high = inconsistent) |
| shared_blks_hit | Blocks served from cache |
| shared_blks_read | Blocks read from disk (high = cache misses) |
| calls | Number of times the query was executed |
To find queries with high variance (occasionally very slow but usually fast), sort by stddev_exec_time:
SELECT
calls,
ROUND(mean_exec_time::numeric / 1000, 4) AS mean_exec_sec,
ROUND(stddev_exec_time::numeric / 1000, 4) AS stddev_exec_sec,
LEFT(query, 120) AS query_preview
FROM pg_stat_statements
WHERE calls > 10
AND stddev_exec_time > mean_exec_time
ORDER BY stddev_exec_time DESC
LIMIT 10;
To reset statistics after making changes and starting a clean baseline:
sql
SELECT pg_stat_statements_reset();Step 3: Check Active Slow Queries in Real Time
pg_stat_statements tracks historical aggregates. To catch queries that are slow right now, use pg_stat_activity:
SELECT
pid,
now() - query_start AS duration,
state,
wait_event_type,
wait_event,
LEFT(query, 200) AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
AND query_start < now() - INTERVAL '5 seconds'
ORDER BY duration DESC;This surfaces queries that have been running for more than 5 seconds. Adjust the interval threshold to match your SLO. Common causes in TimescaleDB: chunk decompression mid-query, missing index on a large chunk, or a continuous aggregate refresh blocking a read.
Step 4: Monitor Chunk Health
Chunk count and size are the most important structural health signals for a TimescaleDB hypertable. Run this query to get an overview per hypertable:
SELECT
hypertable_schema,
hypertable_name,
COUNT(*) AS chunk_count,
pg_size_pretty(SUM(total_bytes)) AS total_size,
pg_size_pretty(AVG(total_bytes)::bigint) AS avg_chunk_size,
COUNT(*) FILTER (WHERE is_compressed) AS compressed_chunks,
COUNT(*) FILTER (WHERE NOT is_compressed) AS uncompressed_chunks
FROM timescaledb_information.chunks
GROUP BY hypertable_schema, hypertable_name
ORDER BY SUM(total_bytes) DESC;Thresholds to alert on:
| Signal | Warning | Critical |
| Total chunk count | > 500 | > 1,000 |
| Uncompressed chunks older than compression policy | > 10 | > 50 |
| Average chunk size | < 1 MB or > 5 GB | Depends on interval |
High chunk count is a common cause of slow queries even when indexes are correct. If your total chunk count is above 1,000, consider increasing the chunk time interval using set_chunk_time_interval().
To inspect individual chunks and spot oversized or undersized ones:
SELECT
chunk_schema,
chunk_name,
range_start,
range_end,
is_compressed,
pg_size_pretty(total_bytes) AS total_size,
pg_size_pretty(heap_bytes) AS heap_size,
pg_size_pretty(index_bytes) AS index_size
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_hypertable'
ORDER BY range_start DESC
LIMIT 20;Step 5: Monitor Compression Job Health
A failed compression job leaves chunks uncompressed. The failure is silent unless you query the job stats directly.
SELECT
job_id,
application_name,
scheduled,
last_run_status,
last_run_started_at,
next_start,
total_failures,
total_successes
FROM timescaledb_information.jobs
WHERE application_name LIKE 'Compression%'
ORDER BY total_failures DESC;Check compression ratios for a specific hypertable:
SELECT
chunk_name,
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after,
ROUND(
before_compression_total_bytes::numeric /
NULLIF(after_compression_total_bytes, 0), 1
) AS compression_ratio
FROM chunk_compression_stats('your_hypertable')
ORDER BY chunk_name DESC
LIMIT 20;
What to alert on:| Signal | Action |
| last_run_status = ‘Failed’ | Alert immediately |
| total_failures increasing | Investigate compression settings |
| compression_ratio < 5x | Review segmentby and orderby column config |
| next_start overdue by > 2x schedule interval | Job may be stalled or worker pool exhausted |
When a compression job fails, temporarily pause it while you investigate:
SELECT alter_job(<job_id>, scheduled => false);
Re-enable once resolved:
SELECT alter_job(<job_id>, scheduled => true);Step 6: Monitor Continuous Aggregate Freshness
A stale continuous aggregate silently serves outdated data. Query timescaledb_information.continuous_aggregate_stats directly, which contains both materialization progress and job status in one place:
SELECT
view_name,
completed_threshold,
invalidation_threshold,
last_run_started_at,
last_successful_finish,
last_run_status,
next_scheduled_run,
total_runs,
total_failures
FROM timescaledb_information.continuous_aggregate_stats
ORDER BY view_name;Calculate refresh lag explicitly:
SELECT
view_name,
completed_threshold,
now() - completed_threshold::timestamptz AS refresh_lag
FROM timescaledb_information.continuous_aggregate_stats
ORDER BY refresh_lag DESC;Alert thresholds:
| Refresh interval | Warning lag | Critical lag |
| 1 hour | > 2 hours | > 6 hours |
| 1 day | > 2 days | > 4 days |
| Custom | > 2x schedule interval | > 4x schedule interval |
Step 7: Monitor Background Worker Health
TimescaleDB runs all background jobs (compression, retention, continuous aggregate refresh) using a fixed pool of background workers. If the pool is exhausted, jobs queue up, and nothing runs. Check the pool size:
SHOW timescaledb.max_background_workers;Check all job statuses at once:
SELECT
job_id,
application_name,
schedule_interval,
last_run_status,
last_run_started_at,
next_start,
total_runs,
total_failures,
total_successes
FROM timescaledb_information.jobs
ORDER BY total_failures DESC, last_run_started_at DESC;If total_failures is incrementing across multiple jobs simultaneously, the worker pool is likely exhausted. Increase timescaledb.max_background_workers in postgresql.conf and reload:
timescaledb.max_background_workers = 16Step 8: Set Up Alerts with CubeAPM

Running these queries manually is a start, but production monitoring requires automated alerts that fire before problems compound. CubeAPM connects to your TimescaleDB instance via its infrastructure monitoring agent, collects PostgreSQL and TimescaleDB metrics, and lets you define alert thresholds across all the signals above.
What CubeAPM monitors for TimescaleDB:
- Query execution time from pg_stat_statements (mean, max, p95 execution time per query)
- Active query duration from pg_stat_activity
- Per-hypertable chunk count and total size
- Compression job success/failure rate and last run status
- Continuous aggregate refresh lag
- Background worker job failure counts
- PostgreSQL-level metrics: connections, cache hit ratio, replication lag, WAL write rate
Key alerts to configure for TimescaleDB:
| Alert | Condition | Severity |
| Slow query | Mean execution time > 5s for any query with calls > 50/min | Warning |
| Long-running query | Active query duration > 30s | Critical |
| Chunk count spike | Chunk count for any hypertable > 1,000 | Warning |
| Compression job failure | Any compression job last_run_status = ‘Failed’ | Critical |
| Stale continuous aggregate | Refresh lag > 2x schedule interval | Warning |
| Background worker failure | Any job total_failures incrementing | Warning |
| Low compression ratio | Compression ratio drops below 5x | Warning |
Read the docs at https://docs.cubeapm.com/ to set up PostgreSQL and infrastructure monitoring.
Summary
TimescaleDB’s failure modes are observable well before they cause incidents, but only if you are watching the right signals. Standard PostgreSQL monitoring misses compression job stalls, chunk count bloat, and continuous aggregate staleness entirely.
| Monitoring area | Primary source | Key signal |
| Slow queries (historical) | pg_stat_statements | mean_exec_time, stddev_exec_time |
| Slow queries (real-time) | pg_stat_activity | Query duration > threshold |
| Chunk health | timescaledb_information.chunks | Chunk count, uncompressed chunk count |
| Compression jobs | timescaledb_information.jobs | last_run_status, total_failures |
| Continuous aggregates | timescaledb_information.continuous_aggregate_stats | completed_threshold, refresh_lag |
| Background workers | timescaledb_information.jobs | Simultaneous failures across job types |
Disclaimer: All SQL in this article uses column names valid for PostgreSQL 13 and later. The total_exec_time and mean_exec_time columns in pg_stat_statements replaced total_time and mean_time in PostgreSQL 13. Current TimescaleDB (2.26 as of March 2026) requires PostgreSQL 15 or 16+. PostgreSQL 15 support is scheduled to be dropped in the next TimescaleDB release (June 2026), so PostgreSQL 16 or higher is strongly recommended for new deployments.
Also read:
What Are the Best New Relic Alternatives with OpenTelemetry Support?
What are the Best New Relic Alternatives for Java Applications?





