CubeAPM
CubeAPM CubeAPM

DuckDB Monitoring: How to Track Performance, Profile Queries, and Instrument Production Workloads

DuckDB Monitoring: How to Track Performance, Profile Queries, and Instrument Production Workloads

Table of Contents

DuckDB is an in-process analytical database designed for speed. It runs inside your application’s memory space, not as a separate server, which means traditional database monitoring approaches like connection pooling metrics or network latency tracking do not apply. Without proper instrumentation, a slow aggregation query or runaway memory usage can degrade application performance for minutes before anyone notices the root cause was a DuckDB operation.

This guide covers how DuckDB monitoring works, what metrics and profiling signals matter, and how to instrument DuckDB in production using native profiling tools, OpenTelemetry, and observability platforms.

What Is DuckDB Monitoring — And Why Traditional DB Monitoring Does Not Work

DuckDB monitoring is the practice of tracking query performance, resource consumption, and runtime health of DuckDB instances embedded inside applications. Unlike PostgreSQL or MySQL, DuckDB is not a standalone server, it compiles into your Python script, Node.js service, or Java application as a library. This architectural difference means you cannot monitor it the way you monitor traditional databases.

Traditional database monitoring relies on:

  • Network metrics like connection count and query queue depth
  • Server level metrics like CPU and disk I/O at the database process level
  • Separate monitoring agents that connect to the database over TCP

None of these apply to DuckDB. Instead, DuckDB monitoring requires:

  • Query level profiling to understand execution time and resource usage per operation
  • Application level instrumentation to track DuckDB calls alongside other application logic
  • Runtime metrics like memory usage, cache hit rates, and intermediate result cardinality

DuckDB’s built-in profiling system exposes these signals directly. The challenge is connecting them to your broader observability stack so you can correlate a slow DuckDB query with the API request that triggered it.

How DuckDB Monitoring Works — Profiling, Metrics, and Telemetry Export

DuckDB monitoring relies on three mechanisms: runtime profiling that captures query execution details, metrics that track database health signals, and telemetry export that sends this data to external observability tools.

DuckDB Profiling — Query Execution Breakdown

DuckDB includes a native profiling system that breaks down query execution into operator level timing and cardinality. When enabled, it shows exactly how long each step of a query took and how many rows passed through each operator.

Profiling modes:

  • EXPLAIN ANALYZE executes the query and returns a detailed execution plan with actual row counts and timings
  • PRAGMA enable_profiling turns on continuous profiling for all queries in a session
  • PRAGMA profiling_output = '/path/to/file.json' writes profiling data to JSON for later analysis

Example profiling output shows:

  • Operator types like HASH_JOIN, AGGREGATE, FILTER with per operator timing
  • Cardinality at each step to identify data explosion or poor selectivity
  • Memory usage per operator to spot memory intensive operations

Unlike generic database query logs that only show total query time, DuckDB profiling reveals whether time is spent in a join, an aggregation, or reading Parquet files from disk.

DuckDB Metrics — Runtime Health Signals

DuckDB exposes a set of runtime metrics that track database health beyond individual queries. These include memory usage, disk spill events, and cache performance.

Key metrics to track:

  • Total memory allocated to DuckDB and memory limit compliance
  • Disk spill events when query intermediates exceed available memory
  • Cache hit rate for frequently accessed data
  • Active query count in multi threaded environments
  • Temporary file creation for intermediate results

DuckDB’s metrics are accessed via system tables and pragmas. For example, PRAGMA memory_limit shows the configured memory ceiling, and DuckDB will spill to disk when queries exceed this threshold. Monitoring spill events helps identify queries that need optimization or memory tuning.

Telemetry Export — Connecting DuckDB to Observability Platforms

DuckDB does not natively emit telemetry to external monitoring systems. To integrate DuckDB monitoring into platforms like infrastructure monitoring tools or APM solutions, you need to export profiling data and metrics programmatically.

Integration patterns:

  • Parse EXPLAIN ANALYZE JSON output and send to a logging backend
  • Wrap DuckDB query calls in application level instrumentation using OpenTelemetry spans
  • Poll DuckDB system tables for metrics and push to Prometheus, Datadog, or similar

An example OpenTelemetry integration in Python:

from opentelemetry import trace
import duckdb

tracer = trace.get_tracer(__name__)

def query_duckdb(query):
    with tracer.start_as_current_span("duckdb.query") as span:
        span.set_attribute("db.statement", query)
        conn = duckdb.connect()
        result = conn.execute(query).fetchall()
        conn.close()
        return result

This approach treats each DuckDB query as a trace span, allowing you to see query duration alongside other application operations in a distributed trace view.

What to Monitor in DuckDB — Key Metrics and Profiling Signals

Monitoring DuckDB effectively requires tracking query performance, memory behavior, and data access patterns. These signals answer different questions: is this query slow, is DuckDB running out of memory, or is data being read efficiently.

Query Performance Metrics

Query duration is the most visible signal, but it does not tell you why a query is slow. Profiling breaks duration into operator level timing.

Metrics to track:

  • Total query execution time from start to completion
  • Time per operator to identify bottlenecks in joins, aggregations, or filters
  • Row cardinality at each operator stage to spot inefficient query plans
  • Number of threads used to verify parallel execution is happening

A query that takes 10 seconds might spend 9 seconds in a HASH_JOIN operator because the join key has high cardinality. Profiling surfaces this, while a simple query log only shows the 10 second total.

Memory Usage and Disk Spill Events

DuckDB’s in-process design makes memory usage critical. Queries that exceed available memory spill intermediate results to disk, which slows execution significantly.

Metrics to track:

  • Current memory usage as a percentage of the configured limit
  • Disk spill events indicating memory pressure
  • Temporary file size for spilled data
  • Memory per operator to identify which parts of a query consume the most RAM

Example scenario: A production API endpoint queries a 50 GB Parquet dataset using DuckDB with a 4 GB memory limit. Without monitoring, you would see slow API responses but no indication that DuckDB is spilling to disk. Tracking spill events reveals the root cause and points to either increasing the memory limit or optimizing the query.

Data Access Patterns and Cache Performance

DuckDB caches frequently accessed data to reduce repeated reads from disk or remote storage. Cache hit rate indicates how often queries reuse cached data versus reading fresh.

Metrics to track:

  • Cache hit rate as a percentage of total reads
  • Number of remote reads for cloud stored data like S3 Parquet files
  • Parquet file read time to identify slow storage
  • Number of partitions scanned to verify partition pruning works

A low cache hit rate combined with high remote read counts suggests queries are scanning too much data or partition filters are not working correctly.

DuckDB Profiling Tools — Built-In and External Options

DuckDB’s native profiling tools cover most use cases, but external tools help visualize and aggregate profiling data across multiple queries.

EXPLAIN ANALYZE — Query Execution Plan with Timing

EXPLAIN ANALYZE runs a query and returns a detailed breakdown of execution. It is the primary tool for understanding why a specific query is slow.

EXPLAIN ANALYZE
SELECT station_name, count(*) AS num_services
FROM train_services
GROUP BY station_name
ORDER BY num_services DESC
LIMIT 10;

Output includes:

  • Each operator in the execution plan with row counts and timing
  • Total query time and per operator time
  • Memory usage per operator

This is best used during development to optimize queries before they hit production.

PRAGMA profiling_mode — Continuous Profiling

PRAGMA enable_profiling turns on profiling for all queries in a session. Combined with PRAGMA profiling_output, it writes profiling data to a file for later analysis.

PRAGMA enable_profiling;
PRAGMA profiling_output = '/tmp/duckdb_profile.json';

This approach is useful for capturing profiling data in production without modifying application code for every query.

Third Party Profiling and Observability Integrations

Tools like Grafana, Prometheus, and APM platforms can aggregate DuckDB profiling data when instrumented correctly.

  • Parse JSON profiling output and send to a log aggregation tool
  • Emit custom metrics from application code based on DuckDB query timing
  • Use synthetic monitoring to run canary queries and alert on performance degradation

CubeAPM supports OpenTelemetry based instrumentation, which allows wrapping DuckDB queries in trace spans. This approach links DuckDB query performance to the full application request context, making it easier to diagnose whether a slow API response is caused by DuckDB or another service.

Monitoring DuckDB in Production — Instrumentation Patterns

Production DuckDB monitoring requires more than ad hoc profiling. You need continuous instrumentation, alerting on anomalies, and integration with your existing observability stack.

Instrumenting DuckDB Queries with OpenTelemetry

OpenTelemetry provides a standard way to trace application code. Wrapping DuckDB queries in trace spans makes query performance visible in distributed tracing systems.

Example in Python:

from opentelemetry import trace
import duckdb

tracer = trace.get_tracer(__name__)

def run_analytics_query(query):
    with tracer.start_as_current_span("duckdb.analytics") as span:
        span.set_attribute("db.system", "duckdb")
        span.set_attribute("db.statement", query)
        conn = duckdb.connect("analytics.db")
        result = conn.execute(query).fetchall()
        conn.close()
        return result

This creates a span named duckdb.analytics with query metadata. When exported to an APM platform, you can see DuckDB query duration alongside API handler duration, database lookups, and external API calls in a single trace view.

Alerting on DuckDB Performance Anomalies

Alerting requires defining thresholds for acceptable query performance and resource usage. Unlike server based databases where you alert on connection pool exhaustion or replication lag, DuckDB alerts focus on query duration and memory pressure.

Alert conditions to set:

  • Query execution time exceeds a threshold like 5 seconds for a specific query pattern
  • Memory usage exceeds 90% of the configured limit
  • Disk spill events occur indicating memory pressure
  • Cache hit rate drops below a threshold like 60%

Platforms like CubeAPM allow creating alerts based on trace span duration, which works well for OpenTelemetry instrumented DuckDB queries. You can alert when any span tagged db.system=duckdb exceeds a defined latency threshold.

Correlating DuckDB Performance with Application Metrics

DuckDB runs inside your application, so its performance directly impacts application performance. Correlating DuckDB metrics with application metrics reveals whether a slow DuckDB query is the root cause of a slow API response or whether the bottleneck is elsewhere.

Correlation patterns:

  • Compare DuckDB query duration to total request duration to calculate DuckDB’s contribution
  • Track memory usage trends alongside application memory to detect memory leaks
  • Correlate disk spill events with increased API latency to confirm the impact

CubeAPM’s trace correlation automatically links DuckDB spans to the parent HTTP request span, showing exactly how much time each request spent in DuckDB versus other operations.

Monitoring DuckDB with CubeAPM — OpenTelemetry Native Observability

CubeAPM is built natively on OpenTelemetry and supports instrumenting DuckDB queries as trace spans. This approach connects DuckDB performance to the full application context without requiring separate monitoring infrastructure.

How CubeAPM monitors DuckDB:

  • Ingest OpenTelemetry traces from applications that wrap DuckDB queries in spans
  • Automatically index trace attributes like db.statement for fast search and filtering
  • Correlate DuckDB query duration with logs, infrastructure metrics, and real user monitoring data
  • Alert on slow DuckDB queries based on span duration thresholds

CubeAPM runs inside your VPC or on premises, which means no DuckDB telemetry leaves your infrastructure. This matters for teams with data residency or compliance requirements. Pricing is $0.15/GB for all ingested telemetry with unlimited retention and no per seat fees.

Example workflow: A production API calls DuckDB to aggregate user analytics. The API is instrumented with OpenTelemetry. Each DuckDB query is wrapped in a span tagged db.system=duckdb and db.statement=<query>. CubeAPM ingests these spans, indexes the query text, and surfaces them in a trace view. If a query starts taking longer than usual, CubeAPM’s anomaly detection flags it and routes an alert to Slack with the full trace context.

Best Practices for DuckDB Monitoring

Monitoring DuckDB effectively requires combining profiling, application level instrumentation, and continuous tracking of resource usage.

Enable Profiling During Development and Optimization

Use EXPLAIN ANALYZE to profile every query before deploying it to production. This identifies inefficient query plans, missing indexes, or data access patterns that will not scale.

Run profiling with production scale data. A query that runs fast on 1 GB of sample data might perform poorly on 100 GB in production. Test with realistic data volumes to surface bottlenecks early.

Instrument All DuckDB Queries in Production

Wrap every DuckDB query call in application level instrumentation using OpenTelemetry or a similar tracing framework. This makes query performance visible in your APM platform and correlates it with other application operations.

Tag spans with query metadata like the table being queried and the operation type. This allows filtering traces by DuckDB query type, such as all aggregations or all joins, to identify patterns.

Monitor Memory Usage and Disk Spill Events Continuously

Set up monitoring for DuckDB memory usage and disk spill events. Memory pressure is the most common cause of slow DuckDB queries in production. Track memory as a percentage of the configured limit and alert when it exceeds 90%.

If disk spills occur frequently, either increase the memory limit or optimize queries to reduce intermediate result size. Profiling reveals which operators consume the most memory.

Correlate DuckDB Performance with Application Performance

DuckDB performance only matters in the context of application performance. A 2 second DuckDB query is acceptable if the overall API response time is 5 seconds. It is a problem if the API should respond in 200 milliseconds.

Track DuckDB query duration as a percentage of total request duration. If DuckDB consistently accounts for more than 50% of request time, focus optimization efforts there.

Use Synthetic Monitoring to Detect Regressions

Run canary queries periodically to detect performance regressions before users notice. Synthetic checks can execute a known DuckDB query every minute and alert if duration exceeds a baseline.

This approach catches issues caused by code changes, schema changes, or data growth without waiting for production traffic to trigger an alert.

Frequently Asked Questions

What metrics should I track for DuckDB in production?

Track query execution time, memory usage, disk spill events, and cache hit rate. These metrics reveal query performance, memory pressure, and data access efficiency.

Can I use Prometheus to monitor DuckDB?

Yes, but DuckDB does not natively export metrics to Prometheus. You need to poll DuckDB system tables or parse profiling output and push metrics to Prometheus using a custom exporter.

How do I profile a slow DuckDB query?

Use `EXPLAIN ANALYZE` followed by your query. This returns a detailed execution plan with per operator timing and row counts, showing exactly where time is spent.

Does DuckDB support distributed tracing?

DuckDB does not emit traces natively, but you can instrument DuckDB queries using OpenTelemetry by wrapping query calls in trace spans in your application code.

What causes DuckDB to spill to disk?

DuckDB spills to disk when query intermediates exceed the configured memory limit. This happens with large joins, aggregations, or window functions that generate many intermediate rows.

How do I monitor DuckDB memory usage?

Query `PRAGMA memory_limit` to see the configured limit and track actual memory usage by monitoring system metrics from the process running DuckDB or by parsing profiling output.

Can I monitor DuckDB with CubeAPM?

Yes, CubeAPM ingests OpenTelemetry traces from applications that instrument DuckDB queries as spans. This approach links DuckDB performance to full application traces, logs, and infrastructure metrics in one platform.

Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.

×
×