CubeAPM
CubeAPM CubeAPM

How to Monitor Azure SQL Database Performance and Deadlocks 

How to Monitor Azure SQL Database Performance and Deadlocks 

Table of Contents

Azure SQL Database exposes performance data through four distinct layers: platform metrics available via Azure Monitor, resource-level DMVs (Dynamic Management Views) queryable through T-SQL, Query Store for historical query and plan tracking, and diagnostic logs that route to Log Analytics for KQL-based analysis. 

Effective monitoring requires all four. Platform metrics show you that something is wrong. DMVs and Query Store tell you which queries and sessions are responsible. Diagnostic logs give you deadlock graphs and historical trends that neither DMVs nor Query Store retain after a restart or failover.

Key Takeaways

  • Azure SQL Database exposes CPU, DTU/vCore percentage, storage, connections, deadlocks, and worker thread usage as platform metrics via Azure Monitor without any configuration
  • sys.dm_db_resource_stats is the primary DMV for current-state resource analysis. It returns one row per 15 seconds and retains 60 minutes of history. For longer retention, use sys.resource_stats which captures every 5 minutes and retains 14 days
  • Query Store is enabled by default on Azure SQL Database and acts as a flight data recorder. It persists query text, execution plans, and runtime statistics in the user database across restarts and failovers
  • Deadlock capture in Azure SQL Database is done through Extended Events at the database scope or via diagnostic logs routed to Log Analytics. There is no instance-level system_health session as there is in on-premises SQL Server
  • Read Committed Snapshot Isolation (RCSI) and Snapshot Isolation are both enabled by default on new Azure SQL Databases. Despite this, deadlocks can still occur between write operations, under other isolation levels, or when transactions access resources in conflicting order
  • Enabling diagnostic logging to a Log Analytics workspace is the most important configuration step for production monitoring. It is not enabled by default and must be explicitly configured. Enable all nine log categories including QueryStoreRuntimeStatistics, QueryStoreWaitStatistics, and Deadlocks

What Azure SQL Database Exposes Out of the Box

Without any additional configuration, Azure SQL Database surfaces platform metrics through Azure Monitor. These are visible in the Azure portal under the Monitoring section of your database resource.

MetricWhat it measuresAlert threshold guidance
CPU percentageCPU used as a percentage of the service tier limitAlert above 80% sustained for 5 minutes
DTU percentage (DTU model)Blended CPU, memory, and I/O as a percentage of DTU limitAlert above 80% sustained for 5 minutes
vCore percentage (vCore model)vCores used as a percentage of provisioned vCoresAlert above 80% sustained for 5 minutes
Data IO percentageData file read/write I/O as a percentage of the limitAlert above 80%
Log write percentageTransaction log write rate as a percentage of the limitAlert above 80%. High log write often indicates large write transactions
Workers percentageWorker threads used as a percentage of the limitAlert above 80%. Approaching 100% causes connection failures
Connections successful / failedConnection counts by outcomeAlert on any sustained increase in failed connections
DeadlocksNumber of deadlocks per intervalAlert on any non-zero value if deadlocks are unexpected
Storage percentageData file space used as a percentage of the allocated storageAlert above 85%

These metrics require no query and no additional setup. They are the right starting point for alerting configuration.

Step 1: Enable Diagnostic Logging to Log Analytics

Platform metrics give you aggregated numbers. Diagnostic logs give you deadlock graphs, Query Store statistics, individual query timeouts, blocking information, and errors. Diagnostic logging is not enabled by default and must be configured explicitly.

Enable it via the Azure CLI. The official Microsoft recommendation is to enable all nine log categories for comprehensive monitoring:

az monitor diagnostic-settings create \

  --resource "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.Sql/servers/myserver/databases/mydb" \

  --name "sql-diagnostics" \

  --workspace "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.OperationalInsights/workspaces/myworkspace" \

  --logs '[

    {"category": "SQLInsights",                  "enabled": true},

    {"category": "AutomaticTuning",              "enabled": true},

    {"category": "QueryStoreRuntimeStatistics",  "enabled": true},

    {"category": "QueryStoreWaitStatistics",     "enabled": true},

    {"category": "Errors",                       "enabled": true},

    {"category": "DatabaseWaitStatistics",       "enabled": true},

    {"category": "Timeouts",                     "enabled": true},

    {"category": "Blocks",                       "enabled": true},

    {"category": "Deadlocks",                    "enabled": true}

  ]' \

  --metrics '[{"category": "Basic", "enabled": true}]'

Or in the portal: navigate to your database, then Monitoring > Diagnostic settings > Add diagnostic setting, select all categories above, and point them at your Log Analytics workspace.

Diagnostic log categories and what they contain:

CategoryWhat is logged
SQLInsightsIntelligent Insights: AI-detected performance anomalies, degradations, and root cause analysis
AutomaticTuningAutomatic index and query plan corrections applied or recommended
QueryStoreRuntimeStatisticsQuery runtime statistics from Query Store: CPU, duration, logical reads per query
QueryStoreWaitStatisticsWait category statistics from Query Store per query
ErrorsError 1205 (deadlock victim), error 40613 (database unavailable), other SQL errors
DatabaseWaitStatisticsWait type breakdown at the database level
TimeoutsQueries that exceeded the configured query timeout
BlocksBlocking chains: blocked and blocking session IDs, wait time, resource
DeadlocksFull deadlock XML graph including the queries, resources, and victim selection

The SQLInsights category is the Intelligent Insights log for AI-based anomaly detection. It is distinct from the “SQL Insights (preview)” monitoring solution, which was retired on December 31, 2024. The Query Store data in Log Analytics comes from the QueryStoreRuntimeStatistics and QueryStoreWaitStatistics categories, not from SQLInsights.

Step 2: Query Performance Using DMVs

DMVs provide a real-time view of current database activity. Run these from any tool that can connect to your Azure SQL Database including Azure Data Studio, SSMS, or the Azure portal query editor.

Current resource usage (last 60 minutes, 15-second intervals)

SELECT

    end_time,

    avg_cpu_percent,

    avg_data_io_percent,

    avg_log_write_percent,

    avg_memory_usage_percent

FROM sys.dm_db_resource_stats

ORDER BY end_time DESC;

sys.dm_db_resource_stats is specific to Azure SQL Database. It returns data expressed as a percentage of service tier limits, at 15-second granularity, with 60 minutes of retention. For longer history at 5-minute granularity and 14-day retention, query sys.resource_stats from the master database.

Top queries by CPU (current plan cache)

SELECT TOP 10

    query_stats.query_hash AS query_hash,

    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS avg_cpu_microseconds,

    SUM(query_stats.execution_count) AS execution_count,

    MIN(query_stats.statement_text) AS statement_text

FROM (

    SELECT

        qs.*,

        SUBSTRING(st.text,

            (qs.statement_start_offset / 2) + 1,

            ((CASE qs.statement_end_offset

                WHEN -1 THEN DATALENGTH(st.text)

                ELSE qs.statement_end_offset

            END - qs.statement_start_offset) / 2) + 1

        ) AS statement_text

    FROM sys.dm_exec_query_stats AS qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st

) AS query_stats

GROUP BY query_stats.query_hash

ORDER BY avg_cpu_microseconds DESC;

sys.dm_exec_query_stats only shows queries currently in the plan cache. Plans are evicted under memory pressure or after a restart. Use Query Store for persistent historical data.

Active blocking chains

SELECT

    blocking_session_id,

    session_id,

    wait_type,

    wait_time / 1000.0 AS wait_seconds,

    SUBSTRING(st.text,

        (r.statement_start_offset / 2) + 1,

        ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)

          ELSE r.statement_end_offset END - r.statement_start_offset) / 2) + 1

    ) AS blocked_statement

FROM sys.dm_exec_requests r

CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st

WHERE blocking_session_id <> 0

ORDER BY wait_time DESC;

Current lock activity

SELECT

    resource_type,

    resource_database_id,

    resource_description,

    request_mode,

    request_type,

    request_status,

    request_session_id

FROM sys.dm_tran_locks

WHERE resource_type <> 'DATABASE'

ORDER BY request_session_id;

Step 3: Query Store for Historical Query Analysis

Query Store is enabled by default on Azure SQL Database. It persists query text, execution plans, and runtime statistics in the user database itself, surviving restarts and failovers. Use it for identifying plan regressions and historical top queries.

Verify Query Store is active:

SELECT

    actual_state_desc,

    desired_state_desc,

    current_storage_size_mb,

    max_storage_size_mb,

    stale_query_threshold_days

FROM sys.database_query_store_options;

actual_state_desc should be READ_WRITE. If it shows READ_ONLY, the store has hit its storage limit and is no longer capturing new data. Increase the storage limit or clean stale data:

ALTER DATABASE CURRENT

SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 2048);

EXEC sys.sp_query_store_flush_db;

Top 10 queries by average CPU over the last 24 hours:

SELECT TOP 10

    qt.query_sql_text,

    rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,

    rs.avg_duration / 1000.0 AS avg_duration_ms,

    rs.avg_logical_io_reads,

    rs.count_executions,

    p.plan_id

FROM sys.query_store_query_text qt

JOIN sys.query_store_query q  ON qt.query_text_id = q.query_text_id

JOIN sys.query_store_plan p   ON q.query_id = p.query_id

JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id

JOIN sys.query_store_runtime_stats_interval rsi

    ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id

WHERE rsi.start_time > DATEADD(HOUR, -24, GETUTCDATE())

ORDER BY rs.avg_cpu_time DESC;

Queries with plan regressions (multiple plans, significant performance variance):

SELECT

    qt.query_sql_text,

    q.query_id,

    COUNT(DISTINCT p.plan_id) AS plan_count,

    MAX(rs.avg_duration) / 1000.0 AS max_avg_duration_ms,

    MIN(rs.avg_duration) / 1000.0 AS min_avg_duration_ms,

    (MAX(rs.avg_duration) - MIN(rs.avg_duration)) / 1000.0 AS duration_variance_ms

FROM sys.query_store_query_text qt

JOIN sys.query_store_query q  ON qt.query_text_id = q.query_text_id

JOIN sys.query_store_plan p   ON q.query_id = p.query_id

JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id

GROUP BY qt.query_sql_text, q.query_id

HAVING COUNT(DISTINCT p.plan_id) > 1

   AND MAX(rs.avg_duration) > MIN(rs.avg_duration) * 3

ORDER BY duration_variance_ms DESC;

Step 4: Capture and Analyze Deadlocks

How deadlocks work in Azure SQL Database

A deadlock occurs when two or more sessions hold locks that the other sessions need, with neither able to proceed. Azure SQL Database’s deadlock monitor automatically selects one session as the victim, rolls back its transaction, and returns error 1205 to the victim’s connection. The application should implement retry logic for error 1205.

RCSI and Snapshot Isolation are both enabled by default on new Azure SQL Databases. Read operations under RCSI do not acquire shared locks and therefore do not participate in most read-write deadlocks. Deadlocks can still occur between write operations when transactions run under isolation levels that use shared locks, when RCSI is explicitly disabled, or when transactions access the same resources in conflicting order.

Capturing deadlocks via Extended Events

Azure SQL Database does not have an instance-level Extended Events session. There is no access to the system_health session as there is with on-premises SQL Server or Azure SQL Managed Instance. Deadlocks must be captured at the database scope.

Create a database-scoped Extended Events session:

CREATE EVENT SESSION [deadlock_monitor] ON DATABASE

ADD EVENT sqlserver.xml_deadlock_report

ADD TARGET package0.ring_buffer

    (SET max_memory = 51200)

WITH (

    MAX_DISPATCH_LATENCY = 5 SECONDS

);

ALTER EVENT SESSION [deadlock_monitor] ON DATABASE STATE = START;

Read captured deadlock graphs from the ring buffer:

SELECT

    xdr.value('@timestamp', 'datetime2') AS deadlock_time,

    xdr.query('.') AS deadlock_graph_xml

FROM (

    SELECT CAST(target_data AS XML) AS target_data

    FROM sys.dm_xe_database_session_targets t

    JOIN sys.dm_xe_database_sessions s

        ON t.event_session_address = s.address

    WHERE s.name = 'deadlock_monitor'

      AND t.target_name = 'ring_buffer'

) AS data

CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') AS xdt(xdr)

ORDER BY deadlock_time DESC;

The deadlock_graph_xml column contains the full deadlock graph XML. Open it in SSMS to see the visual deadlock graph showing the sessions, the resources they hold, and the resources they are waiting for.

Querying deadlocks from Log Analytics (KQL)

If diagnostic logging is enabled with the Deadlocks category, deadlock events flow to Log Analytics and can be queried with KQL for longer retention than the ring buffer:

AzureDiagnostics

| where ResourceProvider == "MICROSOFT.SQL"

| where Category == "Deadlocks"

| where TimeGenerated > ago(7d)

| project TimeGenerated, ResourceGroup, Resource, deadlock_xml_s

| order by TimeGenerated desc

Deadlock frequency by day:

AzureDiagnostics

| where ResourceProvider == "MICROSOFT.SQL"

| where Category == "Deadlocks"

| where TimeGenerated > ago(30d)

| summarize deadlock_count = count() by bin(TimeGenerated, 1d)

| order by TimeGenerated asc

| render timechart

Step 5: Query Performance Insight

Query Performance Insight is the Azure portal’s built-in GUI for Query Store data. Navigate to your database in the Azure portal and then to Intelligent Performance > Query Performance Insight. It requires Query Store to be enabled (which it is by default) and requires at least a few hours of query history.

It shows top queries by CPU, duration, and execution count over a configurable time window. For queries with multiple execution plans, it shows the plan change timeline alongside the performance degradation, which is faster for initial investigation than writing Query Store T-SQL queries manually.

Step 6: Configure Metric Alerts

# Alert when CPU percentage exceeds 80% for 5 minutes

az monitor metrics alert create \

  --name "AzureSQL-HighCPU" \

  --resource-group myResourceGroup \

  --scopes "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.Sql/servers/myserver/databases/mydb" \

  --condition "avg cpu_percent > 80" \

  --window-size 5m \

  --evaluation-frequency 1m \

  --severity 2 \

  --action "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/microsoft.insights/actionGroups/myActionGroup"

# Alert on any deadlocks

az monitor metrics alert create \

  --name "AzureSQL-Deadlocks" \

  --resource-group myResourceGroup \

  --scopes "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.Sql/servers/myserver/databases/mydb" \

  --condition "total deadlock > 0" \

  --window-size 5m \

  --evaluation-frequency 1m \

  --severity 1

Step 7: Diagnose and Resolve Common Performance Problems

High CPU with slow queries

  1. Check sys.dm_db_resource_stats to confirm CPU is the bottleneck
  2. Query sys.dm_exec_query_stats for top queries by CPU in the current cache
  3. Cross-reference with Query Store for the same queries over a longer window
  4. Use Query Performance Insight to check for plan regressions

Common causes: missing indexes (Database Advisor provides index recommendations driven by Query Store data), parameter sniffing causing a bad plan choice, outdated statistics, or cartesian joins.

Deadlock remediation

  • Recurring deadlocks almost always have one of three root causes.
  • Access order inconsistency. Two transactions access the same tables or rows in opposite order. Fix by standardizing the access order across all transactions that touch the same resources.
  • Long-held locks. Transactions hold locks longer than necessary because they perform non-database work such as HTTP calls, file I/O, or UI interactions inside an open transaction. Fix by keeping transactions as short as possible and moving non-database work outside the transaction boundary.
  • Missing indexes. A table scan acquires more locks than an index seek on the same data. Adding an appropriate index reduces the lock scope. Use the Database Advisor in the portal for index recommendations driven by Query Store data.

Blocking chains

  1. Run the blocking chain query against sys.dm_exec_requests
  2. Identify the head blocker, which is the session with blocking_session_id = 0 that others are waiting on
  3. Examine the head blocker’s current statement and open transactions with sys.dm_tran_database_transactions
  4. Check for long-open idle transactions using sys.dm_exec_sessions filtered on open_transaction_count > 0 and status = ‘sleeping’

Common Setup Mistakes

MistakeEffectFix
Diagnostic logging not enabledNo deadlock graphs, no Query Store history in Log Analytics, no blocking recordsEnable all nine diagnostic log categories pointing to a Log Analytics workspace
Enabling only SQLInsights for Query Store dataSQLInsights is Intelligent Insights, not raw Query Store dataEnable QueryStoreRuntimeStatistics and QueryStoreWaitStatistics for Query Store data in Log Analytics
Query Store in READ_ONLY stateNo new query data being capturedIncrease MAX_STORAGE_SIZE_MB and run sys.sp_query_store_flush_db
Alerting on deadlock count > 1 instead of > 0First deadlock of an interval does not alertSet threshold to > 0 for deadlock alerts
Querying sys.dm_exec_query_stats for historyData disappears on plan cache eviction or restartUse Query Store for persistent historical query data
Not retrying on error 1205Application surfaces deadlock errors permanently to usersImplement retry logic for error 1205 with exponential backoff in the application layer

When Database Metrics Are Not the Whole Picture

Azure SQL Database metrics and deadlock graphs tell you what happened inside the database. A deadlock at 14:32 shows you which two queries conflicted. What it does not show is which application request triggered those queries, which upstream service call initiated the transaction, or whether a deployment at 14:20 introduced the code path responsible.

CubeAPM correlates Azure SQL Database telemetry with distributed application traces and infrastructure metrics using the shared OpenTelemetry context. When a deadlock alert fires, you can move from the database event to the specific application request that caused it, see the full span breakdown of how that request reached the database, and identify whether the problem is in the query itself or in the application logic wrapping it. It runs self-hosted inside your own infrastructure at $0.15/GB ingestion with no per-user fees, so your telemetry data never leaves your environment.

Summary

Monitoring Azure SQL Database performance and deadlocks requires four layers working together: Azure Monitor platform metrics for aggregated health signals and alerting, DMVs for real-time query and session analysis, Query Store for persistent historical query and plan tracking, and diagnostic logs in Log Analytics for deadlock graphs, Query Store history, blocking records, and timeouts. Enable diagnostic logging on day one with all nine categories. The SQLInsights category is Intelligent Insights for AI-based anomaly detection, not raw Query Store data. The Query Store log categories are QueryStoreRuntimeStatistics and QueryStoreWaitStatistics.

LayerPrimary useKey detail
Azure Monitor metricsAlerting on CPU, DTU/vCore, workers, deadlocks, storageAvailable without any configuration
sys.dm_db_resource_statsReal-time resource analysis15-second granularity, 60-minute retention
sys.resource_stats (master db)Medium-term resource trending5-minute granularity, 14-day retention
sys.dm_exec_query_statsCurrent top queries by CPU and durationPlan cache only, lost on eviction
Query StoreHistorical query and plan analysis, plan regression detectionPersistent across restarts, enabled by default
Extended Events (database scope)Real-time deadlock graph captureRing buffer only, no instance-level session available
Log Analytics (Deadlocks category)Long-term deadlock history, KQL analysisRequires diagnostic settings enabled
Log Analytics (QueryStoreRuntimeStatistics)Query Store data in Log AnalyticsSeparate from SQLInsights (Intelligent Insights)
Query Performance InsightGUI for Query Store dataPortal tool, requires Query Store enabled

Disclaimer: DMV names, Query Store behavior, RCSI and Snapshot Isolation defaults, Extended Events scope limitations, diagnostic log category descriptions, including the distinction between SQLInsights/Intelligent Insights and QueryStoreRuntimeStatistics, and the retirement of SQL Insights (preview) on December 31, 2024, are verified against Microsoft Learn official documentation (learn.microsoft.com/en-us/azure/azure-sql/database) as of May 2026.

Also read:

How to Monitor Cron Jobs for Silent Failures 

How to Instrument Go Applications with OpenTelemetry 

What Is the Difference Between OpenTelemetry and Zipkin? 

×
×