CubeAPM

How to Monitor Azure Data Factory Pipeline Failures 

Azure Data Factory stores pipeline run data for only 45 days in ADF Studio. Without diagnostic logging to Log Analytics, you have no way to query failures beyond that window, no way to alert on failure patterns, and no way to correlate pipeline failures with other Azure resource events. 

The two layers of monitoring that every production ADF deployment needs are the ADF Studio monitor view for real-time run inspection and diagnostic logs routed to Log Analytics for historical analysis, alerting, and cross-factory visibility.

Key Takeaways

  • ADF Studio stores pipeline run data for only 45 days. Enable diagnostic logging to Log Analytics on day one if you need longer retention or cross-factory queries.
  • Diagnostic logs must be explicitly enabled. They are not on by default. Enable PipelineRuns, ActivityRuns, and TriggerRuns log categories at a minimum.
  • When routing logs to Log Analytics, use resource-specific tables (ADFPipelineRun, ADFActivityRun, ADFTriggerRun) rather than the legacy AzureDiagnostics table. Resource-specific tables have cleaner schemas and better query performance.
  • ADF distinguishes between UserError and system errors in the FailureType field. UserError means the failure was caused by user configuration or data (bad SQL, missing file, wrong connection string). System errors indicate ADF service-level or infrastructure failures. Use this distinction in alert rules to avoid alerting on user-fixable data quality issues.
  • ADF provides run-count platform metrics without any configuration, including PipelineFailedRuns, PipelineSucceededRuns, PipelineCancelledRuns, ActivityFailedRuns, ActivitySucceededRuns, TriggerFailedRuns, and TriggerSucceededRuns.
  • The Workflow Orchestration Manager (Apache Airflow in ADF) is being retired. New instances cannot be created after January 1, 2026. Migrate to Apache Airflow jobs in Microsoft Fabric.

What ADF Exposes Without Any Configuration

Three monitoring surfaces are available without enabling diagnostic settings.

ADF Studio Monitor Tab

The Monitor tab in Azure Data Factory Studio (accessible from the left menu) shows all pipeline runs, activity runs, and trigger runs in a list view with filtering by status, pipeline name, date range, and annotation. It shows run duration, start time, triggered by, and error details.

From the Monitor tab, you can:

  • View the full activity run breakdown for any pipeline run by clicking the pipeline name
  • Rerun a failed pipeline from the beginning or from a specific failed activity
  • Cancel in-progress runs
  • View the consumption report for a run (DIU and activity hours consumed)

This view shows data for the last 45 days only.

Platform Metrics

The following run-count metrics are available in Azure Monitor without any diagnostic settings:

| Metric | REST API name | What it counts |
|---|---|---|
| Pipeline failed runs | PipelineFailedRuns | Pipeline runs that ended with Failed status |
| Pipeline succeeded runs | PipelineSucceededRuns | Pipeline runs that ended with Succeeded status |
| Pipeline cancelled runs | PipelineCancelledRuns | Pipeline runs that were cancelled |
| Activity failed runs | ActivityFailedRuns | Activity runs that ended with Failed status |
| Activity succeeded runs | ActivitySucceededRuns | Activity runs that ended with Succeeded status |
| Trigger failed runs | TriggerFailedRuns | Trigger-initiated runs that failed |
| Trigger succeeded runs | TriggerSucceededRuns | Trigger-initiated runs that succeeded |

These metrics are aggregated counts. They do not include pipeline names, error messages, or activity-level detail. Use them for alerting on failure count thresholds and for overview dashboards. For root cause analysis, you need diagnostic logs.
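Because these metrics are plain counts, a threshold alert reduces to summing a window of samples and comparing the total to a limit. The sketch below illustrates that logic on hypothetical per-minute samples of PipelineFailedRuns; the function name and the sample data are assumptions for illustration, not part of any Azure SDK.

```python
from typing import Sequence


def window_total_exceeds(counts: Sequence[int], window: int, threshold: int) -> bool:
    """Return True if the sum of the last `window` samples is greater than `threshold`."""
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    return sum(counts[-window:]) > threshold


# Hypothetical per-minute PipelineFailedRuns samples, oldest first.
per_minute_failed = [0, 0, 0, 1, 0, 0]

# One failure falls inside the last five samples, so a "total > 0" rule would fire.
print(window_total_exceeds(per_minute_failed, window=5, threshold=0))  # True
```

The count alone says that something failed, not which pipeline or why, which is why the diagnostic logs are still needed for root cause analysis.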

Azure Activity Log

The Azure Activity Log records control plane operations on your ADF resource: who created or modified pipelines, triggers, and linked services. It does not record pipeline run data. Use it for auditing configuration changes, not for monitoring execution failures.

Step 1: Enable Diagnostic Logging to Log Analytics

Enable diagnostic settings via the Azure portal or CLI. Use resource-specific table routing rather than AzureDiagnostics for cleaner schemas.

az monitor diagnostic-settings create \
  --resource "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.DataFactory/factories/myDataFactory" \
  --name "adf-diagnostics" \
  --workspace "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.OperationalInsights/workspaces/myworkspace" \
  --export-to-resource-specific true \
  --logs '[
    {"category": "PipelineRuns",   "enabled": true},
    {"category": "ActivityRuns",   "enabled": true},
    {"category": "TriggerRuns",    "enabled": true}
  ]'

Or in the portal: navigate to your Data Factory, then Monitoring > Diagnostic settings > Add diagnostic setting, select the three log categories above, choose Resource specific as the destination table type, and point them at your Log Analytics workspace.

Important: Select Resource specific (not Legacy/AzureDiagnostics) when configuring the destination. Resource-specific routing creates the ADFPipelineRun, ADFActivityRun, and ADFTriggerRun tables directly. These have typed columns and query better than the generic AzureDiagnostics table, where all values are stored in a property bag.

For SSIS workloads, also enable the SSIS-specific log categories: SSISIntegrationRuntimeLogs, SSISPackageEventMessages, SSISPackageExecutableStatistics, SSISPackageExecutionComponentPhases, and SSISPackageExecutionDataStatistics.
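If you script the diagnostic setting, it can help to generate the category list rather than hand-edit JSON. The following is a minimal sketch that builds the JSON array in the shape shown in the --logs argument above, using only the category names listed in this section; the function name build_logs_argument and the include_ssis switch are hypothetical.

```python
import json

# Category names as listed in this article.
BASE_CATEGORIES = ["PipelineRuns", "ActivityRuns", "TriggerRuns"]
SSIS_CATEGORIES = [
    "SSISIntegrationRuntimeLogs",
    "SSISPackageEventMessages",
    "SSISPackageExecutableStatistics",
    "SSISPackageExecutionComponentPhases",
    "SSISPackageExecutionDataStatistics",
]


def build_logs_argument(include_ssis: bool = False) -> str:
    """Return a JSON array of enabled log categories, one object per category."""
    categories = BASE_CATEGORIES + (SSIS_CATEGORIES if include_ssis else [])
    return json.dumps([{"category": name, "enabled": True} for name in categories])


# Prints the three base categories as a JSON array.
print(build_logs_argument())
```

Keeping the SSIS categories behind a switch reflects the cost note later in this article: they are high volume and only useful for SSIS workloads.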

Step 2: Understand FailureType Before Writing Alerts

Every failed pipeline and activity run in ADF has a FailureType field. For alerting purposes, it separates failures into two classes:

| FailureType | What it means | Who should act |
|---|---|---|
| UserError | The failure was caused by user configuration or data: wrong connection string, missing source file, bad SQL syntax, schema mismatch, access denied | Data engineers or data owners fix the pipeline configuration or source data |
| System error (empty or other value) | The failure was caused by ADF infrastructure, integration runtime, or a transient service issue | Azure Support or the ADF team handles it. Often auto-retries |

Why this matters for alerts: Alerting on all pipeline failures without filtering by FailureType will include every data quality issue, every missing file, and every temporary source unavailability. In most data engineering environments, UserError failures are expected and handled by retry logic or pipeline error handling paths. Alerts should target system errors or sustained UserError spikes, not individual UserError occurrences.
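The policy described above can be sketched as a small filter: every non-UserError failure is alertable, and UserError failures only become alertable once a pipeline accumulates enough of them to count as a spike. The record shape (dictionaries with PipelineName and FailureType keys), the function name, and the default spike size of 5 are assumptions chosen for illustration.

```python
from collections import Counter


def select_alertable(failed_runs, user_error_spike=5):
    """Return the failed runs that should raise an alert.

    System errors (any FailureType other than "UserError") always qualify.
    UserError failures qualify only when their pipeline has at least
    `user_error_spike` of them in the batch.
    """
    user_error_counts = Counter(
        run["PipelineName"]
        for run in failed_runs
        if run.get("FailureType") == "UserError"
    )
    alertable = []
    for run in failed_runs:
        if run.get("FailureType") != "UserError":
            alertable.append(run)
        elif user_error_counts[run["PipelineName"]] >= user_error_spike:
            alertable.append(run)
    return alertable


# Hypothetical batch of failed runs from one evaluation window.
batch = [
    {"PipelineName": "ingest", "FailureType": "UserError"},
    {"PipelineName": "export", "FailureType": ""},
]
# Only the "export" run is returned: a single UserError is below the spike size.
print(select_alertable(batch))
```

The spike size is a tuning knob: set it from how many UserError failures your retry logic normally absorbs in one window.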

Step 3: Query Pipeline Failures with KQL

Run these queries from your Log Analytics workspace after diagnostic logs are flowing.

Failed pipeline runs in the last 24 hours

ADFPipelineRun
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project TimeGenerated, PipelineName, RunId, Status, FailureType, ErrorCode, ErrorMessage, Start, End
| order by TimeGenerated desc

Pipeline failure rate by pipeline name (last 7 days)

ADFPipelineRun
| where TimeGenerated > ago(7d)
| where Status != "InProgress" and Status != "Queued"
| summarize
    total = count(),
    failed = countif(Status == "Failed"),
    failure_rate = round(100.0 * countif(Status == "Failed") / count(), 2)
    by PipelineName
| where failed > 0
| order by failure_rate desc
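To make the arithmetic in the failure-rate query concrete, here is the same calculation sketched over a handful of in-memory records. The sample records and the function name are hypothetical; only the Status and PipelineName fields and the rounding to two decimals mirror the query.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {pipeline: failure rate in percent} for pipelines with at least one failure."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for run in runs:
        # Skip runs that have not finished, as the query does.
        if run["Status"] in ("InProgress", "Queued"):
            continue
        totals[run["PipelineName"]] += 1
        if run["Status"] == "Failed":
            failed[run["PipelineName"]] += 1
    return {
        name: round(100.0 * failed[name] / totals[name], 2)
        for name in totals
        if failed[name] > 0
    }


sample = [
    {"PipelineName": "ingest", "Status": "Succeeded"},
    {"PipelineName": "ingest", "Status": "Failed"},
    {"PipelineName": "ingest", "Status": "Failed"},
    {"PipelineName": "ingest", "Status": "InProgress"},
    {"PipelineName": "export", "Status": "Succeeded"},
]
# "ingest" has 2 failures out of 3 finished runs; "export" has no failures and is omitted.
print(failure_rates(sample))  # {'ingest': 66.67}
```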

Pipeline availability excluding UserErrors (official Microsoft pattern)

ADFPipelineRun
| where Status != "InProgress" and Status != "Queued"
| where FailureType != "UserError"
| summarize availability = 100.00 - (100.00 * countif(Status != "Succeeded") / count())
    by bin(TimeGenerated, 1h), _ResourceId
| order by TimeGenerated asc
| render timechart
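The effect of excluding UserError rows is easiest to see with numbers. The sketch below applies the availability formula from the query to one hypothetical hour of runs; the record shape and function name are assumptions, and the per-hour binning is left out for brevity.

```python
def availability(runs):
    """Return availability in percent, ignoring unfinished runs and UserError failures.

    Returns None when no runs remain after filtering.
    """
    considered = [
        run
        for run in runs
        if run["Status"] not in ("InProgress", "Queued")
        and run.get("FailureType") != "UserError"
    ]
    if not considered:
        return None
    not_succeeded = sum(1 for run in considered if run["Status"] != "Succeeded")
    return 100.0 - 100.0 * not_succeeded / len(considered)


# Hypothetical hour: 7 successes, 2 UserError failures, 1 system failure.
hour = (
    [{"Status": "Succeeded", "FailureType": ""}] * 7
    + [{"Status": "Failed", "FailureType": "UserError"}] * 2
    + [{"Status": "Failed", "FailureType": "SystemError"}]
)
# The 2 UserError rows are dropped, leaving 1 failure in 8 runs.
print(availability(hour))  # 87.5
```

Counting the UserError rows would have reported 70 percent for the same hour, which is the gap between service availability and data quality that this query is meant to separate.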

Failed activity runs with error details

ADFActivityRun
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project TimeGenerated, PipelineName, ActivityName, ActivityType, Status, FailureType, ErrorCode, ErrorMessage, Start, End
| order by TimeGenerated desc

Top 5 activities failing with system errors (official Microsoft pattern)

ADFActivityRun
| where TimeGenerated >= ago(24h)
| where Status != "InProgress" and Status != "Queued"
| where FailureType != "UserError"
| summarize failureCount = countif(Status != "Succeeded") by bin(TimeGenerated, 1h), ActivityName
| top 5 by failureCount desc nulls last
| order by TimeGenerated asc
| render timechart

Long-running pipelines (executions exceeding a threshold)

ADFPipelineRun
| where TimeGenerated > ago(24h)
| where Status == "Succeeded"
| extend durationMinutes = datetime_diff("minute", End, Start)
| where durationMinutes > 60
| project TimeGenerated, PipelineName, durationMinutes, Start, End, RunId
| order by durationMinutes desc
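The same threshold check can be sketched outside Log Analytics, for example when post-processing exported run records. This version computes whole elapsed minutes from ISO-8601 Start and End strings; the function names and sample timestamps are hypothetical, and the result is an approximation of datetime_diff("minute", ...), which can differ by one minute near minute boundaries.

```python
from datetime import datetime


def duration_minutes(start_iso: str, end_iso: str) -> int:
    """Return whole elapsed minutes between two ISO-8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return int((end - start).total_seconds() // 60)


def is_long_running(start_iso: str, end_iso: str, threshold_minutes: int = 60) -> bool:
    """Return True if the run took longer than the threshold."""
    return duration_minutes(start_iso, end_iso) > threshold_minutes


# Hypothetical run: 1 hour 35 minutes.
print(duration_minutes("2026-05-01T02:00:00+00:00", "2026-05-01T03:35:00+00:00"))  # 95
print(is_long_running("2026-05-01T02:00:00+00:00", "2026-05-01T03:35:00+00:00"))   # True
```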

Latest status per pipeline run (avoid duplicates from multi-record runs)

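ADFPipelineRun
| summarize arg_max(TimeGenerated, *) by RunId, _ResourceId

This query is important because ADF writes multiple records per run to the ADFPipelineRun table as the run progresses. Without arg_max, queries counting run statuses can double-count in-progress and completed records for the same run. Group by RunId (and _ResourceId), not by Status: grouping by Status would keep one row per status per run and reintroduce the duplicates. arg_max is the current name of the deprecated argmax function.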
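The deduplication idea is simply "keep the newest record per run". A minimal sketch over in-memory records, assuming each record is a dictionary with RunId, TimeGenerated, and Status keys and that TimeGenerated values are ISO-8601 UTC strings in one fixed format (so string comparison matches chronological order):

```python
def latest_per_run(records):
    """Return {RunId: record}, keeping only the most recent record for each run."""
    latest = {}
    for record in records:
        run_id = record["RunId"]
        current = latest.get(run_id)
        if current is None or record["TimeGenerated"] > current["TimeGenerated"]:
            latest[run_id] = record
    return latest


# Hypothetical records: run "a" appears twice as it progresses.
records = [
    {"RunId": "a", "TimeGenerated": "2026-05-01T02:00:00Z", "Status": "InProgress"},
    {"RunId": "a", "TimeGenerated": "2026-05-01T02:30:00Z", "Status": "Failed"},
    {"RunId": "b", "TimeGenerated": "2026-05-01T02:05:00Z", "Status": "Succeeded"},
]
statuses = {run_id: rec["Status"] for run_id, rec in latest_per_run(records).items()}
print(statuses)  # {'a': 'Failed', 'b': 'Succeeded'}
```

Counting statuses on the deduplicated result gives one failed and one succeeded run, whereas counting the raw records would also report a phantom in-progress run.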

Trigger failures in the last 24 hours

ADFTriggerRun
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project TimeGenerated, TriggerName, TriggerType, Status, ErrorCode, ErrorMessage
| order by TimeGenerated desc

Step 4: Configure Alerts

Alert on pipeline failure count (platform metric)

az monitor metrics alert create \
  --name "ADF-PipelineFailures" \
  --resource-group myResourceGroup \
  --scopes "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.DataFactory/factories/myDataFactory" \
  --condition "total PipelineFailedRuns > 0" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 2 \
  --action "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/microsoft.insights/actionGroups/myActionGroup"

Log alert on system errors only (KQL-based)

# The condition refers to the query through a named placeholder (SystemErrors here).
# The query returns one row per failed run, so "count > 0" fires only when failures exist.
az monitor scheduled-query create \
  --name "ADF-SystemErrors" \
  --resource-group myResourceGroup \
  --scopes "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/Microsoft.OperationalInsights/workspaces/myworkspace" \
  --condition "count 'SystemErrors' > 0" \
  --condition-query SystemErrors="ADFPipelineRun | where TimeGenerated > ago(5m) | where Status == 'Failed' | where FailureType != 'UserError'" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --severity 1 \
  --action-groups "/subscriptions/{sub-id}/resourceGroups/myResourceGroup/providers/microsoft.insights/actionGroups/myActionGroup"

Step 5: Monitor Specific Pipeline Patterns

Monitoring copy activity failures

Copy activities are the most common activity type in ADF and produce additional monitoring data. Enable the session log on Copy activities to capture row-by-row error details for skipped or failed records.

In ADF Studio, the Copy activity monitor view shows rows read, rows written, rows skipped, throughput, and duration. For persistent logging of this data, enable session logging on the Copy activity settings.

Query for copy activity failures with row counts:

ADFActivityRun
| where TimeGenerated > ago(24h)
| where ActivityType == "Copy"
| where Status == "Failed"
| extend output = todynamic(Output)
| project
    TimeGenerated,
    PipelineName,
    ActivityName,
    Status,
    ErrorCode,
    ErrorMessage,
    RowsRead = tolong(output.rowsRead),
    RowsWritten = tolong(output.rowsCopied)
| order by TimeGenerated desc
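If you export these records and process them elsewhere, the Output column can be parsed the same way. The sketch below assumes Output arrives as a JSON string containing the rowsRead and rowsCopied fields used in the query; the function name and the derived rows_not_copied value are illustrative additions.

```python
import json


def copy_row_counts(output_json: str) -> dict:
    """Extract row counts from a Copy activity Output JSON string.

    Missing fields are returned as None rather than guessed.
    """
    output = json.loads(output_json) if output_json else {}
    rows_read = output.get("rowsRead")
    rows_copied = output.get("rowsCopied")
    gap = None
    if rows_read is not None and rows_copied is not None:
        gap = rows_read - rows_copied
    return {"rows_read": rows_read, "rows_copied": rows_copied, "rows_not_copied": gap}


# Hypothetical Output payload from a failed copy.
print(copy_row_counts('{"rowsRead": 100, "rowsCopied": 90}'))
# {'rows_read': 100, 'rows_copied': 90, 'rows_not_copied': 10}
```

A failed copy can have an empty Output, which is why the function tolerates missing fields instead of raising.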

Monitoring data flow activity failures

Mapping data flows produce their own diagnostic output. If you use data flows and need session-level debugging information, add the DataFlowDebugSessions category to diagnostic settings where your factory lists it.

Query for data flow activity duration and errors:

ADFActivityRun
| where TimeGenerated > ago(24h)
| where ActivityType == "ExecuteDataFlow"
| extend durationSec = datetime_diff("second", End, Start)
| project TimeGenerated, PipelineName, ActivityName, Status, durationSec, ErrorCode, ErrorMessage
| order by durationSec desc

Step 6: Rerun from Failure

When a pipeline fails, ADF supports rerunning from the point of failure without re-executing already-completed activities. This is done from the Monitor tab in ADF Studio.

Navigate to the failed pipeline run, click the rerun icon, and select Rerun from failed activity to restart only the activities that failed and their downstream dependencies. Activities that succeeded are not re-executed.

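For programmatic reruns, use the REST API or the Azure CLI. The CLI command below starts a recovery run that references the failed run and resumes from the failed activities (myPipeline is a placeholder for the pipeline name):

az datafactory pipeline create-run \
  --factory-name myDataFactory \
  --resource-group myResourceGroup \
  --name myPipeline \
  --reference-pipeline-run-id {run-id} \
  --is-recovery true \
  --start-from-failure true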
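For the REST route, a rerun from failure is, to the best of our understanding, a createRun call on the pipeline with recovery parameters in the query string. The sketch below only assembles that URL; the api-version value and the referencePipelineRunId, isRecovery, and startFromFailure parameter names are taken from the public Data Factory REST reference as we recall it and should be checked against the current documentation. The function name and all argument values are hypothetical, and authentication and the POST itself are left out.

```python
from urllib.parse import quote, urlencode


def build_rerun_url(subscription_id, resource_group, factory, pipeline, failed_run_id):
    """Return the createRun URL for rerunning a pipeline from its failed activities."""
    path = (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory, safe='')}"
        f"/pipelines/{quote(pipeline, safe='')}/createRun"
    )
    query = urlencode(
        {
            "api-version": "2018-06-01",  # assumed API version
            "referencePipelineRunId": failed_run_id,
            "isRecovery": "true",
            "startFromFailure": "true",
        }
    )
    return f"{path}?{query}"


# Placeholder values for illustration only.
print(build_rerun_url("sub-id", "myResourceGroup", "myDataFactory", "myPipeline", "run-id"))
```

The URL would be sent as an authenticated POST; the response is expected to carry the identifier of the new run, which you can then track in the Monitor tab or in ADFPipelineRun.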

Common Setup Problems

| Problem | Likely cause | Fix |
|---|---|---|
| No data in ADFPipelineRun table | Diagnostic settings not enabled, or routing set to AzureDiagnostics instead of resource-specific | Enable diagnostic settings with resource-specific table routing. Allow 5 to 10 minutes for first data to appear |
| Queries double-counting run statuses | ADF writes multiple records per run to the table as the run progresses | Use summarize arg_max(TimeGenerated, *) by RunId to get the latest status per run |
| Alerts firing for every missing file or bad data | Alert rule not filtering on FailureType | Add where FailureType != "UserError" to alert KQL queries for system error alerting |
| Pipeline run history missing beyond 45 days | Diagnostic logging was not enabled from the start | Enable diagnostic settings and Log Analytics routing. ADF Studio data is capped at 45 days and cannot be recovered retroactively |
| ADFSandboxActivityRun table not queryable | Sandbox tables appear in Log Analytics but are not supported for KQL queries | Use ADFActivityRun for all activity run queries. Sandbox tables are not queryable |
| High Log Analytics ingestion costs | All SSIS categories enabled unnecessarily | Enable only the log categories you need. SSIS categories are high volume and only needed for SSIS workloads |

From Pipeline Failures to Application Impact

ADF diagnostic logs tell you a pipeline failed, which activity failed, and what the error code was. What they do not tell you is whether the downstream applications or reports that depend on that pipeline’s output are now broken, which API calls are returning stale data because the pipeline did not complete, or whether the pipeline failure was caused by a problem in an upstream application that writes data to the source ADF reads from.

Distributed tracing in CubeAPM

CubeAPM correlates ADF pipeline telemetry with distributed traces from the application services that depend on pipeline outputs. When a pipeline fails, you can see which downstream API endpoints are now returning errors or stale results, trace the failure back to the upstream service that produced bad source data, and see the full impact chain across your data platform and application tier. It runs self-hosted inside your own infrastructure at $0.15/GB ingestion with no per-user fees.

Summary

Monitoring Azure Data Factory pipeline failures requires two layers working together: ADF Studio’s native Monitor tab for real-time run inspection with 45 days of history, and diagnostic logs in Log Analytics for persistent historical analysis, cross-factory queries, and KQL-based alerting. Enable resource-specific table routing to get typed ADFPipelineRun, ADFActivityRun, and ADFTriggerRun tables. Always filter on FailureType when writing alert rules to distinguish system errors from user-fixable data quality failures. Use arg_max by RunId when counting run statuses to avoid double-counting from multi-record runs.

| Signal | Where it lives | Key detail |
|---|---|---|
| Pipeline runs (real-time) | ADF Studio Monitor tab | 45-day retention only |
| Pipeline run counts | Platform metrics: PipelineFailedRuns, PipelineSucceededRuns | No error details, aggregate counts only |
| Pipeline failure details | ADFPipelineRun table in Log Analytics | Requires diagnostic settings, resource-specific routing |
| Activity failure details | ADFActivityRun table in Log Analytics | Includes ActivityType, ErrorCode, ErrorMessage |
| Trigger failures | ADFTriggerRun table in Log Analytics | Trigger name, type, status, error details |
| FailureType field | ADFPipelineRun and ADFActivityRun | UserError vs system error: use to filter alert rules |
| Duplicate run records | ADFPipelineRun table | Use arg_max(TimeGenerated, *) by RunId to get latest status |

Disclaimer: 45-day pipeline run retention limit, diagnostic log category names (PipelineRuns, ActivityRuns, TriggerRuns), resource-specific table names (ADFPipelineRun, ADFActivityRun, ADFTriggerRun), FailureType field values, platform metric names, argmax deduplication pattern, Workflow Orchestration Manager retirement (January 1, 2026), and rerun-from-failure capability are verified against Microsoft Learn official documentation including learn.microsoft.com/en-us/azure/data-factory/monitor-data-factory (last modified November 2, 2025), learn.microsoft.com/en-us/azure/data-factory/monitor-configure-diagnostics, and learn.microsoft.com/en-us/azure/data-factory/monitor-data-factory-reference as of May 2026.
