How to Monitor Azure Event Hubs: Throughput and Consumer Lag

Azure Event Hubs is a high-throughput, fully managed event streaming platform capable of ingesting millions of events per second. It serves as the backbone for real-time analytics pipelines, IoT telemetry, fraud detection, application logs, and clickstream analysis. When Event Hubs works well, it is invisible. When something goes wrong, the impact cascades quickly into downstream systems.

This guide explains how to monitor Azure Event Hubs for the two metrics that matter most in production: throughput and consumer lag. You will learn how to set up alerts, write KQL queries, and troubleshoot common failure patterns.

Key Takeaways

  • Consumer lag measures how far behind consumers are from the latest event in each partition.
  • Azure Monitor provides built-in throughput metrics (IncomingBytes, OutgoingBytes, ThrottledRequests) for all tiers.
  • Consumer lag as a native metric is available only on Premium and Dedicated tiers via Application Metrics Logs.
  • On Basic and Standard tiers, calculate consumer lag by comparing checkpoint sequence numbers with last-enqueued sequence numbers.
  • Each Standard-tier throughput unit provides 1 MB/s ingress and 2 MB/s egress. Exceeding the ingress limit causes requests to be rejected and counted in ThrottledRequests.
  • Alerts on consumer lag, ThrottledRequests, and ServerErrors are the minimum viable monitoring baseline.
  • Partition count determines maximum consumer parallelism and cannot be decreased after creation.

What Is Azure Event Hubs Monitoring?

Azure Event Hubs monitoring means continuously observing the health and performance of your namespaces, event hubs, and consumer groups. It covers three layers:

  • Infrastructure: Throughput units, bytes in/out, throttled requests, server errors.
  • Consumer: Lag per partition, checkpoint freshness, active connections.
  • Application: End-to-end event latency, processing errors, downstream service health.

Azure Monitor is the primary platform for collecting Event Hubs metrics. Platform metrics are collected automatically with no configuration required. Resource logs, runtime audit logs, and application metrics logs require diagnostic settings to be enabled and routed to a destination.

Key Azure Event Hubs Metrics to Monitor

Throughput and Ingestion Metrics

| Metric | REST API Name | Purpose | Alert On |
|---|---|---|---|
| Incoming Bytes | IncomingBytes | Data volume ingested vs TU limit | > 90% of TU ingress |
| Outgoing Bytes | OutgoingBytes | Data volume consumed by readers | > 90% of TU egress |
| Incoming Messages | IncomingMessages | Event count from producers | Sudden drop to 0 |
| Outgoing Messages | OutgoingMessages | Event count delivered to consumers | Persistently < Incoming |
| Throttled Requests | ThrottledRequests | Requests rejected due to TU breach | > 0 for 5+ minutes |
| Incoming Requests | IncomingRequests | Total send attempts from producers | Spikes paired with errors |

Health and Error Metrics

| Metric | REST API Name | What It Signals |
|---|---|---|
| Server Errors | ServerErrors | Internal Event Hubs errors. Sustained values indicate a service incident. |
| User Errors | UserErrors | Client errors: bad auth, malformed requests. |
| Successful Requests | SuccessfulRequests | Confirmed successful operations. Use with errors to track success rate. |
| Active Connections | ActiveConnections | Current connections. A drop to 0 signals a consumer group disconnect. |
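
The success-rate idea in the Successful Requests row can be sketched as below. This is a minimal illustration, assuming the three counts were already read from Azure Monitor for the same time window and that together they approximate the total request count. The `RequestCounts` type and `requestSuccessRate` function are hypothetical names, not part of any Azure SDK.

```typescript
// Request counts for one evaluation window (hypothetical shape).
interface RequestCounts {
  successfulRequests: number;
  serverErrors: number;
  userErrors: number;
}

// Returns the success rate as a percentage, or null when the window has no traffic.
// Assumption: successful + server errors + user errors approximates total requests.
function requestSuccessRate(counts: RequestCounts): number | null {
  const total = counts.successfulRequests + counts.serverErrors + counts.userErrors;
  if (total === 0) {
    return null;
  }
  return (counts.successfulRequests * 100) / total;
}

const sampleWindowCounts: RequestCounts = {
  successfulRequests: 9950,
  serverErrors: 10,
  userErrors: 40,
};
console.log(requestSuccessRate(sampleWindowCounts));
```

Returning null for an empty window keeps a quiet period from being reported as a 0% success rate.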

Cluster-Level Metrics (Dedicated Tier)

| Metric | What It Signals | Alert Threshold |
|---|---|---|
| CPU | Cluster CPU utilization | > 80% |
| Available Memory | Memory available as % of total | < 20% |
| Cluster Utilization | Aggregated utilization across all resources | > 75% sustained |

Understanding Consumer Lag in Azure Event Hubs

What Is Consumer Lag?

Consumer lag is the difference between the sequence number of the most recently enqueued event on a partition and the sequence number of the last checkpointed event for a given consumer group. A lag of 0 means your consumer is keeping up in real time. A growing lag is an early warning signal of slow processing, checkpoint failures, partition rebalancing, or throughput throttling.
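
As a worked example of this definition, the sketch below computes lag per partition from two sequence numbers. It is an illustration only: the `PartitionPosition` type and `lagByPartition` function are hypothetical names, and the input values are made up rather than read from a real event hub.

```typescript
// Sequence numbers for one partition and one consumer group (hypothetical shape).
interface PartitionPosition {
  partitionId: string;
  lastEnqueuedSequenceNumber: number;
  checkpointSequenceNumber: number;
}

// Lag = last enqueued sequence number minus last checkpointed sequence number.
// Negative values (possible if the two reads race each other) are clamped to 0.
function lagByPartition(positions: PartitionPosition[]): Record<string, number> {
  const lag: Record<string, number> = {};
  for (const position of positions) {
    const difference = position.lastEnqueuedSequenceNumber - position.checkpointSequenceNumber;
    lag[position.partitionId] = Math.max(0, difference);
  }
  return lag;
}

const examplePositions: PartitionPosition[] = [
  { partitionId: "0", lastEnqueuedSequenceNumber: 1500, checkpointSequenceNumber: 1500 },
  { partitionId: "1", lastEnqueuedSequenceNumber: 2300, checkpointSequenceNumber: 2100 },
];
console.log(lagByPartition(examplePositions));
```

Here partition 0 is caught up (lag 0) while partition 1 is 200 events behind.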

Why Consumer Lag Is Not Built-In on All Tiers

On Basic and Standard tiers, consumer lag is not a native Azure Monitor metric. It must be calculated by comparing two values:

  1. The last enqueued sequence number per partition (available via the SDK getPartitionProperties).
  2. The checkpoint sequence number stored in Blob Storage for that consumer group.

On Premium and Dedicated tiers, consumer lag is available natively via Application Metrics Logs, enabled through Diagnostic Settings.

How to Enable Azure Event Hubs Monitoring

Step 1: Enable Diagnostic Settings

  • Go to your Event Hubs namespace in the Azure portal.
  • Under Monitoring, click Diagnostic settings, then Add diagnostic setting.
  • Select log categories: OperationalLogs (all tiers), RuntimeAuditLogs, and ApplicationMetricsLogs (Premium/Dedicated only).
  • Choose a destination: Log Analytics workspace (recommended), Azure Storage, or Event Hubs.
  • Save. Logs begin appearing in the destination within about 15 minutes.

Step 2: Access Metrics in Azure Monitor

  • Navigate to your namespace and click Metrics under Monitoring.
  • Use Metrics Explorer to select a metric, apply a time range, and filter by dimensions such as EntityName or OperationResult.
  • Platform metrics are retained for 93 days. A single chart can display up to 30 days of data.

Step 3: Query Logs with KQL

Consumer lag from Application Metrics Logs (Premium/Dedicated):

    AzureDiagnostics
    | where ActivityName_s == "ConsumerLag"
    | project ConsumerGroup = ChildEntityName_s, EventHub = EntityName_s,
              PartitionId = PartitionId_s, Lag = Count_d, Timestamp = eventTimestamp_s
    | order by Timestamp desc

Throttled requests over the last hour:

    AzureMetrics
    | where ResourceProvider == "MICROSOFT.EVENTHUB"
    | where MetricName == "ThrottledRequests"
    | where TimeGenerated > ago(1h)
    | summarize TotalThrottled = sum(Total) by bin(TimeGenerated, 5m)
    | order by TimeGenerated desc

Incoming vs outgoing messages (indirect lag indicator):

    AzureMetrics
    | where ResourceProvider == "MICROSOFT.EVENTHUB"
    | where MetricName in ("IncomingMessages", "OutgoingMessages")
    | summarize Total = sum(Total) by MetricName, bin(TimeGenerated, 5m)
    | render timechart
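
To turn the incoming-versus-outgoing comparison into a number, the per-bucket difference can be accumulated into a running backlog estimate. The sketch below is a rough illustration under two assumptions: the event hub has a single consumer group (OutgoingMessages counts deliveries to every group), and the backlog was zero at the start of the window. `MessageBucket` and `cumulativeBacklog` are hypothetical names.

```typescript
// One 5-minute bucket of the two message metrics (hypothetical shape).
interface MessageBucket {
  incomingMessages: number;
  outgoingMessages: number;
}

// Running estimate of undelivered events, one value per bucket.
// Clamped at 0 because the estimate cannot know about backlog from before the window.
function cumulativeBacklog(buckets: MessageBucket[]): number[] {
  let backlog = 0;
  return buckets.map((bucket) => {
    backlog = Math.max(0, backlog + bucket.incomingMessages - bucket.outgoingMessages);
    return backlog;
  });
}

const exampleBuckets: MessageBucket[] = [
  { incomingMessages: 1000, outgoingMessages: 1000 },
  { incomingMessages: 1000, outgoingMessages: 600 },
  { incomingMessages: 1000, outgoingMessages: 700 },
  { incomingMessages: 500, outgoingMessages: 900 },
];
console.log(cumulativeBacklog(exampleBuckets));
```

This is an estimate rather than true consumer lag: it reflects delivery, not checkpointing, so a consumer that reads but fails to process will still look healthy.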

Monitoring Consumer Lag on Basic and Standard Tiers

For Basic and Standard tiers, deploy a lightweight monitoring sidecar (for example, an Azure Container App) that runs independently from your consumer application. If the consumer crashes, the sidecar must continue reporting so alerts still fire.

Calculating Consumer Lag (TypeScript)

    import { EventHubConsumerClient } from "@azure/event-hubs";
    import { BlobCheckpointStore } from "@azure/eventhubs-checkpointstore-blob";

    export async function measureConsumerLag(
      consumerGroup: string,
      eventHubClient: EventHubConsumerClient,
      checkpointStore: BlobCheckpointStore
    ): Promise<void> {
      const partitionIds = await eventHubClient.getPartitionIds();
      const checkpoints = await checkpointStore.listCheckpoints(
        eventHubClient.fullyQualifiedNamespace,
        eventHubClient.eventHubName,
        consumerGroup
      );
      // Map each partition ID to its last checkpointed sequence number.
      const seqByPartition = Object.fromEntries(
        checkpoints.map(({ partitionId, sequenceNumber }) => [partitionId, sequenceNumber])
      );
      await Promise.all(partitionIds.map(async partitionId => {
        // A partition with no checkpoint yet is treated as starting from sequence number 0.
        const lastKnown = seqByPartition[partitionId] ?? 0;
        const { lastEnqueuedSequenceNumber } =
          await eventHubClient.getPartitionProperties(partitionId);
        const consumerLag = lastEnqueuedSequenceNumber - lastKnown;
        // Emit consumerLag to Application Insights or Azure Monitor custom metrics
      }));
    }

Monitoring Consumer Lag on Premium and Dedicated Tiers

On Premium and Dedicated tiers, enable ApplicationMetricsLogs through Diagnostic Settings and route them to a Log Analytics workspace. The ConsumerLag activity appears in AzureDiagnostics and can be queried and alerted on directly.

Enable Application Metrics Logs via Bicep

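    // Assumes 'location' and 'logAnalyticsWorkspace' are declared elsewhere in the template.
    resource eventHubs 'Microsoft.EventHub/namespaces@2021-06-01-preview' = {
      name: 'myeventhubs'
      location: location
      sku: { name: 'Premium', tier: 'Premium', capacity: 1 }
    }

    // Diagnostic settings are an extension resource: declare them at the top level
    // and attach them to the namespace with 'scope'.
    resource logSettings 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
      name: 'diagnostics'
      scope: eventHubs
      properties: {
        logs: [{ category: 'ApplicationMetricsLogs', enabled: true }]
        workspaceId: logAnalyticsWorkspace.id
      }
    }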

Consumer Lag Alert Rule via Bicep

    resource lagAlert 'microsoft.insights/scheduledqueryrules@2022-08-01-preview' = {
      name: alertRuleName
      location: location
      properties: {
        severity: 3
        enabled: true
        evaluationFrequency: 'PT5M'
        scopes: [ logAnalyticsWorkspace.id ]
        windowSize: 'PT5M'
        criteria: {
          allOf: [
            {
              query: 'AzureDiagnostics | where ActivityName_s == \'ConsumerLag\' | project ConsumerGroup = ChildEntityName_s, EventHub = EntityName_s, PartitionId = PartitionId_s, Lag = Count_d'
              timeAggregation: 'Maximum'
              metricMeasureColumn: 'Lag'
              operator: 'GreaterThan'
              threshold: 100
            }
          ]
        }
      }
    }

Monitoring Azure Event Hubs Throughput

Throughput Unit Limits

Each Standard-tier Throughput Unit (TU) provides 1 MB/s ingress (or 1,000 events/s) and 2 MB/s egress. Exceeding ingress capacity throws an EventHubsException with a ServiceBusy reason and increments ThrottledRequests. Egress is silently capped without errors.
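
The ingress limit can be related to the IncomingBytes metric with a small calculation. The sketch below is illustrative only: it assumes 1 MB means 1,000,000 bytes and that IncomingBytes has been summed over a known window. `ingressUtilizationPercent` is a hypothetical helper name.

```typescript
// Assumption: 1 MB is treated as 1,000,000 bytes; adjust if you size in MiB.
const INGRESS_BYTES_PER_SECOND_PER_TU = 1000000;

// Average ingress over the window as a percentage of the provisioned TU capacity.
function ingressUtilizationPercent(
  incomingBytes: number,
  windowSeconds: number,
  throughputUnits: number
): number {
  if (windowSeconds <= 0 || throughputUnits <= 0) {
    throw new RangeError("windowSeconds and throughputUnits must be positive");
  }
  const capacityBytes = INGRESS_BYTES_PER_SECOND_PER_TU * windowSeconds * throughputUnits;
  return (incomingBytes * 100) / capacityBytes;
}

// 108,000,000 bytes in one minute on 2 TUs.
const exampleUtilization = ingressUtilizationPercent(108000000, 60, 2);
console.log(exampleUtilization);
```

Because this is an average over the window, short bursts can still be throttled while the percentage looks safe. Treat it as a trend indicator alongside ThrottledRequests, not a replacement for it.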

Check Capacity and Enable Auto-Inflate

    # Check TU configuration
    az eventhubs namespace show \
      --resource-group myResourceGroup \
      --name myEventHubNamespace \
      --query "{sku:sku.name, capacity:sku.capacity, autoInflate:isAutoInflateEnabled, maxTUs:maximumThroughputUnits}" \
      -o json

    # Enable auto-inflate (scales TUs automatically)
    az eventhubs namespace update \
      --resource-group myResourceGroup \
      --name myEventHubNamespace \
      --enable-auto-inflate true \
      --maximum-throughput-units 20
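
When choosing a value for the auto-inflate ceiling, the per-TU limits described above can be turned into a rough sizing estimate. The sketch below assumes you already know your peak load. `PeakLoad` and `requiredThroughputUnits` are hypothetical names, and the result is a starting point, not a capacity guarantee.

```typescript
// Peak load figures you expect the namespace to sustain (hypothetical shape).
interface PeakLoad {
  ingressMBps: number;
  ingressEventsPerSecond: number;
  egressMBps: number;
}

// Per-TU limits as stated in this article for the Standard tier.
const TU_INGRESS_MBPS = 1;
const TU_INGRESS_EVENTS_PER_SECOND = 1000;
const TU_EGRESS_MBPS = 2;

// The TU count must satisfy every limit, so take the largest requirement.
function requiredThroughputUnits(peak: PeakLoad): number {
  return Math.max(
    1,
    Math.ceil(peak.ingressMBps / TU_INGRESS_MBPS),
    Math.ceil(peak.ingressEventsPerSecond / TU_INGRESS_EVENTS_PER_SECOND),
    Math.ceil(peak.egressMBps / TU_EGRESS_MBPS)
  );
}

const examplePeak: PeakLoad = { ingressMBps: 3.5, ingressEventsPerSecond: 2500, egressMBps: 9 };
console.log(requiredThroughputUnits(examplePeak));
```

In this example egress is the binding constraint: 9 MB/s of reads needs 5 TUs even though ingress alone would need only 4.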

Setting Up Alerts for Azure Event Hubs

Recommended Alert Baseline

| Alert | Metric | Condition | Why It Matters |
|---|---|---|---|
| Throughput throttling | ThrottledRequests | > 0 for 5 min | Producers being rejected; data loss risk. |
| Service errors | ServerErrors | > 5 in 5 min | Internal Event Hubs service issue. |
| Consumer lag | ConsumerLag (Log) | > your SLA threshold | Processing is falling behind ingestion. |
| Producer drop | IncomingMessages | > 50% drop vs 1h avg | Producer failure or network partition. |
| Connection loss | ActiveConnections | Drop to 0 | Consumer group fully disconnected. |
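
The producer-drop condition is the least obvious one to express, so here is a minimal sketch of its logic. It assumes IncomingMessages has been bucketed (for example into 5-minute sums) and that the previous hour's buckets are available. `isProducerDrop` is a hypothetical helper, not an Azure Monitor feature.

```typescript
// Returns true when the current bucket is more than dropFraction below
// the average of the previous buckets (default: a drop of more than 50%).
function isProducerDrop(
  previousBuckets: number[],
  currentBucket: number,
  dropFraction: number = 0.5
): boolean {
  if (previousBuckets.length === 0) {
    return false; // no baseline to compare against
  }
  const total = previousBuckets.reduce((sum, value) => sum + value, 0);
  const average = total / previousBuckets.length;
  if (average === 0) {
    return false; // an idle hub cannot drop any further
  }
  return currentBucket < average * (1 - dropFraction);
}

const previousHourBuckets = [1000, 1100, 900, 1000];
console.log(isProducerDrop(previousHourBuckets, 400));
console.log(isProducerDrop(previousHourBuckets, 600));
```

Comparing against a rolling average rather than a fixed number lets the same rule work for both busy and quiet event hubs.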

Creating an Alert in the Azure Portal

  • Go to your Event Hubs namespace and click Alerts under Monitoring.
  • Click Create > Alert rule, then Add condition and select the metric.
  • Set the threshold, aggregation (Sum), and evaluation frequency (5 minutes).
  • Under Actions, add an action group to route to email, SMS, or webhook.
  • Save the alert rule.

Troubleshooting Common Azure Event Hubs Issues

Consumer Lag Keeps Growing

  • Slow processing: Profile event handling code. Switch from per-event synchronous writes to batch async writes to reduce latency.
  • Too few partitions: Maximum parallel consumers per consumer group equals partition count. Increase partitions on Premium/Dedicated tiers.
  • TU throttling: ThrottledRequests > 0 alongside growing lag means the namespace is hitting its throughput limit. Enable auto-inflate or upgrade the tier.
  • Checkpoint failures: If the consumer cannot write checkpoints to Blob Storage, it reprocesses events on restart. Check storage account firewall rules.
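
The move from per-event writes to batched writes mentioned under slow processing can be sketched as follows. This is an illustration under stated assumptions: `writeBatch` stands in for whatever bulk-write call your downstream store offers, and `chunkEvents` and `writeInBatches` are hypothetical helper names.

```typescript
// Splits a list of events into consecutive batches of at most batchSize items.
function chunkEvents<T>(events: T[], batchSize: number): T[][] {
  if (batchSize < 1) {
    throw new RangeError("batchSize must be at least 1");
  }
  const batches: T[][] = [];
  for (let start = 0; start < events.length; start += batchSize) {
    batches.push(events.slice(start, start + batchSize));
  }
  return batches;
}

// Issues one asynchronous write per batch instead of one per event.
// Returns the number of write calls made.
async function writeInBatches<T>(
  events: T[],
  batchSize: number,
  writeBatch: (batch: T[]) => Promise<void>
): Promise<number> {
  const batches = chunkEvents(events, batchSize);
  for (const batch of batches) {
    await writeBatch(batch); // batches are written in order to preserve event order
  }
  return batches.length;
}
```

With 500 events and a batch size of 100, the consumer makes 5 downstream calls instead of 500, which is usually where most of the per-event latency goes.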

Messages Not Reaching Consumers

  • Confirm the consumer is using the correct consumer group. Each group has an independent offset.
  • Verify the namespace name, event hub name, and connection string are correct.
  • Check that ports 5671 (AMQP) and 443 (HTTPS) are open from the consumer to the Event Hubs endpoint.
  • Look for UserErrors in Azure Monitor, which indicate authentication or authorization failures.

Checkpoint Failures

| Cause | Fix |
|---|---|
| Storage account firewall blocking access | Verify: az storage account show --name store --query "networkRuleSet.defaultAction". Add consumer IP/VNet to allowed list. |
| Checkpointing after every single event | Checkpoint after each batch, not each message. Reduces Blob Storage request rate significantly. |
| Partition ownership conflicts on rebalance | Expected during scaling. EventProcessorClient resolves automatically. Monitor logs for repeated conflicts. |
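
The per-batch checkpointing fix in the table can be sketched as a handler that processes every event in a batch and then writes a single checkpoint for the last one. The `CheckpointWriter` interface below is a local stand-in defined for this example, modeled loosely on the idea of a context object with an update-checkpoint method. Adapt it to whatever your SDK version actually provides.

```typescript
// Minimal event shape for this sketch.
interface SequencedEvent {
  sequenceNumber: number;
}

// Local stand-in for the object that persists checkpoints (assumption, not an SDK type).
interface CheckpointWriter<E> {
  updateCheckpoint(event: E): Promise<void>;
}

// Handles all events first, then checkpoints once at the batch boundary.
async function processBatchThenCheckpoint<E extends SequencedEvent>(
  events: E[],
  handle: (event: E) => Promise<void>,
  writer: CheckpointWriter<E>
): Promise<void> {
  const lastEvent = events[events.length - 1];
  if (lastEvent === undefined) {
    return; // empty batch: nothing to process or checkpoint
  }
  for (const event of events) {
    await handle(event);
  }
  await writer.updateCheckpoint(lastEvent);
}
```

If `handle` throws, no checkpoint is written, so the whole batch is redelivered after a restart. That is why event handling should be idempotent when checkpointing this way.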

Custom Metrics for Consumer Lag (Basic and Standard)

Deploy a containerized monitoring application that iterates over consumer groups and partitions, calculates lag, and emits it to Azure Monitor as a custom metric. Use Azure Managed Identity to authenticate against Event Hubs, Blob Storage, and the Azure Monitor Metrics REST API without storing credentials.
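
As an illustration of the emit step, the sketch below builds a request body for a custom metric from per-partition lag samples. The body shape follows my understanding of the Azure Monitor custom metrics format (`time`, then `data.baseData` with `metric`, `namespace`, `dimNames`, and `series`) and should be verified against the current documentation before use. The metric name, the `EventHubs/Custom` namespace, and the `LagSample` type are assumptions made for this example. Authentication and the HTTP call are deliberately left out.

```typescript
// One lag measurement for a partition (hypothetical shape).
interface LagSample {
  eventHub: string;
  consumerGroup: string;
  partitionId: string;
  lag: number;
}

// Builds the JSON body for one custom-metric submission.
// Each sample becomes one series entry with a single data point.
function buildLagMetricPayload(samples: LagSample[], time: Date) {
  return {
    time: time.toISOString(),
    data: {
      baseData: {
        metric: "ConsumerLag", // assumed metric name
        namespace: "EventHubs/Custom", // assumed custom namespace
        dimNames: ["EventHub", "ConsumerGroup", "PartitionId"],
        series: samples.map((sample) => ({
          dimValues: [sample.eventHub, sample.consumerGroup, sample.partitionId],
          min: sample.lag,
          max: sample.lag,
          sum: sample.lag,
          count: 1,
        })),
      },
    },
  };
}

const examplePayload = buildLagMetricPayload(
  [{ eventHub: "orders", consumerGroup: "billing", partitionId: "0", lag: 120 }],
  new Date()
);
console.log(JSON.stringify(examplePayload, null, 2));
```

Keeping event hub, consumer group, and partition as dimensions lets one alert rule cover every partition while still showing which one is behind.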

For a full reference implementation with Bicep deployment scripts, see the  repository.

 

Monitor Azure Event Hubs Smarter with CubeAPM

Tracking consumer lag, throttled requests, and partition health across multiple namespaces is hard. CubeAPM provides unified observability for distributed systems with native Azure Event Hubs support.

  • Visualize consumer lag per partition and consumer group in real time.
  • Alert on ThrottledRequests and IncomingBytes breaches before users are affected.
  • Correlate Event Hub metrics with your application traces end-to-end.
  • Monitor hybrid and multi-cloud event pipelines from a single dashboard.

Conclusion

Effective Azure Event Hubs monitoring starts with two metrics: throughput and consumer lag. Throughput metrics are available out of the box for all tiers. Consumer lag requires extra setup: Application Metrics Logs on Premium and Dedicated tiers, and a custom sidecar on Basic and Standard tiers.

Set up diagnostic settings, create alerts for ThrottledRequests and ServerErrors, and instrument consumer lag tracking from day one. Catching these signals early prevents consumer backlogs and data loss from becoming user-visible incidents.

Disclaimer: Azure Event Hubs features, metrics, and pricing tiers are subject to change. The information in this article is based on documentation available as of May 2026 and may not reflect the latest updates from Microsoft. Always refer to the official Microsoft Azure documentation for the most current guidance.

FAQs

Q1. Does Azure Event Hubs provide consumer lag as a built-in metric?

Not for all tiers. Consumer lag is available natively only on Premium and Dedicated tiers through Application Metrics Logs, enabled via Diagnostic Settings. On Basic and Standard tiers, you must calculate lag manually by comparing the last-enqueued sequence number with the last checkpoint sequence number, then publish the result as a custom metric to Azure Monitor.

Q2. What is the difference between IncomingMessages and OutgoingMessages in Azure Event Hubs?

IncomingMessages counts events published by producers to the event hub. OutgoingMessages counts events delivered to consumers. If IncomingMessages consistently exceeds OutgoingMessages over time, consumers are falling behind and lag is building. This comparison is a useful indirect lag indicator on tiers where ApplicationMetricsLogs are unavailable.

Q3. What causes ThrottledRequests in Azure Event Hubs?

ThrottledRequests occur when your namespace exceeds the throughput capacity of its provisioned Throughput Units (Standard tier) or Processing Units (Premium tier). Each Standard TU supports 1 MB/s ingress and 2 MB/s egress. When ingress exceeds this limit, Event Hubs rejects incoming requests with an EventHubsException (ServiceBusy). Enable auto-inflate on the Standard tier to scale TUs automatically, or upgrade to Premium for higher sustained capacity.

Q4. How often should I checkpoint in Azure Event Hubs?

Checkpoint after processing each batch of events, not after every individual message. Checkpointing too frequently sends excessive write requests to Azure Blob Storage, which can trigger storage throttling and slow down your consumer. The trade-off is that a restarted consumer resumes from the last batch boundary and may reprocess up to one batch of events, so keep event handling idempotent.

Q5. How many consumer groups should I create per event hub?

Create one dedicated consumer group per consuming application. Never share the $Default consumer group between multiple applications, as they will compete for partition ownership and one application will be starved of events. Each consumer group maintains its own independent offset in the partition, so separate consumer groups let multiple applications read the same stream independently without interfering with each other.
