Azure Event Hubs is a high-throughput, fully managed event streaming platform capable of ingesting millions of events per second. It serves as the backbone for real-time analytics pipelines, IoT telemetry, fraud detection, application logs, and clickstream analysis. When Event Hubs works well, it is invisible. When something goes wrong, the impact cascades quickly into downstream systems.
This guide explains how to monitor Azure Event Hubs for the two metrics that matter most in production: throughput and consumer lag. You will learn how to set up alerts, write KQL queries, and troubleshoot common failure patterns.
Key Takeaways
- Consumer lag measures how far a consumer group is behind the latest event in each partition.
- Azure Monitor provides built-in throughput metrics (IncomingBytes, OutgoingBytes, ThrottledRequests) for all tiers.
- Consumer lag as a native metric is available only on Premium and Dedicated tiers via Application Metrics Logs.
- On Basic and Standard tiers, calculate consumer lag by comparing checkpoint sequence numbers with last-enqueued sequence numbers.
- Each Standard-tier throughput unit provides 1 MB/s ingress and 2 MB/s egress. Exceeding the ingress limit causes requests to be rejected, which shows up in the ThrottledRequests metric.
- Alerts on consumer lag, ThrottledRequests, and ServerErrors are the minimum viable monitoring baseline.
- Partition count determines maximum consumer parallelism and cannot be decreased after creation.
What Is Azure Event Hubs Monitoring?
Azure Event Hubs monitoring means continuously observing the health and performance of your namespaces, event hubs, and consumer groups. It covers three layers:
- Infrastructure: Throughput units, bytes in/out, throttled requests, server errors.
- Consumer: Lag per partition, checkpoint freshness, active connections.
- Application: End-to-end event latency, processing errors, downstream service health.
Azure Monitor is the primary platform for collecting Event Hubs metrics. Platform metrics are collected automatically with no configuration required. Resource logs, runtime audit logs, and application metrics logs require diagnostic settings to be enabled and routed to a destination.
Key Azure Event Hubs Metrics to Monitor
Throughput and Ingestion Metrics
| Metric | REST API Name | Purpose | Alert On |
| --- | --- | --- | --- |
| Incoming Bytes | IncomingBytes | Data volume ingested vs TU limit | > 90% of TU ingress |
| Outgoing Bytes | OutgoingBytes | Data volume consumed by readers | > 90% of TU egress |
| Incoming Messages | IncomingMessages | Event count from producers | Sudden drop to 0 |
| Outgoing Messages | OutgoingMessages | Event count delivered to consumers | Persistently < Incoming |
| Throttled Requests | ThrottledRequests | Requests rejected due to TU breach | > 0 for 5+ minutes |
| Incoming Requests | IncomingRequests | Total send attempts from producers | Spikes paired with errors |
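The "Persistently < Incoming" condition in the table can be approximated by keeping a running difference between the two message counts. The sketch below is a rough illustration, not a definitive method. The function name `estimateBacklog` and the per-interval input arrays are assumptions. It is only meaningful when a single consumer group reads the hub, because OutgoingMessages counts deliveries across all readers.

```typescript
// Estimates the unread backlog per interval from IncomingMessages and
// OutgoingMessages totals (one array element per time bucket, e.g. 5 minutes).
function estimateBacklog(incoming: number[], outgoing: number[]): number[] {
  const n = Math.min(incoming.length, outgoing.length);
  const backlog: number[] = [];
  let running = 0;
  for (let i = 0; i < n; i++) {
    // The backlog cannot go below zero: consumers cannot read events that do not exist.
    running = Math.max(0, running + incoming[i] - outgoing[i]);
    backlog.push(running);
  }
  return backlog;
}
```

A backlog series that keeps rising across many buckets is the pattern worth alerting on. A single non-zero bucket is usually just normal read latency.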
Health and Error Metrics
| Metric | REST API Name | What It Signals |
| --- | --- | --- |
| Server Errors | ServerErrors | Internal Event Hubs errors. Sustained values indicate a service incident. |
| User Errors | UserErrors | Client errors: bad auth, malformed requests. |
| Successful Requests | SuccessfulRequests | Confirmed successful operations. Use with errors to track success rate. |
| Active Connections | ActiveConnections | Current connections. A drop to 0 signals a consumer group disconnect. |
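As a small illustration of using SuccessfulRequests together with the error metrics, the sketch below computes a success rate for one evaluation window. The interface and function names are hypothetical. Treating the sum of the three counters as the denominator is an assumption; you may prefer IncomingRequests.

```typescript
// Request counters for one evaluation window, as read from Azure Monitor.
interface RequestCounts {
  successfulRequests: number;
  serverErrors: number;
  userErrors: number;
}

// Returns the success rate as a fraction in [0, 1], or null when there was no traffic.
function successRate(counts: RequestCounts): number | null {
  // Assumption: every request ends up in exactly one of the three counters.
  const total = counts.successfulRequests + counts.serverErrors + counts.userErrors;
  if (total === 0) return null;
  return counts.successfulRequests / total;
}
```

Returning null for an idle window keeps "no traffic" distinct from "0% success", which matters if the value feeds an alert.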
Cluster-Level Metrics (Dedicated Tier)
| Metric | What It Signals | Alert Threshold |
| --- | --- | --- |
| CPU | Cluster CPU utilization | > 80% |
| Available Memory | Memory available as % of total | < 20% |
| Cluster Utilization | Aggregated utilization across all resources | > 75% sustained |
Understanding Consumer Lag in Azure Event Hubs
What Is Consumer Lag?
Consumer lag is the difference between the sequence number of the most recently enqueued event on a partition and the sequence number of the last checkpointed event for a given consumer group. A lag of 0 means your consumer is keeping up in real time. A growing lag is an early warning signal of slow processing, checkpoint failures, partition rebalancing, or throughput throttling.
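To make the definition concrete, here is a minimal sketch of the arithmetic across several partitions. The type and function names are illustrative, not part of any SDK. Treating a missing checkpoint as sequence number 0 is an assumption chosen for simplicity.

```typescript
// Position data for one partition. Field names are illustrative.
interface PartitionPosition {
  partitionId: string;
  lastEnqueuedSequenceNumber: number; // newest event in the partition
  checkpointSequenceNumber?: number;  // undefined when no checkpoint exists yet
}

interface LagSummary {
  perPartition: Record<string, number>;
  totalLag: number;
  maxLag: number;
}

function summarizeLag(positions: PartitionPosition[]): LagSummary {
  const perPartition: Record<string, number> = {};
  let totalLag = 0;
  let maxLag = 0;
  for (const p of positions) {
    // Assumption: a partition without a checkpoint is measured from sequence number 0.
    const checkpoint = p.checkpointSequenceNumber ?? 0;
    // Clamp at 0 so an empty partition never reports negative lag.
    const lag = Math.max(0, p.lastEnqueuedSequenceNumber - checkpoint);
    perPartition[p.partitionId] = lag;
    totalLag += lag;
    maxLag = Math.max(maxLag, lag);
  }
  return { perPartition, totalLag, maxLag };
}
```

For example, a partition whose newest event has sequence number 1500 and whose checkpoint is at 1420 has a lag of 80 events. Tracking the maximum as well as the total helps spot a single stuck partition that a healthy total would hide.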
Why Consumer Lag Is Not Built-In on All Tiers
On Basic and Standard tiers, consumer lag is not a native Azure Monitor metric. It must be calculated by comparing two values:
- The last enqueued sequence number per partition, available from the SDK's getPartitionProperties call.
- The checkpoint sequence number stored in Blob Storage for that consumer group.
On Premium and Dedicated tiers, consumer lag is available natively via Application Metrics Logs, enabled through Diagnostic Settings.
How to Enable Azure Event Hubs Monitoring
Step 1: Enable Diagnostic Settings
- Go to your Event Hubs namespace in the Azure portal.
- Under Monitoring, click Diagnostic settings, then Add diagnostic setting.
- Select log categories: OperationalLogs (all tiers), RuntimeAuditLogs, and ApplicationMetricsLogs (Premium/Dedicated only).
- Choose a destination: Log Analytics workspace (recommended), Azure Storage, or Event Hubs.
- Save. Log data typically begins appearing in the destination within about 15 minutes.
Step 2: Access Metrics in Azure Monitor
- Navigate to your namespace and click Metrics under Monitoring.
- Use Metrics Explorer to select a metric, apply a time range, and filter by dimensions such as EntityName or OperationResult.
- Metrics data is retained for 90 days. The chart view supports up to 30 days per render.
Step 3: Query Logs with KQL
Consumer lag from Application Metrics Logs (Premium/Dedicated):
```kusto
AzureDiagnostics
| where ActivityName_s == "ConsumerLag"
| project ConsumerGroup = ChildEntityName_s, EventHub = EntityName_s, PartitionId = PartitionId_s, Lag = Count_d, Timestamp = eventTimestamp_s
| order by Timestamp desc
```
Throttled requests over the last hour:
```kusto
AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where MetricName == "ThrottledRequests"
| where TimeGenerated > ago(1h)
| summarize TotalThrottled = sum(Total) by bin(TimeGenerated, 5m)
| order by TimeGenerated desc
```
Incoming vs outgoing messages (indirect lag indicator):
```kusto
AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where MetricName in ("IncomingMessages", "OutgoingMessages")
| summarize Total = sum(Total) by MetricName, bin(TimeGenerated, 5m)
| render timechart
```
Monitoring Consumer Lag on Basic and Standard Tiers
For Basic and Standard tiers, deploy a lightweight monitoring sidecar (for example, an Azure Container App) that runs independently from your consumer application. If the consumer crashes, the sidecar must continue reporting so alerts still fire.
Calculating Consumer Lag (TypeScript)
```typescript
import { EventHubConsumerClient } from "@azure/event-hubs";
import { BlobCheckpointStore } from "@azure/eventhubs-checkpointstore-blob";

export async function measureConsumerLag(
  consumerGroup: string,
  eventHubClient: EventHubConsumerClient,
  checkpointStore: BlobCheckpointStore
): Promise<void> {
  const partitionIds = await eventHubClient.getPartitionIds();
  // Checkpoints written by the consumer group being monitored.
  const checkpoints = await checkpointStore.listCheckpoints(
    eventHubClient.fullyQualifiedNamespace,
    eventHubClient.eventHubName,
    consumerGroup
  );
  const seqByPartition = Object.fromEntries(
    checkpoints.map(({ partitionId, sequenceNumber }) => [partitionId, sequenceNumber])
  );
  await Promise.all(partitionIds.map(async partitionId => {
    // A partition with no checkpoint yet is measured from sequence number 0.
    const lastKnown = seqByPartition[partitionId] ?? 0;
    const { lastEnqueuedSequenceNumber } = await eventHubClient.getPartitionProperties(partitionId);
    // Clamp at 0 so an empty partition does not report negative lag.
    const consumerLag = Math.max(0, lastEnqueuedSequenceNumber - lastKnown);
    // Emit consumerLag to Application Insights or Azure Monitor custom metrics
  }));
}
```
Monitoring Consumer Lag on Premium and Dedicated Tiers
On Premium and Dedicated tiers, enable ApplicationMetricsLogs through Diagnostic Settings and route them to a Log Analytics workspace. The ConsumerLag activity appears in AzureDiagnostics and can be queried and alerted on directly.
Enable Application Metrics Logs via Bicep
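```bicep
// Assumes `location` and `logAnalyticsWorkspace` are declared elsewhere in the template.
resource eventHubs 'Microsoft.EventHub/namespaces@2021-06-01-preview' = {
  name: 'myeventhubs'
  location: location
  sku: {
    name: 'Premium'
    tier: 'Premium'
    capacity: 1
  }
}

// Diagnostic settings are an extension resource, so they are declared at the
// top level and attached to the namespace through `scope`.
resource logSettings 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diagnostics'
  scope: eventHubs
  properties: {
    logs: [
      {
        category: 'ApplicationMetricsLogs'
        enabled: true
      }
    ]
    workspaceId: logAnalyticsWorkspace.id
  }
}
```
Consumer Lag Alert Rule via Bicep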
```bicep
resource lagAlert 'microsoft.insights/scheduledqueryrules@2022-08-01-preview' = {
  name: alertRuleName
  location: location
  properties: {
    severity: 3
    enabled: true
    evaluationFrequency: 'PT5M'
    scopes: [
      logAnalyticsWorkspace.id
    ]
    windowSize: 'PT5M'
    criteria: {
      allOf: [
        {
          query: 'AzureDiagnostics | where ActivityName_s == \'ConsumerLag\' | project ConsumerGroup = ChildEntityName_s, EventHub = EntityName_s, PartitionId = PartitionId_s, Lag = Count_d'
          timeAggregation: 'Maximum'
          metricMeasureColumn: 'Lag'
          operator: 'GreaterThan'
          threshold: 100
        }
      ]
    }
  }
}
```
Monitoring Azure Event Hubs Throughput
Throughput Unit Limits
Each Standard-tier Throughput Unit (TU) provides up to 1 MB/s or 1,000 events/s of ingress, whichever limit is reached first, and up to 2 MB/s or 4,096 events/s of egress. Exceeding ingress capacity throws an EventHubsException with a ServiceBusy reason and increments ThrottledRequests. Egress does not raise throttling errors; reads are simply capped at the provisioned capacity.
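The per-TU limits translate directly into a sizing calculation. The following is a minimal sketch under stated assumptions: the names `Load`, `requiredThroughputUnits`, and `ingressUtilization` are hypothetical, and 1 MB is taken to be 1024 * 1024 bytes.

```typescript
// Per-TU limits for the Standard tier. Assumption: 1 MB = 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;
const INGRESS_BYTES_PER_TU = 1 * BYTES_PER_MB;
const INGRESS_EVENTS_PER_TU = 1000;
const EGRESS_BYTES_PER_TU = 2 * BYTES_PER_MB;

// Observed or expected load on the namespace, per second.
interface Load {
  ingressBytesPerSec: number;
  ingressEventsPerSec: number;
  egressBytesPerSec: number;
}

// Smallest whole number of TUs that covers every dimension of the load.
function requiredThroughputUnits(load: Load): number {
  const needed = Math.max(
    load.ingressBytesPerSec / INGRESS_BYTES_PER_TU,
    load.ingressEventsPerSec / INGRESS_EVENTS_PER_TU,
    load.egressBytesPerSec / EGRESS_BYTES_PER_TU
  );
  return Math.max(1, Math.ceil(needed));
}

// Ingress utilization as a fraction of provisioned capacity (1.0 = at the limit).
function ingressUtilization(load: Load, provisionedTUs: number): number {
  return Math.max(
    load.ingressBytesPerSec / (provisionedTUs * INGRESS_BYTES_PER_TU),
    load.ingressEventsPerSec / (provisionedTUs * INGRESS_EVENTS_PER_TU)
  );
}
```

Taking the maximum across bytes and event count reflects that whichever limit is hit first triggers throttling. A utilization above 0.9 corresponds to the 90% alert threshold suggested earlier.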
Check Capacity and Enable Auto-Inflate
```bash
# Check TU configuration
az eventhubs namespace show \
  --resource-group myResourceGroup \
  --name myEventHubNamespace \
  --query "{sku:sku.name, capacity:sku.capacity, autoInflate:isAutoInflateEnabled, maxTUs:maximumThroughputUnits}" \
  -o json

# Enable auto-inflate (scales TUs automatically)
az eventhubs namespace update \
  --resource-group myResourceGroup \
  --name myEventHubNamespace \
  --enable-auto-inflate true \
  --maximum-throughput-units 20
```
Setting Up Alerts for Azure Event Hubs
Recommended Alert Baseline
| Alert | Metric | Condition | Why It Matters |
| --- | --- | --- | --- |
| Throughput throttling | ThrottledRequests | > 0 for 5 min | Producers being rejected; data loss risk. |
| Service errors | ServerErrors | > 5 in 5 min | Internal Event Hubs service issue. |
| Consumer lag | ConsumerLag (Log) | > your SLA threshold | Processing is falling behind ingestion. |
| Producer drop | IncomingMessages | > 50% drop vs 1h avg | Producer failure or network partition. |
| Connection loss | ActiveConnections | Drop to 0 | Consumer group fully disconnected. |
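The "producer drop" condition compares the latest value with a trailing average, which a plain static threshold cannot express. One way to evaluate it in a custom monitor is sketched below. The function name and the choice of a simple arithmetic mean are assumptions.

```typescript
// Returns true when `latest` has fallen more than `dropFraction` below the
// average of the baseline samples (for example, 5-minute totals from the last hour).
function isProducerDrop(baseline: number[], latest: number, dropFraction = 0.5): boolean {
  if (baseline.length === 0) return false; // nothing to compare against
  const average = baseline.reduce((sum, value) => sum + value, 0) / baseline.length;
  if (average === 0) return false; // an idle hub cannot "drop"
  return latest < average * (1 - dropFraction);
}
```

With a baseline averaging 1,000 messages per interval, a latest value of 400 trips the check while 600 does not.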
Creating an Alert in the Azure Portal
- Go to your Event Hubs namespace and click Alerts under Monitoring.
- Click Create > Alert rule, then Add condition and select the metric.
- Set the threshold, aggregation (Sum), and evaluation frequency (5 minutes).
- Under Actions, add an action group to route to email, SMS, or webhook.
- Save the alert rule.
Troubleshooting Common Azure Event Hubs Issues
Consumer Lag Keeps Growing
- Slow processing: Profile event handling code. Switch from per-event synchronous writes to batch async writes to reduce latency.
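- Too few partitions: Maximum parallel consumers per consumer group equals partition count. Partition count can be increased after creation only on Premium and Dedicated tiers; on Basic and Standard it is fixed when the event hub is created.
- TU throttling: ThrottledRequests > 0 alongside growing lag suggests the namespace is at its throughput limit, so producers are being rejected and consumer reads may be capped as well. Enable auto-inflate or upgrade the tier.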
- Checkpoint failures: If the consumer cannot write checkpoints to Blob Storage, it reprocesses events on restart. Check storage account firewall rules.
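Before working through these causes, it helps to confirm that lag is actually trending upward and is not just noisy. The sketch below estimates the growth rate from a series of lag samples. The names are hypothetical, and the two-point slope is a deliberate simplification of a proper regression.

```typescript
// One lag measurement for a partition or consumer group.
interface LagSample {
  timestampMs: number; // epoch milliseconds
  lag: number;         // events behind at that time
}

// Average lag growth in events per minute between the first and last sample.
// Returns null when there is not enough data to compute a rate.
function lagGrowthPerMinute(samples: LagSample[]): number | null {
  if (samples.length < 2) return null;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const minutes = (last.timestampMs - first.timestampMs) / 60000;
  if (minutes <= 0) return null;
  return (last.lag - first.lag) / minutes;
}
```

A positive rate sustained over several evaluation windows indicates consumers are falling behind. A negative rate means the backlog is draining.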
Messages Not Reaching Consumers
- Confirm the consumer is using the correct consumer group. Each group has an independent offset.
- Verify the namespace name, event hub name, and connection string are correct.
- Check that ports 5671 (AMQP) and 443 (HTTPS) are open from the consumer to the Event Hubs endpoint.
- Look for UserErrors in Azure Monitor, which indicate auth or authorization failures.
Checkpoint Failures
| Cause | Fix |
| --- | --- |
| Storage account firewall blocking access | Verify: az storage account show --name store --query "networkRuleSet.defaultAction". Add consumer IP/VNet to allowed list. |
| Checkpointing after every single event | Checkpoint after each batch, not each message. Reduces Blob Storage request rate significantly. |
| Partition ownership conflicts on rebalance | Expected during scaling. EventProcessorClient resolves automatically. Monitor logs for repeated conflicts. |
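The "checkpoint after each batch" fix in the table can be sketched as follows. The `ReceivedEvent` and `CheckpointContext` interfaces are simplified, hypothetical stand-ins that loosely mirror the shape of an event handler in the Event Hubs SDK. They are not the SDK's actual types.

```typescript
// Simplified stand-ins for the SDK's event and partition-context shapes (hypothetical).
interface ReceivedEvent {
  sequenceNumber: number;
  body: unknown;
}

interface CheckpointContext {
  updateCheckpoint(event: ReceivedEvent): Promise<void>;
}

// Processes every event in the batch, then writes a single checkpoint.
async function processBatch(
  events: ReceivedEvent[],
  context: CheckpointContext,
  handle: (event: ReceivedEvent) => Promise<void>
): Promise<void> {
  if (events.length === 0) return; // nothing processed, nothing to checkpoint
  for (const event of events) {
    await handle(event);
  }
  // One Blob Storage write per batch, positioned at the last processed event.
  await context.updateCheckpoint(events[events.length - 1]);
}
```

Because the checkpoint is written only after the whole batch succeeds, a failure partway through causes the batch to be reprocessed from the previous checkpoint, so handlers should be idempotent.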
Custom Metrics for Consumer Lag (Basic and Standard)
Deploy a containerized monitoring application that iterates over consumer groups and partitions, calculates lag, and emits it to Azure Monitor as a custom metric. Use Azure Managed Identity to authenticate against Event Hubs, Blob Storage, and the Azure Monitor Metrics REST API without storing credentials.
For a full reference implementation with Bicep deployment scripts, see the repository.
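As a rough sketch of the "emit as a custom metric" step, the function below builds a request body in the shape used by the Azure Monitor custom metrics REST API (time, metric, namespace, dimension names, and a series of aggregated values). The metric name, namespace, and dimension names are hypothetical choices. Sending the payload over HTTPS with a Managed Identity token is deliberately left out.

```typescript
// Body of one custom-metric submission to Azure Monitor.
interface CustomMetricPayload {
  time: string;
  data: {
    baseData: {
      metric: string;
      namespace: string;
      dimNames: string[];
      series: Array<{ dimValues: string[]; min: number; max: number; sum: number; count: number }>;
    };
  };
}

interface PartitionLag {
  partitionId: string;
  lag: number;
}

function buildLagMetricPayload(
  eventHub: string,
  consumerGroup: string,
  lags: PartitionLag[],
  time: Date
): CustomMetricPayload {
  return {
    time: time.toISOString(),
    data: {
      baseData: {
        metric: "ConsumerLag",            // hypothetical metric name
        namespace: "EventHubsMonitoring", // hypothetical custom namespace
        dimNames: ["EventHub", "ConsumerGroup", "PartitionId"],
        // One series entry per partition; a single sample means min = max = sum.
        series: lags.map(({ partitionId, lag }) => ({
          dimValues: [eventHub, consumerGroup, partitionId],
          min: lag,
          max: lag,
          sum: lag,
          count: 1,
        })),
      },
    },
  };
}
```

Keeping the partition as a dimension lets one metric be split or filtered per partition in Metrics Explorer without defining a separate metric for each.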
Monitor Azure Event Hubs Smarter with CubeAPM
Tracking consumer lag, throttled requests, and partition health across multiple namespaces is hard. CubeAPM provides unified observability for distributed systems with native Azure Event Hubs support.
- Visualize consumer lag per partition and consumer group in real time.
- Alert on ThrottledRequests and IncomingBytes breaches before users are affected.
- Correlate Event Hub metrics with your application traces end-to-end.
- Monitor hybrid and multi-cloud event pipelines from a single dashboard.
Conclusion
Effective Azure Event Hubs monitoring starts with two metrics: throughput and consumer lag. Throughput metrics are available out of the box for all tiers. Consumer lag requires extra setup: Application Metrics Logs on Premium and Dedicated tiers, and a custom sidecar on Basic and Standard tiers.
Set up diagnostic settings, create alerts for ThrottledRequests and ServerErrors, and instrument consumer lag tracking from day one. Catching these signals early prevents consumer backlogs and data loss from becoming user-visible incidents.
FAQs
Q1. Does Azure Event Hubs provide consumer lag as a built-in metric?
Not for all tiers. Consumer lag is available natively only on Premium and Dedicated tiers through Application Metrics Logs, enabled via Diagnostic Settings. On Basic and Standard tiers, you must calculate lag manually by comparing the last-enqueued sequence number with the last checkpoint sequence number, then publish the result as a custom metric to Azure Monitor.
Q2. What is the difference between IncomingMessages and OutgoingMessages in Azure Event Hubs?
IncomingMessages counts events published by producers to the event hub. OutgoingMessages counts events delivered to consumers. If IncomingMessages consistently exceeds OutgoingMessages over time, consumers are falling behind and lag is building. This comparison is a useful indirect lag indicator on tiers where ApplicationMetricsLogs are unavailable.
Q3. What causes ThrottledRequests in Azure Event Hubs?
ThrottledRequests occur when your namespace exceeds the throughput capacity of its provisioned Throughput Units (Standard tier) or Processing Units (Premium tier). Each Standard TU supports 1 MB/s ingress and 2 MB/s egress. When ingress exceeds this limit, Event Hubs rejects incoming requests with an EventHubsException (ServiceBusy). Enable auto-inflate on the Standard tier to scale TUs automatically, or upgrade to Premium for higher sustained capacity.
Q4. How often should I checkpoint in Azure Event Hubs?
Checkpoint after processing each batch of events, not after every individual message. Checkpointing too frequently sends excessive write requests to Azure Blob Storage, which can trigger storage throttling and slow down your consumer. Checkpointing per batch still keeps reprocessing bounded: if a consumer restarts, it resumes from the last batch boundary, so at most one batch of events is processed again.
Q5. How many consumer groups should I create per event hub?
Create one dedicated consumer group per consuming application. Never share the $Default consumer group between multiple applications, as they will compete for partition ownership and each application may receive only part of the stream. Each consumer group maintains its own independent offset in each partition, so separate consumer groups let multiple applications read the same stream without interfering with each other.





