CubeAPM
CubeAPM CubeAPM

How Do I Monitor AWS Kinesis Stream Lag and Throughput?

How Do I Monitor AWS Kinesis Stream Lag and Throughput?

Table of Contents

Kinesis Data Streams publishes two levels of metrics to CloudWatch: stream-level metrics (always available, free, published every minute) and shard-level metrics (opt-in, billed per metric per shard, useful for hot shard diagnosis). Monitoring stream lag and throughput requires tracking both what goes into the stream and whether your consumers are keeping up with it.

The single most important metric for consumer lag is GetRecords.IteratorAgeMilliseconds. The single most important metric for producer throughput is WriteProvisionedThroughputExceeded. Everything else provides supporting context. Let’s understand how to monitor AWS Kinesis Stream lags and throughput.

Key Takeaways

  • Always use the Maximum statistic for GetRecords.IteratorAgeMilliseconds – using Average can hide a single lagging shard that puts your stream at risk of data loss
  • If GetRecords.IteratorAgeMilliseconds reaches 100% of your stream’s retention period, records begin expiring before they are consumed – data is permanently lost
  • AWS recommends alerting when the iterator age exceeds 50% of your stream retention period
  • WriteProvisionedThroughputExceeded should be zero in normal operation – any non-zero value means records are being throttled and your producers must handle retries
  • Shard-level metrics are not free – each shard-level metric creates approximately 43,200 CloudWatch PutMetricData calls per month per shard. Enable them selectively for troubleshooting, not universally
  • For standard GetRecords consumers, each shard is limited to 5 read transactions per second and 2 MB/second of read throughput, shared across all consumers on that shard. Use enhanced fan-out (SubscribeToShard) when you need more than one consuming application

Stream Lag: GetRecords.IteratorAgeMilliseconds

What it is: The age in milliseconds of the last record returned by all GetRecords calls on the stream. It measures the gap between the timestamp of the most recently read record and the current time. A value of zero means consumers are fully caught up. A growing value means consumers are falling behind.

Why it is the primary lag metric: Iterator age directly answers the question: how old is the newest record my consumer has seen? If this value is growing, your consumers cannot keep pace with your producers.

What good looks like: Near zero in steady state. Some lag is expected during traffic spikes, as long as it recovers.

What bad looks like: A value that keeps climbing without recovering. At 50% of your retention period, start scaling consumers immediately. At 100%, records are expiring before they are consumed – data loss is occurring.

Retention period reference:

Retention setting50% alert threshold100% data loss threshold
24 hours (default)12 hours (43,200,000 ms)24 hours (86,400,000 ms)
7 days3.5 days (302,400,000 ms)7 days
365 days182.5 days365 days

Alert threshold to set: AWS recommends alerting when the iterator age exceeds 50% of your retention period. For a stream with default 24-hour retention, that is 43,200,000 milliseconds.

# Warning: iterator age exceeds 5 minutes (consumer falling behind)

aws cloudwatch put-metric-alarm \

  --alarm-name "kinesis-iterator-age-warning" \

  --metric-name GetRecords.IteratorAgeMilliseconds \

  --namespace AWS/Kinesis \

  --dimensions Name=StreamName,Value=your-stream-name \

  --statistic Maximum \

  --period 60 \

  --evaluation-periods 3 \

  --threshold 300000 \

  --comparison-operator GreaterThanThreshold \

  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic

# Critical: iterator age exceeds 1 hour (approaching retention risk on default 24h retention)

aws cloudwatch put-metric-alarm \

  --alarm-name "kinesis-iterator-age-critical" \

  --metric-name GetRecords.IteratorAgeMilliseconds \

  --namespace AWS/Kinesis \

  --dimensions Name=StreamName,Value=your-stream-name \

  --statistic Maximum \

  --period 60 \

  --evaluation-periods 2 \

  --threshold 3600000 \

  --comparison-operator GreaterThanThreshold \

  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic

Always use Maximum, not Average. The Maximum statistic catches the single worst-lagging consumer or shard. The Average can look healthy even when one shard is hours behind. AWS re:Post explicitly recommends Maximum for this metric.

When the iterator age increases – common causes:

  • Consumer processing logic is too slow for the incoming record rate
  • Insufficient GetRecords call frequency (consumer not polling fast enough)
  • Hot shards: partition key is not distributing records evenly, so some shards receive far more records than others
  • Lambda consumer timeout or memory issues blocking processing
  • Connection timeout issues causing intermittent GetRecords failures

Write Throughput: IncomingBytes and IncomingRecords

What they are: IncomingBytes and IncomingRecords measure the total data volume and record count being written to the stream per period. These are your producer throughput metrics.

What good looks like: Consistent with your expected ingestion pattern. Use Sum over 1-minute periods for accurate throughput measurement.

What bad looks like: A sudden drop to zero or near zero, which means producers have stopped or failed. A spike far above baseline may indicate a retry storm from upstream failures.

Shard capacity reference:

  • Each shard supports 1 MB/second of write throughput
  • Each shard supports 1,000 records/second of write throughput
  • A stream with 4 shards supports 4 MB/second and 4,000 records/second

To calculate current utilization per shard, divide IncomingBytes Sum (per minute) by 60 seconds and by your shard count.

Alert threshold to set:

  • Alert when IncomingBytes drops to 0 during expected ingestion windows – producers have stopped
  • Enable shard-level IncomingBytes to detect hot shards when aggregate throughput looks healthy but latency is high

Write Throttling: WriteProvisionedThroughputExceeded

What it is: The number of records rejected because the write throughput limit for a shard was exceeded. Any non-zero value means your producers are being throttled.

What good looks like: Zero. Always.

What bad looks like: Any sustained non-zero value. When a PutRecord or PutRecords call is throttled, the producer must retry. If your producers do not implement exponential backoff with jitter on throttle responses, you will see retry storms that compound the throttling.

Alert threshold to set:

  • Alert: WriteProvisionedThroughputExceeded Sum > 0 over a 5-minute period
aws cloudwatch put-metric-alarm \

  --alarm-name "kinesis-write-throttle" \

  --metric-name WriteProvisionedThroughputExceeded \

  --namespace AWS/Kinesis \

  --dimensions Name=StreamName,Value=your-stream-name \

  --statistic Sum \

  --period 300 \

  --evaluation-periods 1 \

  --threshold 0 \

  --comparison-operator GreaterThanThreshold \

  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic

What to do when it fires:

  • Check if a single shard is receiving disproportionate write load (hot shard – requires shard-level metrics)
  • Review the partition key distribution in your producers – sequential or low-cardinality keys cause hot shards
  • Add shards to increase write capacity (UpdateShardCount)
  • Switch from PutRecord to PutRecords for better batching efficiency

Read Throttling: ReadProvisionedThroughputExceeded

What it is: The number of GetRecords calls throttled because read throughput limits were exceeded. Each shard supports 5 GetRecords transactions per second and 2 MB/second of read throughput, shared across all standard consumers.

What good looks like: Zero. Any non-zero value means consumers are competing for the same shard’s read bandwidth.

What bad looks like: Sustained read throttling, particularly when you have multiple consumer applications reading the same stream with GetRecords.

Alert threshold to set:

  • Alert: ReadProvisionedThroughputExceeded Sum > 0 over a 5-minute period

The enhanced fan-out solution: If you have more than one consuming application, standard GetRecords forces them to share the 2 MB/second/shard read throughput. Enhanced fan-out (via SubscribeToShard) gives each registered consumer its own dedicated 2 MB/second/shard pipe. Data is pushed to consumers via HTTP/2 within approximately 70 milliseconds of being written – approximately 65% faster delivery than polling with GetRecords.

Enhanced fan-out consumer limits:

  • On-demand Standard and Provisioned: up to 20 registered consumers per stream
  • On-demand Advantage (as of November 2025): up to 50 registered consumers per stream

Enhanced fan-out is billed separately at $0.015 per consumer-shard-hour plus $0.013 per GB retrieved (us-east-1 pricing – verify current pricing).

GetRecords Success Rate and Latency

GetRecords.Success: The count of successful GetRecords calls. A drop in success rate, combined with rising iterator age, confirms consumer read failures. Use Average statistic.

GetRecords.Latency: The time in milliseconds taken per GetRecords operation. This can increase when there are more records to fetch per call (larger batches), or when network issues or server-side encryption add overhead.

Practical note: Server-side encryption on a Kinesis stream adds measurable latency to GetRecords calls. If you enable encryption and see GetRecords.Latency increase, this is expected behavior. Monitor it to establish a new baseline post-encryption.

PutRecords.Success and PutRecord.Success: Producer-side success rates. A drop indicates that producers are failing to write records – not just being throttled but encountering errors. Monitor both to differentiate throttling (which shows in WriteProvisionedThroughputExceeded) from errors (which show as failed success rates).

Shard-Level Metrics for Hot Shard Diagnosis

Stream-level metrics aggregate across all shards. When WriteProvisionedThroughputExceeded or GetRecords.IteratorAgeMilliseconds is elevated at the stream level, but the overall IncomingBytes looks normal. The cause is often a hot shard – one shard receiving far more traffic than others due to a poor partition key distribution.

Shard-level metrics break down by both StreamName and ShardId, letting you see which specific shard is the bottleneck.

Enable enhanced monitoring (shard-level metrics):

aws kinesis enable-enhanced-monitoring \

  --stream-name your-stream-name \

  --shard-level-metrics IncomingBytes IncomingRecords \

    WriteProvisionedThroughputExceeded \

    ReadProvisionedThroughputExceeded \

    IteratorAgeMilliseconds

Cost note: Each shard-level metric creates approximately 43,200 CloudWatch PutMetricData calls per month per shard. For a stream with 10 shards and 5 shard-level metrics enabled, that is 2,160,000 PutMetricData calls per month, billed at CloudWatch custom metric rates. Enable shard-level metrics for troubleshooting specific issues, then disable them when the issue is resolved to avoid ongoing cost.

On-Demand vs Provisioned Mode: Monitoring Implications

Kinesis Data Streams supports two capacity modes with different monitoring implications:

Provisioned mode: You specify the number of shards. Write and read throttling are your responsibility to monitor and act on. WriteProvisionedThroughputExceeded and ReadProvisionedThroughputExceeded are critical metrics. You scale by resharding (UpdateShardCount).

On-demand mode: Kinesis automatically scales capacity based on traffic. Write throttling is less of a concern because AWS scales shards automatically in response to traffic peaks. However, scaling is not instantaneous – brief throttling can still occur during sudden traffic spikes. Monitor WriteProvisionedThroughputExceeded even in on-demand mode.

A new on-demand data stream starts with a quota of 4 MB/second and 4,000 records/second for writes, automatically scaling up to 200 MB/second and 200,000 records/second. The stream can scale up to double its previous 30-day peak write throughput.

Increase Retention Period as a Stopgap

If GetRecords.IteratorAgeMilliseconds is climbing, and you cannot scale consumers fast enough; increasing the stream retention period buys time. The default is 24 hours; extended retention to 7 days is available at additional cost, and long-term retention up to 365 days is also available.

This is a temporary measure. The root fix is always adding consumers, fixing consumer processing logic, or resharding to increase read parallelism.

# Extend retention to 7 days as a stopgap

aws kinesis increase-stream-retention-period \

  --stream-name your-stream-name \

  --retention-period-hours 168

How Do I Find Which Service Is Causing Kinesis Consumer Lag?

GetRecords.IteratorAgeMilliseconds tells you that consumers are falling behind. It does not tell you which consumer application is the bottleneck, whether the lag is caused by one slow processing batch or a widespread failure, or what effect it is having on the services downstream that depend on records being processed on time.

When the iterator age climbs, the debugging path in CloudWatch alone looks like this: check IncomingBytes to rule out a producer surge, check GetRecords.Success to rule out consumer read failures, enable shard-level metrics to look for hot shards. Each step is a separate query, a separate graph, and a manual correlation of timestamps. You still do not know which application is slow or why.

CubeAPM instruments the producer and consumer services around your Kinesis stream via OpenTelemetry, capturing each PutRecords call and each batch of records processed as spans in a distributed trace. When iterator age spikes in CloudWatch, the trace in CubeAPM shows you which producer flooded the stream, which consumer application is processing too slowly, how long each batch of records is taking to process, and what the downstream impact looks like across the services that depend on timely delivery. The alarm identifies the symptom. The trace identifies the cause. Both run self-hosted inside your own AWS account.

Summary

MetricStatisticAlert thresholdPriority
GetRecords.IteratorAgeMillisecondsMaximumWarning: 5 min (300,000 ms). Critical: 50% of retention periodCritical
WriteProvisionedThroughputExceededSum> 0 over 5 minutesCritical
ReadProvisionedThroughputExceededSum> 0 over 5 minutesHigh
IncomingBytesSumDrops to 0 during expected ingestionHigh
GetRecords.SuccessAverageSustained drop from baselineHigh
GetRecords.LatencyAverageEstablish baseline, alert on sustained increaseMedium
PutRecords.SuccessAverageSustained drop from baselineMedium

Start with GetRecords.IteratorAgeMilliseconds (Maximum) and WriteProvisionedThroughputExceeded (Sum) – these two cover the most critical failure modes: consumer lag and producer throttling. Add the read throttle alarm if you have multiple consumer applications. Enable shard-level metrics only when diagnosing specific hot shard or uneven load distribution issues.

Disclaimer: Configurations, thresholds, and CLI commands are for guidance only – verify against current Amazon Kinesis Data Streams CloudWatch metrics documentation before applying to production. Kinesis quotas, pricing, and enhanced fan-out limits change over time. CubeAPM references reflect genuine use cases; evaluate all tools against your own requirements.

Also read:

What Are the Key AWS SQS Metrics to Monitor?

How to Monitor EKS Pods and Nodes with Grafana

How to Monitor GKE Clusters with Prometheus and Grafana

×
×