Kinesis Data Streams publishes two levels of metrics to CloudWatch: stream-level metrics (always available, free, published every minute) and shard-level metrics (opt-in, billed per metric per shard, useful for hot shard diagnosis). Monitoring stream lag and throughput requires tracking both what goes into the stream and whether your consumers are keeping up with it.
The single most important metric for consumer lag is GetRecords.IteratorAgeMilliseconds. The single most important metric for producer throughput is WriteProvisionedThroughputExceeded. Everything else provides supporting context. Let’s understand how to monitor AWS Kinesis Stream lags and throughput.
Key Takeaways
- Always use the Maximum statistic for GetRecords.IteratorAgeMilliseconds – using Average can hide a single lagging shard that puts your stream at risk of data loss
- If GetRecords.IteratorAgeMilliseconds reaches 100% of your stream’s retention period, records begin expiring before they are consumed – data is permanently lost
- AWS recommends alerting when the iterator age exceeds 50% of your stream retention period
- WriteProvisionedThroughputExceeded should be zero in normal operation – any non-zero value means records are being throttled and your producers must handle retries
- Shard-level metrics are not free – each shard-level metric creates approximately 43,200 CloudWatch PutMetricData calls per month per shard. Enable them selectively for troubleshooting, not universally
- For standard GetRecords consumers, each shard is limited to 5 read transactions per second and 2 MB/second of read throughput, shared across all consumers on that shard. Use enhanced fan-out (SubscribeToShard) when you need more than one consuming application
Stream Lag: GetRecords.IteratorAgeMilliseconds
What it is: The age in milliseconds of the last record returned by all GetRecords calls on the stream. It measures the gap between the timestamp of the most recently read record and the current time. A value of zero means consumers are fully caught up. A growing value means consumers are falling behind.
Why it is the primary lag metric: Iterator age directly answers the question: how old is the newest record my consumer has seen? If this value is growing, your consumers cannot keep pace with your producers.
What good looks like: Near zero in steady state. Some lag is expected during traffic spikes, as long as it recovers.
What bad looks like: A value that keeps climbing without recovering. At 50% of your retention period, start scaling consumers immediately. At 100%, records are expiring before they are consumed – data loss is occurring.
Retention period reference:
| Retention setting | 50% alert threshold | 100% data loss threshold |
| 24 hours (default) | 12 hours (43,200,000 ms) | 24 hours (86,400,000 ms) |
| 7 days | 3.5 days (302,400,000 ms) | 7 days |
| 365 days | 182.5 days | 365 days |
Alert threshold to set: AWS recommends alerting when the iterator age exceeds 50% of your retention period. For a stream with default 24-hour retention, that is 43,200,000 milliseconds.
# Warning: iterator age exceeds 5 minutes (consumer falling behind)
aws cloudwatch put-metric-alarm \
--alarm-name "kinesis-iterator-age-warning" \
--metric-name GetRecords.IteratorAgeMilliseconds \
--namespace AWS/Kinesis \
--dimensions Name=StreamName,Value=your-stream-name \
--statistic Maximum \
--period 60 \
--evaluation-periods 3 \
--threshold 300000 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic
# Critical: iterator age exceeds 1 hour (approaching retention risk on default 24h retention)
aws cloudwatch put-metric-alarm \
--alarm-name "kinesis-iterator-age-critical" \
--metric-name GetRecords.IteratorAgeMilliseconds \
--namespace AWS/Kinesis \
--dimensions Name=StreamName,Value=your-stream-name \
--statistic Maximum \
--period 60 \
--evaluation-periods 2 \
--threshold 3600000 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topicAlways use Maximum, not Average. The Maximum statistic catches the single worst-lagging consumer or shard. The Average can look healthy even when one shard is hours behind. AWS re:Post explicitly recommends Maximum for this metric.
When the iterator age increases – common causes:
- Consumer processing logic is too slow for the incoming record rate
- Insufficient GetRecords call frequency (consumer not polling fast enough)
- Hot shards: partition key is not distributing records evenly, so some shards receive far more records than others
- Lambda consumer timeout or memory issues blocking processing
- Connection timeout issues causing intermittent GetRecords failures
Write Throughput: IncomingBytes and IncomingRecords
What they are: IncomingBytes and IncomingRecords measure the total data volume and record count being written to the stream per period. These are your producer throughput metrics.
What good looks like: Consistent with your expected ingestion pattern. Use Sum over 1-minute periods for accurate throughput measurement.
What bad looks like: A sudden drop to zero or near zero, which means producers have stopped or failed. A spike far above baseline may indicate a retry storm from upstream failures.
Shard capacity reference:
- Each shard supports 1 MB/second of write throughput
- Each shard supports 1,000 records/second of write throughput
- A stream with 4 shards supports 4 MB/second and 4,000 records/second
To calculate current utilization per shard, divide IncomingBytes Sum (per minute) by 60 seconds and by your shard count.
Alert threshold to set:
- Alert when IncomingBytes drops to 0 during expected ingestion windows – producers have stopped
- Enable shard-level IncomingBytes to detect hot shards when aggregate throughput looks healthy but latency is high
Write Throttling: WriteProvisionedThroughputExceeded
What it is: The number of records rejected because the write throughput limit for a shard was exceeded. Any non-zero value means your producers are being throttled.
What good looks like: Zero. Always.
What bad looks like: Any sustained non-zero value. When a PutRecord or PutRecords call is throttled, the producer must retry. If your producers do not implement exponential backoff with jitter on throttle responses, you will see retry storms that compound the throttling.
Alert threshold to set:
- Alert: WriteProvisionedThroughputExceeded Sum > 0 over a 5-minute period
aws cloudwatch put-metric-alarm \
--alarm-name "kinesis-write-throttle" \
--metric-name WriteProvisionedThroughputExceeded \
--namespace AWS/Kinesis \
--dimensions Name=StreamName,Value=your-stream-name \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 0 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topicWhat to do when it fires:
- Check if a single shard is receiving disproportionate write load (hot shard – requires shard-level metrics)
- Review the partition key distribution in your producers – sequential or low-cardinality keys cause hot shards
- Add shards to increase write capacity (UpdateShardCount)
- Switch from PutRecord to PutRecords for better batching efficiency
Read Throttling: ReadProvisionedThroughputExceeded
What it is: The number of GetRecords calls throttled because read throughput limits were exceeded. Each shard supports 5 GetRecords transactions per second and 2 MB/second of read throughput, shared across all standard consumers.
What good looks like: Zero. Any non-zero value means consumers are competing for the same shard’s read bandwidth.
What bad looks like: Sustained read throttling, particularly when you have multiple consumer applications reading the same stream with GetRecords.
Alert threshold to set:
- Alert: ReadProvisionedThroughputExceeded Sum > 0 over a 5-minute period
The enhanced fan-out solution: If you have more than one consuming application, standard GetRecords forces them to share the 2 MB/second/shard read throughput. Enhanced fan-out (via SubscribeToShard) gives each registered consumer its own dedicated 2 MB/second/shard pipe. Data is pushed to consumers via HTTP/2 within approximately 70 milliseconds of being written – approximately 65% faster delivery than polling with GetRecords.
Enhanced fan-out consumer limits:
- On-demand Standard and Provisioned: up to 20 registered consumers per stream
- On-demand Advantage (as of November 2025): up to 50 registered consumers per stream
Enhanced fan-out is billed separately at $0.015 per consumer-shard-hour plus $0.013 per GB retrieved (us-east-1 pricing – verify current pricing).
GetRecords Success Rate and Latency
GetRecords.Success: The count of successful GetRecords calls. A drop in success rate, combined with rising iterator age, confirms consumer read failures. Use Average statistic.
GetRecords.Latency: The time in milliseconds taken per GetRecords operation. This can increase when there are more records to fetch per call (larger batches), or when network issues or server-side encryption add overhead.
Practical note: Server-side encryption on a Kinesis stream adds measurable latency to GetRecords calls. If you enable encryption and see GetRecords.Latency increase, this is expected behavior. Monitor it to establish a new baseline post-encryption.
PutRecords.Success and PutRecord.Success: Producer-side success rates. A drop indicates that producers are failing to write records – not just being throttled but encountering errors. Monitor both to differentiate throttling (which shows in WriteProvisionedThroughputExceeded) from errors (which show as failed success rates).
Shard-Level Metrics for Hot Shard Diagnosis
Stream-level metrics aggregate across all shards. When WriteProvisionedThroughputExceeded or GetRecords.IteratorAgeMilliseconds is elevated at the stream level, but the overall IncomingBytes looks normal. The cause is often a hot shard – one shard receiving far more traffic than others due to a poor partition key distribution.
Shard-level metrics break down by both StreamName and ShardId, letting you see which specific shard is the bottleneck.
Enable enhanced monitoring (shard-level metrics):
aws kinesis enable-enhanced-monitoring \
--stream-name your-stream-name \
--shard-level-metrics IncomingBytes IncomingRecords \
WriteProvisionedThroughputExceeded \
ReadProvisionedThroughputExceeded \
IteratorAgeMillisecondsCost note: Each shard-level metric creates approximately 43,200 CloudWatch PutMetricData calls per month per shard. For a stream with 10 shards and 5 shard-level metrics enabled, that is 2,160,000 PutMetricData calls per month, billed at CloudWatch custom metric rates. Enable shard-level metrics for troubleshooting specific issues, then disable them when the issue is resolved to avoid ongoing cost.
On-Demand vs Provisioned Mode: Monitoring Implications
Kinesis Data Streams supports two capacity modes with different monitoring implications:
Provisioned mode: You specify the number of shards. Write and read throttling are your responsibility to monitor and act on. WriteProvisionedThroughputExceeded and ReadProvisionedThroughputExceeded are critical metrics. You scale by resharding (UpdateShardCount).
On-demand mode: Kinesis automatically scales capacity based on traffic. Write throttling is less of a concern because AWS scales shards automatically in response to traffic peaks. However, scaling is not instantaneous – brief throttling can still occur during sudden traffic spikes. Monitor WriteProvisionedThroughputExceeded even in on-demand mode.
A new on-demand data stream starts with a quota of 4 MB/second and 4,000 records/second for writes, automatically scaling up to 200 MB/second and 200,000 records/second. The stream can scale up to double its previous 30-day peak write throughput.
Increase Retention Period as a Stopgap
If GetRecords.IteratorAgeMilliseconds is climbing, and you cannot scale consumers fast enough; increasing the stream retention period buys time. The default is 24 hours; extended retention to 7 days is available at additional cost, and long-term retention up to 365 days is also available.
This is a temporary measure. The root fix is always adding consumers, fixing consumer processing logic, or resharding to increase read parallelism.
# Extend retention to 7 days as a stopgap
aws kinesis increase-stream-retention-period \
--stream-name your-stream-name \
--retention-period-hours 168How Do I Find Which Service Is Causing Kinesis Consumer Lag?
GetRecords.IteratorAgeMilliseconds tells you that consumers are falling behind. It does not tell you which consumer application is the bottleneck, whether the lag is caused by one slow processing batch or a widespread failure, or what effect it is having on the services downstream that depend on records being processed on time.
When the iterator age climbs, the debugging path in CloudWatch alone looks like this: check IncomingBytes to rule out a producer surge, check GetRecords.Success to rule out consumer read failures, enable shard-level metrics to look for hot shards. Each step is a separate query, a separate graph, and a manual correlation of timestamps. You still do not know which application is slow or why.
CubeAPM instruments the producer and consumer services around your Kinesis stream via OpenTelemetry, capturing each PutRecords call and each batch of records processed as spans in a distributed trace. When iterator age spikes in CloudWatch, the trace in CubeAPM shows you which producer flooded the stream, which consumer application is processing too slowly, how long each batch of records is taking to process, and what the downstream impact looks like across the services that depend on timely delivery. The alarm identifies the symptom. The trace identifies the cause. Both run self-hosted inside your own AWS account.
Summary
| Metric | Statistic | Alert threshold | Priority |
| GetRecords.IteratorAgeMilliseconds | Maximum | Warning: 5 min (300,000 ms). Critical: 50% of retention period | Critical |
| WriteProvisionedThroughputExceeded | Sum | > 0 over 5 minutes | Critical |
| ReadProvisionedThroughputExceeded | Sum | > 0 over 5 minutes | High |
| IncomingBytes | Sum | Drops to 0 during expected ingestion | High |
| GetRecords.Success | Average | Sustained drop from baseline | High |
| GetRecords.Latency | Average | Establish baseline, alert on sustained increase | Medium |
| PutRecords.Success | Average | Sustained drop from baseline | Medium |
Start with GetRecords.IteratorAgeMilliseconds (Maximum) and WriteProvisionedThroughputExceeded (Sum) – these two cover the most critical failure modes: consumer lag and producer throttling. Add the read throttle alarm if you have multiple consumer applications. Enable shard-level metrics only when diagnosing specific hot shard or uneven load distribution issues.
Disclaimer: Configurations, thresholds, and CLI commands are for guidance only – verify against current Amazon Kinesis Data Streams CloudWatch metrics documentation before applying to production. Kinesis quotas, pricing, and enhanced fan-out limits change over time. CubeAPM references reflect genuine use cases; evaluate all tools against your own requirements.
Also read:
What Are the Key AWS SQS Metrics to Monitor?
How to Monitor EKS Pods and Nodes with Grafana
How to Monitor GKE Clusters with Prometheus and Grafana





