CubeAPM

How to Monitor AWS DynamoDB Read/Write Capacity and Throttles

DynamoDB publishes metrics to CloudWatch automatically every minute at no extra charge. The challenge is knowing which metrics matter for capacity monitoring and throttle diagnosis, and understanding that provisioned tables and on-demand tables use different metrics for different failure modes.

Throttling in DynamoDB can occur at two levels: the table or GSI level (when total throughput exceeds provisioned or on-demand capacity) and the partition level (when a single partition receives a disproportionate share of requests, causing hot partition throttling regardless of total capacity). These require different metrics and different fixes.

Key Takeaways

  • All DynamoDB metrics are free and in the AWS/DynamoDB CloudWatch namespace. No Enhanced Monitoring or additional configuration needed.
  • Provisioned tables typically surface throttling as ProvisionedThroughputExceededException, while on-demand tables can surface it as ThrottlingException. These are different error types, and the underlying causes map to different CloudWatch metrics.
  • AWS launched enhanced throttle metrics in September 2025 that break down throttle events by type: ReadProvisionedThroughputThrottleEvents, WriteProvisionedThroughputThrottleEvents, ReadMaxOnDemandThroughputThrottleEvents, WriteMaxOnDemandThroughputThrottleEvents, ReadKeyRangeThroughputThrottleEvents, WriteKeyRangeThroughputThrottleEvents, ReadAccountLimitThrottleEvents, and WriteAccountLimitThrottleEvents. Use these to pinpoint the root cause, not just that throttling occurred.
  • Each DynamoDB partition supports a maximum of 3,000 read capacity units and 1,000 write capacity units per second independently. You can hit partition-level throttling even when your total table capacity looks fine.
  • Always use the Sum statistic for consumed-capacity and throttle metrics, not Average. A single throttle event is significant and averaging masks it.
  • Set treat-missing-data to notBreaching on throttle alarms. Missing data means zero throttles, which is healthy.

Capacity Metrics: Provisioned Tables

ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits

These are the primary throughput metrics. They measure how many read and write capacity units your table actually consumed in a period.

What good looks like: Consumption consistently below your provisioned capacity, with headroom for spikes. DynamoDB auto scaling uses a default target utilization of 70%, meaning consumption stays below 70% of provisioned capacity in steady state.

What bad looks like: Consumption consistently at or near provisioned capacity. Once requests exceed provisioned capacity, DynamoDB throws ProvisionedThroughputExceededException and clients must retry.

Important: DynamoDB reports consumed capacity in one-minute aggregates. Auto scaling triggers when consumed capacity breaches the target utilization for two consecutive minutes. The resulting UpdateTable call takes several minutes to complete, and during that scale-up period requests exceeding the old provisioned capacity continue to be throttled.

Alert on 80% of provisioned capacity (Sum over 5 minutes):

# For a table provisioned at 1,000 RCU:
# Max per 5-min period = 1,000 * 300 = 300,000 units
# 80% threshold = 300,000 * 0.8 = 240,000
aws cloudwatch put-metric-alarm \
  --alarm-name "DynamoDB-ReadCapacity-High" \
  --metric-name ConsumedReadCapacityUnits \
  --namespace AWS/DynamoDB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 240000 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=TableName,Value=your-table-name \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --treat-missing-data notBreaching

Repeat for ConsumedWriteCapacityUnits with your provisioned write capacity.
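The threshold arithmetic in the comments above generalizes to any provisioned capacity, period, and percentage. The following is a minimal sketch; `capacity_alarm_threshold` is a hypothetical helper name introduced here, not an AWS CLI feature.

```shell
# Hypothetical helper (not part of the AWS CLI): compute the Sum threshold for a
# consumed-capacity alarm from provisioned units per second, period, and percent.
capacity_alarm_threshold() {
  local provisioned_per_second="$1"  # e.g. 1000 RCU
  local period_seconds="$2"          # e.g. 300
  local percent="$3"                 # e.g. 80
  echo $(( provisioned_per_second * period_seconds * percent / 100 ))
}

capacity_alarm_threshold 1000 300 80   # prints 240000
```

Pass the result to --threshold so the alarm stays consistent if you later change the provisioned capacity or the period.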

ProvisionedReadCapacityUnits and ProvisionedWriteCapacityUnits

These report the currently configured provisioned capacity for your table. Use them alongside consumed capacity to calculate utilization percentage in dashboards.

# Utilization as a percentage in CloudWatch metric math, using a 60-second period
# and the Sum statistic for consumed capacity:
# ConsumedReadCapacityUnits / (ProvisionedReadCapacityUnits * 60) * 100
# The *60 converts per-second provisioned capacity to per-minute consumed units
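One way to turn that formula into an alarm is to pass a metric math definition through the --metrics option of put-metric-alarm. The sketch below only generates the JSON; the function name `read_utilization_metrics`, the query ids, and the label are assumptions chosen for illustration.

```shell
# Hypothetical helper: print a metric math definition for read utilization.
# The ids "consumed", "provisioned", and "utilization" are arbitrary names.
read_utilization_metrics() {
  local table="$1"
  cat <<EOF
[
  {
    "Id": "consumed",
    "ReturnData": false,
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ConsumedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": "${table}"}]
      },
      "Period": 60,
      "Stat": "Sum"
    }
  },
  {
    "Id": "provisioned",
    "ReturnData": false,
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ProvisionedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": "${table}"}]
      },
      "Period": 60,
      "Stat": "Average"
    }
  },
  {
    "Id": "utilization",
    "Expression": "consumed / (provisioned * 60) * 100",
    "Label": "ReadUtilizationPercent",
    "ReturnData": true
  }
]
EOF
}

# Example usage (requires AWS credentials; the 80% threshold and 5 periods are assumptions):
# aws cloudwatch put-metric-alarm \
#   --alarm-name "DynamoDB-ReadUtilization-High" \
#   --metrics "$(read_utilization_metrics your-table-name)" \
#   --threshold 80 \
#   --comparison-operator GreaterThanThreshold \
#   --evaluation-periods 5 \
#   --treat-missing-data notBreaching
```

Because the threshold is a percentage, this alarm does not need to be edited when auto scaling changes the provisioned capacity.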

Capacity Metrics: On-Demand Tables

On-demand tables do not have a fixed provisioned capacity to compare against. Instead, monitor against account-level and table-level quotas.

OnDemandMaxReadRequestUnits and OnDemandMaxWriteRequestUnits

These metrics show the maximum throughput configured for your on-demand table or GSI. Separately from any configured maximum, a new on-demand table supports up to 12,000 read request units and 4,000 write request units per second from the start. If your previous peak was higher, the table retains that peak capacity.

Monitor ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits against these limits. If consumption approaches the on-demand maximum, throttling will occur.

Default per-table quota for on-demand tables: 40,000 read request units and 40,000 write request units per second. This can be raised via a service quota increase request.
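To compare consumption against one of these per-second limits, convert the CloudWatch Sum for a period into a percentage of what the limit would allow over the same period. This is a minimal sketch; `ondemand_utilization_percent` is a hypothetical helper, and the sample numbers are illustrative only.

```shell
# Hypothetical helper: express a CloudWatch Sum of consumed units as a whole
# percentage of a per-second limit sustained over the same period.
ondemand_utilization_percent() {
  local consumed_sum="$1"       # Sum of ConsumedReadCapacityUnits for the period
  local period_seconds="$2"     # e.g. 300
  local limit_per_second="$3"   # e.g. 12000
  echo $(( consumed_sum * 100 / (period_seconds * limit_per_second) ))
}

ondemand_utilization_percent 2880000 300 12000   # prints 80
```

Note that a 5-minute Sum hides short bursts, so a moderate percentage here does not rule out brief throttling within the window.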

Throttle Metrics: The Core Signals

ReadThrottleEvents and WriteThrottleEvents

These count the number of requests throttled at the table or GSI level. Any non-zero value means requests were rejected and clients had to retry (or fail, if retries were exhausted).

Alert threshold: Any throttle event – Sum >= 1 over a 5-minute window. Throttles should not be a normal occurrence in a well-configured table.

aws cloudwatch put-metric-alarm \
  --alarm-name "DynamoDB-WriteThrottles" \
  --metric-name WriteThrottleEvents \
  --namespace AWS/DynamoDB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=TableName,Value=your-table-name \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --treat-missing-data notBreaching

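Critical: set treat-missing-data to notBreaching. When a table has no activity, no metric data is published. Without this setting, the alarm uses the default behavior (missing) and drops into the INSUFFICIENT_DATA state whenever no data points arrive, instead of reporting OK for what is actually a healthy table.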
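Since both read and write throttles need the same alarm on every production table, a small loop can reduce copy-paste errors. The sketch below reuses the flags from the alarm above; the function name `create_throttle_alarms`, its argument order, the alarm naming scheme, and the DRY_RUN switch are all assumptions made for this example.

```shell
# Sketch: create read and write throttle alarms for every table name passed in.
# create_throttle_alarms and DRY_RUN are hypothetical, not AWS CLI features.
create_throttle_alarms() {
  local topic_arn="$1"
  shift
  local table metric
  for table in "$@"; do
    for metric in ReadThrottleEvents WriteThrottleEvents; do
      # When DRY_RUN is set, print the command instead of running it.
      ${DRY_RUN:+echo} aws cloudwatch put-metric-alarm \
        --alarm-name "DynamoDB-${table}-${metric}" \
        --metric-name "$metric" \
        --namespace AWS/DynamoDB \
        --statistic Sum \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --dimensions "Name=TableName,Value=${table}" \
        --alarm-actions "$topic_arn" \
        --treat-missing-data notBreaching
    done
  done
}

# Example (prints two commands without calling AWS):
# DRY_RUN=1 create_throttle_alarms arn:aws:sns:us-east-1:123456789:your-alert-topic your-table-name
```

Running with DRY_RUN=1 first lets you review the generated alarm names before anything is created in your account.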

ThrottledRequests

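This counts throttled requests, not individual events, across the table and all its indexes. It increments once when any event within a request is throttled, so a write that is throttled on both the table and a GSI still counts as one. It is useful as a single combined alarm but less useful for diagnosis. Use the granular metrics below to identify where throttling is occurring.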

The Enhanced Throttle Metrics (September 2025)

AWS announced enhanced throttling observability for DynamoDB in September 2025. These new metrics break down throttle events by the specific reason they occurred, making root cause diagnosis significantly faster than with ReadThrottleEvents and WriteThrottleEvents alone.

Metric | What it indicates
--- | ---
ReadProvisionedThroughputThrottleEvents | Reads throttled because provisioned read capacity was exceeded at the table level
WriteProvisionedThroughputThrottleEvents | Writes throttled because provisioned write capacity was exceeded at the table level
ReadMaxOnDemandThroughputThrottleEvents | Reads throttled because on-demand maximum throughput was exceeded
WriteMaxOnDemandThroughputThrottleEvents | Writes throttled because on-demand maximum throughput was exceeded
ReadKeyRangeThroughputThrottleEvents | Reads throttled due to a hot partition (partition-level key range limit exceeded)
WriteKeyRangeThroughputThrottleEvents | Writes throttled due to a hot partition (partition-level key range limit exceeded)
ReadAccountLimitThrottleEvents | Reads throttled because the account-level read capacity limit was exceeded
WriteAccountLimitThrottleEvents | Writes throttled because the account-level write capacity limit was exceeded

Why this matters: The fix for provisioned throughput throttling is increasing capacity or enabling auto scaling. The fix for hot partition throttling (KeyRange metrics) is redesigning the partition key or adding write sharding – more capacity does not help. The fix for GSI back-pressure is increasing capacity on the GSI itself, not the base table. These are completely different remediation paths. Without the granular metrics, you cannot tell which problem you have.
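The mapping from metric to remediation can be encoded directly, for example to annotate alert messages. This is a sketch only: `remediation_for_throttle_metric` is a hypothetical helper, and the suggestion for the on-demand maximum metrics is an assumption inferred from the metric description rather than something stated above.

```shell
# Hypothetical helper: suggest a remediation path for an enhanced throttle metric.
remediation_for_throttle_metric() {
  case "$1" in
    *ProvisionedThroughputThrottleEvents)
      echo "Increase provisioned capacity or enable auto scaling" ;;
    *MaxOnDemandThroughputThrottleEvents)
      # Assumption: the configured on-demand maximum is the limiting factor.
      echo "Raise the configured on-demand maximum throughput" ;;
    *KeyRangeThroughputThrottleEvents)
      echo "Redesign the partition key or add write sharding" ;;
    *AccountLimitThrottleEvents)
      echo "Request a service quota increase" ;;
    *)
      echo "Unknown throttle metric: $1" >&2
      return 1 ;;
  esac
}

remediation_for_throttle_metric WriteKeyRangeThroughputThrottleEvents
# prints: Redesign the partition key or add write sharding
```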

Alert on hot partition throttling specifically:

aws cloudwatch put-metric-alarm \
  --alarm-name "DynamoDB-HotPartition-WriteThrottles" \
  --metric-name WriteKeyRangeThroughputThrottleEvents \
  --namespace AWS/DynamoDB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=TableName,Value=your-table-name \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --treat-missing-data notBreaching

Partition-Level Throttling: The Hidden Problem

DynamoDB distributes data across partitions automatically. Each partition supports a maximum of 3,000 read capacity units and 1,000 write capacity units per second, independently of each other. This limit applies regardless of your total table capacity or capacity mode.

This means a table provisioned at 10,000 WCU and running at full capacity can still experience throttling if 20% of all writes go to a single partition key value. That one partition would receive 2,000 WCU, exceeding its 1,000 WCU limit, while the other partitions still have headroom.
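The arithmetic in that example is easy to check for your own traffic shape. The sketch below assumes the 1,000 WCU per-partition limit stated above; `hot_partition_check` is a hypothetical helper name.

```shell
# Per-partition write limit as stated in the text above.
PARTITION_WCU_LIMIT=1000

# Hypothetical helper: estimate the write load landing on one partition key
# when it receives a given percentage of the table's total write traffic.
hot_partition_check() {
  local total_wcu="$1" share_percent="$2"
  local partition_wcu=$(( total_wcu * share_percent / 100 ))
  if [ "$partition_wcu" -gt "$PARTITION_WCU_LIMIT" ]; then
    echo "${partition_wcu} WCU on one partition: throttled (limit ${PARTITION_WCU_LIMIT})"
  else
    echo "${partition_wcu} WCU on one partition: within limit (${PARTITION_WCU_LIMIT})"
  fi
}

hot_partition_check 10000 20   # prints: 2000 WCU on one partition: throttled (limit 1000)
```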

Common causes of hot partitions:

  • Low-cardinality partition keys (for example, using a status field with only a few values)
  • Time-based partition keys where all recent writes go to the current time bucket
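  • Monotonically increasing sort keys (such as auto-increment IDs or timestamps) under a single partition key value, so every new write lands in the same item collection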
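Write sharding addresses these cases by spreading one logical key across several physical partition key values. The following is a minimal sketch of the idea, assuming a stable hash of some per-item identifier; the function name `sharded_partition_key`, the "#" separator, and the sample key values are hypothetical choices, not a DynamoDB convention.

```shell
# Sketch of write sharding: append a stable shard suffix to a hot partition key.
sharded_partition_key() {
  local base_key="$1" item_id="$2" shard_count="$3"
  local crc
  # cksum prints "<crc> <byte count>" for stdin; keep only the CRC.
  crc="$(printf '%s' "$item_id" | cksum | cut -d' ' -f1)"
  echo "${base_key}#$(( crc % shard_count ))"
}

# The same item id always maps to the same shard suffix (0 to 9 here).
sharded_partition_key "status-ACTIVE" "order-1842" 10
```

The trade-off is on the read side: a query for all items under the logical key has to fan out across every shard suffix and merge the results.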

Diagnosing hot partitions with CloudWatch Contributor Insights:

# Enable Contributor Insights on a table
aws dynamodb update-contributor-insights \
  --table-name your-table-name \
  --contributor-insights-action ENABLE

Once enabled, Contributor Insights shows the most frequently accessed and most frequently throttled partition keys in your table. AWS launched a new “Throttled keys” mode in September 2025 that only processes events when throttling actually occurs – significantly more cost-effective than the previous full access mode for tables with moderate traffic.

Use the Throttled keys mode for continuous monitoring. Use the “Accessed and throttled keys” mode when actively investigating access pattern issues.
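Enabling Contributor Insights is asynchronous, so it can be worth confirming the status before relying on its data during an investigation. A minimal sketch, assuming a hypothetical wrapper named `contributor_insights_status` around the describe-contributor-insights command:

```shell
# Hypothetical wrapper: print the Contributor Insights status for a table.
contributor_insights_status() {
  aws dynamodb describe-contributor-insights \
    --table-name "$1" \
    --query 'ContributorInsightsStatus' \
    --output text
}

# Example (requires AWS credentials):
# contributor_insights_status your-table-name   # e.g. ENABLING, ENABLED, or DISABLED
```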

GSI Back-Pressure Throttling

When a Global Secondary Index (GSI) is throttled, the throttling can create back-pressure that throttles writes on the base table too – even if the base table itself has sufficient capacity.

This happens because every base table write that changes an indexed attribute must also be propagated to the relevant GSIs. The GSI itself is updated asynchronously, but if it cannot accept the write (because its write capacity is exhausted), DynamoDB throttles the base table write as well.

Signs of GSI back-pressure:

  • WriteThrottleEvents on the base table is elevated but WriteProvisionedThroughputThrottleEvents is low or zero
  • WriteThrottleEvents on the specific GSI (using the GlobalSecondaryIndexName dimension) shows the throttling source
  • Adding capacity to the base table does not reduce throttling

Fix: Increase write capacity on the throttled GSI, not the base table. Use the dimension GlobalSecondaryIndexName in CloudWatch to monitor GSI-specific metrics:

aws cloudwatch put-metric-alarm \
  --alarm-name "DynamoDB-GSI-WriteThrottles" \
  --metric-name WriteThrottleEvents \
  --namespace AWS/DynamoDB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=TableName,Value=your-table-name Name=GlobalSecondaryIndexName,Value=your-gsi-name \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --treat-missing-data notBreaching
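Tables often have several GSIs, and each needs its own alarm. The sketch below looks up the index names with describe-table and creates one write throttle alarm per GSI; the function name `create_gsi_write_throttle_alarms` and the alarm naming scheme are assumptions for illustration.

```shell
# Sketch: one write throttle alarm per GSI on a table.
create_gsi_write_throttle_alarms() {
  local table="$1" topic_arn="$2" gsi
  # With --output text the index names come back whitespace-separated;
  # a table without GSIs is expected to yield "None", which is skipped.
  for gsi in $(aws dynamodb describe-table --table-name "$table" \
      --query 'Table.GlobalSecondaryIndexes[].IndexName' --output text); do
    [ "$gsi" = "None" ] && continue
    aws cloudwatch put-metric-alarm \
      --alarm-name "DynamoDB-${table}-${gsi}-WriteThrottles" \
      --metric-name WriteThrottleEvents \
      --namespace AWS/DynamoDB \
      --statistic Sum \
      --period 300 \
      --evaluation-periods 1 \
      --threshold 1 \
      --comparison-operator GreaterThanOrEqualToThreshold \
      --dimensions "Name=TableName,Value=${table}" "Name=GlobalSecondaryIndexName,Value=${gsi}" \
      --alarm-actions "$topic_arn" \
      --treat-missing-data notBreaching
  done
}

# Example (requires AWS credentials):
# create_gsi_write_throttle_alarms your-table-name arn:aws:sns:us-east-1:123456789:your-alert-topic
```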

SuccessfulRequestLatency

Throttle metrics tell you requests were rejected. Latency tells you requests are slow even when not throttled – which can indicate hot partitions that have not yet crossed the throttle threshold, or large item sizes adding I/O overhead.

Useful statistics:

  • Average: baseline health
  • p99: what the slowest 1% of requests experience
  • p99.9: for latency-sensitive workloads, enter manually in the CloudWatch console

Alert threshold: DynamoDB targets single-digit millisecond latency for standard operations. For GetItem and PutItem on items under 4 KB, p99 above 20ms sustained indicates a problem worth investigating.
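A percentile alarm on this metric uses --extended-statistic instead of --statistic. The sketch below treats "sustained" as three consecutive 5-minute periods, which is an assumption to tune for your workload; `create_latency_alarm` is a hypothetical helper name.

```shell
# Sketch: p99 latency alarm for one operation on one table.
create_latency_alarm() {
  local table="$1" operation="$2" topic_arn="$3"
  aws cloudwatch put-metric-alarm \
    --alarm-name "DynamoDB-${table}-${operation}-p99-Latency" \
    --metric-name SuccessfulRequestLatency \
    --namespace AWS/DynamoDB \
    --extended-statistic p99 \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 20 \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=TableName,Value=${table}" "Name=Operation,Value=${operation}" \
    --alarm-actions "$topic_arn" \
    --treat-missing-data notBreaching
}

# Example (requires AWS credentials):
# create_latency_alarm your-table-name GetItem arn:aws:sns:us-east-1:123456789:your-alert-topic
```

SuccessfulRequestLatency is reported per operation, so create one alarm for each operation you care about, such as GetItem and PutItem.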

Auto Scaling: Monitoring the Scaling Loop

For provisioned tables with auto scaling enabled, the scaling behavior itself is worth monitoring:

  • Auto scaling triggers when consumed capacity breaches the target utilization for two consecutive minutes
  • UpdateTable is then invoked – capacity increase takes several minutes to complete
  • During this window, requests exceeding the old provisioned capacity are throttled even though scaling is underway
  • Scale-down requires 15 consecutive data points below the target utilization before triggering

If you see brief throttle spikes followed by recovery, this is often the auto scaling gap – traffic spiked, stayed elevated for two consecutive minutes, scaling triggered, and throttles occurred during the provisioning delay. The fix is either pre-warming capacity before expected traffic spikes or accepting brief throttles as a cost of the auto scaling model.
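One way to pre-warm capacity for a known event is an Application Auto Scaling scheduled action that raises the minimum capacity shortly before the spike. This is a sketch under assumptions: the function name `schedule_write_prewarm`, the action name, and the sample values are hypothetical, and the table must already be registered as a scalable target.

```shell
# Sketch: schedule a one-time increase of the auto scaling minimum for writes.
schedule_write_prewarm() {
  local table="$1" at_utc="$2" min_wcu="$3" max_wcu="$4"
  aws application-autoscaling put-scheduled-action \
    --service-namespace dynamodb \
    --scheduled-action-name "prewarm-${table}-writes" \
    --resource-id "table/${table}" \
    --scalable-dimension dynamodb:table:WriteCapacityUnits \
    --schedule "at(${at_utc})" \
    --scalable-target-action "MinCapacity=${min_wcu},MaxCapacity=${max_wcu}"
}

# Example (requires AWS credentials; the time is in UTC):
# schedule_write_prewarm your-table-name 2026-01-15T08:30:00 5000 10000
```

Schedule a second action after the event to lower the minimum again, otherwise the table keeps paying for the pre-warmed capacity.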

How Do I Find Which Query Is Causing Throttling?

CloudWatch metrics tell you that throttling is occurring, at what rate, and because of which capacity limit. They do not tell you which API call, which application service, or which user action generated the request that was throttled.

When WriteKeyRangeThroughputThrottleEvents spikes on your table and Contributor Insights shows the hot partition key, CloudWatch stops there. You know which key is hot, but not which application endpoint is generating writes to that key, how many times each request writes to it, or whether a single service or several services are contributing to the load.

How Do I Trace a DynamoDB Throttle Back to the Application Request?

CubeAPM instruments your application via the OpenTelemetry standard and captures every DynamoDB call as a span inside the full request trace. When a throttle alarm fires, the trace in CubeAPM shows which API endpoint triggered the DynamoDB write, which partition key was used, how many DynamoDB calls were made per request, and whether the same key is being written by multiple services simultaneously. CloudWatch identifies the symptom at the table level. The trace identifies the application code responsible. CubeAPM is self-hosted inside your own AWS account, so no data leaves your environment.

Summary

Metric | Capacity mode | Alert threshold | What it tells you
--- | --- | --- | ---
ConsumedReadCapacityUnits | Provisioned | > 80% of provisioned per period | Approaching read capacity ceiling
ConsumedWriteCapacityUnits | Provisioned | > 80% of provisioned per period | Approaching write capacity ceiling
ReadThrottleEvents | Both | Sum >= 1 over 5 min | Read requests being rejected
WriteThrottleEvents | Both | Sum >= 1 over 5 min | Write requests being rejected
ReadProvisionedThroughputThrottleEvents | Provisioned | Sum >= 1 | Table-level read capacity exceeded
WriteProvisionedThroughputThrottleEvents | Provisioned | Sum >= 1 | Table-level write capacity exceeded
ReadMaxOnDemandThroughputThrottleEvents | On-demand | Sum >= 1 | On-demand read maximum exceeded
WriteMaxOnDemandThroughputThrottleEvents | On-demand | Sum >= 1 | On-demand write maximum exceeded
ReadKeyRangeThroughputThrottleEvents | Both | Sum >= 1 | Hot partition: fix the key design
WriteKeyRangeThroughputThrottleEvents | Both | Sum >= 1 | Hot partition: fix the key design
ReadThrottleEvents (GSI dimension) | Both | Sum >= 1 | GSI read throttling (use GlobalSecondaryIndexName dimension)
WriteThrottleEvents (GSI dimension) | Both | Sum >= 1 | GSI write throttling (use GlobalSecondaryIndexName dimension)
SuccessfulRequestLatency | Both | p99 > 20ms sustained | Slow operations: check hot partitions

Start with ReadThrottleEvents and WriteThrottleEvents alarms for every table in production. Add the eight enhanced metrics to understand why throttling is happening when those alarms fire. Enable Contributor Insights in Throttled keys mode for any table where throttling is a recurring issue.

Disclaimer: Configurations and thresholds are for guidance only – verify against current AWS DynamoDB CloudWatch metrics documentation before applying to production. DynamoDB metrics, quota values, and on-demand capacity defaults change over time. CubeAPM references reflect genuine use cases; evaluate all tools against your own requirements.
