CubeAPM

How to Monitor AWS API Gateway Latency, Errors, and Throttling

Amazon API Gateway sits at the front door of your serverless and microservices architecture. Every request your users make passes through it before reaching Lambda, EC2, or any other backend service. That makes it a critical chokepoint, and also the most important place to watch when things go wrong.

Without proper AWS API Gateway monitoring, a sudden surge of 5XX errors or a 300ms latency spike can go undetected for minutes or even hours, by which time your users have already noticed. This guide shows you exactly how to monitor the three things that matter most: latency, errors, and throttling. You will learn which CloudWatch metrics to track, how to set up access logs and alarms, and how to troubleshoot the most common issues in production.

Key Takeaways
  • API Gateway publishes its core metrics to CloudWatch every minute, including Latency, IntegrationLatency, 4XXError, and 5XXError.
  • Use IntegrationLatency vs. Latency to determine whether slowness is in your backend code or inside API Gateway itself.
  • Within 4XXError, 429 responses indicate throttling, while 401 and 403 point to authentication or authorization problems. The metric does not break these out by status code, so enable access logs to tell them apart.
  • In most regions, API Gateway enforces a default account-level limit of 10,000 requests per second, with a burst capacity of 5,000 requests.
  • CloudWatch Alarms on 5XXError, Latency p99, and Count anomalies form the foundation of a reliable monitoring strategy.
  • Detailed CloudWatch metrics (per-resource granularity) must be explicitly enabled in the API stage settings and carry additional cost.
  • AWS X-Ray tracing lets you follow a request end-to-end from API Gateway through Lambda, making latency root cause analysis much faster.

Understanding the Key AWS API Gateway Metrics in CloudWatch

API Gateway publishes metrics to Amazon CloudWatch under the AWS/ApiGateway namespace. By default, metrics arrive every minute. The table below covers every metric you need to know.

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| Latency | Total time from when API Gateway receives a request to when it returns the response (ms) | Directly impacts user experience; high values signal backend or gateway overhead issues |
| IntegrationLatency | Time from API Gateway forwarding the request to receiving the backend response (ms) | Isolates whether slowness is in your Lambda/backend code or in API Gateway overhead |
| 4XXError | Count of client-side errors (400, 401, 403, 404, 429) in a period | Reveals auth failures, missing parameters, and throttling (429) patterns |
| 5XXError | Count of server-side errors (502, 503, 504) in a period | Indicates Lambda crashes, integration misconfigurations, or backend timeouts |
| Count | Total number of API requests in a period | Baseline for calculating error rates; detects traffic anomalies |
| CacheHitCount | Requests served from the API cache | Measures cache effectiveness; high hits reduce backend load |
| CacheMissCount | Requests sent to the backend when caching is enabled | High misses indicate the cache configuration needs tuning |
| DataProcessed | Bytes of data processed (HTTP APIs) | Useful for cost monitoring and payload optimization |

A few important points about how these metrics work:

  • Latency vs. IntegrationLatency. Latency is the complete round-trip time. IntegrationLatency covers only your backend. If Latency is 800ms but IntegrationLatency is only 50ms, the overhead is inside API Gateway itself, possibly from caching, authorizer invocations, or mapping templates.
  • Sum vs. Average for errors. The Sum statistic on 4XXError and 5XXError gives the raw count. The Average gives you the error rate (errors divided by total requests). Both matter. A high count with a low rate usually means traffic is heavy but the API is largely healthy; a low count with a high rate means something is broken on a low-traffic endpoint.
  • Detailed metrics. By default, CloudWatch only provides aggregated metrics at the stage level. To see per-method and per-resource breakdowns, you must enable Detailed CloudWatch Metrics in the Stage settings. Note that enabling these metrics carries additional CloudWatch charges.
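To make the first two points concrete, here is a minimal Python sketch that derives an error rate and the API Gateway overhead from one period's datapoints. It assumes you have already retrieved, for the same stage and period, the Sum of Count and 5XXError and the Average of Latency and IntegrationLatency; the StagePeriod container and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StagePeriod:
    """CloudWatch datapoints for one API stage over one period (hypothetical container)."""
    count: float                   # Sum of Count
    errors_5xx: float              # Sum of 5XXError
    latency_ms: float              # Average of Latency
    integration_latency_ms: float  # Average of IntegrationLatency


def error_rate(p: StagePeriod) -> float:
    """Errors divided by requests, i.e. what the Average statistic of 5XXError reports."""
    return p.errors_5xx / p.count if p.count else 0.0


def gateway_overhead_ms(p: StagePeriod) -> float:
    """Latency not spent in the backend: authorizers, mapping templates, caching, etc."""
    return max(p.latency_ms - p.integration_latency_ms, 0.0)


if __name__ == "__main__":
    busy = StagePeriod(count=200_000, errors_5xx=400, latency_ms=120, integration_latency_ms=100)
    quiet = StagePeriod(count=50, errors_5xx=20, latency_ms=800, integration_latency_ms=50)
    for name, p in (("busy", busy), ("quiet", quiet)):
        print(f"{name}: error rate {error_rate(p):.2%}, gateway overhead {gateway_overhead_ms(p):.0f} ms")
```

The busy stage logs twenty times more 5XX errors than the quiet one (400 vs. 20), yet its rate is 0.20% against 40.00%. The quiet stage's 750 ms of overhead corresponds to the 800ms Latency vs. 50ms IntegrationLatency case described above.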

To enable detailed metrics for a REST API stage: go to API Gateway Console > Stages > [Your Stage] > Logs and Tracing > Edit, then toggle Detailed metrics on. For the CLI, run:

aws apigateway update-stage --rest-api-id <api-id> --stage-name <stage> --patch-operations 'op=replace,path=/*/*/metrics/enabled,value=true'

Monitoring API Gateway Latency

Latency is the metric users feel most directly. REST APIs have a default integration timeout of 29 seconds, and HTTP APIs allow at most 30 seconds. When the backend takes longer, API Gateway stops waiting and returns an error; on REST APIs this is a 504 Gateway Timeout.

What Causes High Latency?

  • Lambda cold starts: When a Lambda function has been idle, the next invocation triggers a cold start that can add hundreds of milliseconds. Keep functions warm using scheduled pings or enable Provisioned Concurrency for critical endpoints.
  • Cross-region calls: If your API Gateway stage is in us-east-1 but your Lambda runs in eu-west-1, every request incurs transatlantic round-trip overhead. Always co-locate your API Gateway and backend resources in the same region.
  • Cache encryption overhead: If you have API Gateway response caching enabled with encryption turned on, each cache read and write incurs encryption and decryption overhead. Disable cache encryption if your security posture allows it, or monitor the CacheHitCount to measure whether caching is actually helping.
  • IAM role-based authorization: Role verification for every request can add latency. If this is an issue, consider resource-based policies or Lambda authorizer caching, which lets API Gateway reuse authorization decisions for a configurable TTL.
  • Complex mapping templates: Heavy Velocity Template Language (VTL) transformations on the request or response increase processing time inside API Gateway. Simplify or remove mapping templates where possible.
  • Backend bottlenecks: DynamoDB throttling, exhausted RDS connection pools, or slow downstream HTTP services all appear as high IntegrationLatency. Always check IntegrationLatency first before assuming API Gateway is the culprit.

How to Diagnose Latency Issues Step by Step

  1. Check IntegrationLatency first. If IntegrationLatency is close to Latency, the problem is in your backend. If IntegrationLatency is very low but Latency is high, the problem is in API Gateway processing (authorizers, mapping templates, caching).
  2. Enable AWS X-Ray tracing. Go to API Gateway Console > Stages > Logs and Tracing > Enable X-Ray Tracing. X-Ray generates a service map showing exactly how long each hop takes. This is the fastest way to isolate whether the bottleneck is in Lambda initialization, DynamoDB queries, or external HTTP calls.
  3. Review CloudWatch Logs for $context.responseLatency. In your access log format, include the variable $context.responseLatency. Sort log lines by this value to find the endpoints taking the longest to respond.
  4. Check Lambda Duration metrics. In the AWS/Lambda namespace, monitor the Duration metric for your integrated functions. A growing p99 Duration directly causes a growing Latency in API Gateway.
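Step 3 is easy to script once you have exported some access log lines. The Python sketch below assumes the JSON access log format described later in this guide, with method, path, and latency fields (latency holds $context.responseLatency and is logged as a string); the function name and sample lines are illustrative. It returns the slowest requests first and skips lines it cannot parse.

```python
import json
from typing import Iterable


def slowest_requests(lines: Iterable[str], top_n: int = 5) -> list[tuple[str, str, int]]:
    """Return (method, path, latency_ms) for the slowest requests, slowest first.

    Lines that are not JSON objects, or that lack a numeric "latency" field,
    are skipped rather than aborting the whole scan.
    """
    parsed = []
    for line in lines:
        try:
            entry = json.loads(line)
            latency = int(entry["latency"])  # "$context.responseLatency" arrives as a string
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue
        parsed.append((entry.get("method", "?"), entry.get("path", "?"), latency))
    parsed.sort(key=lambda item: item[2], reverse=True)
    return parsed[:top_n]


if __name__ == "__main__":
    sample = [
        '{"requestId":"a1","method":"GET","path":"/orders","status":"200","latency":"85"}',
        '{"requestId":"a2","method":"POST","path":"/checkout","status":"200","latency":"2140"}',
        '{"requestId":"a3","method":"GET","path":"/orders/{id}","status":"504","latency":"29001"}',
        "not json",
    ]
    for method, path, ms in slowest_requests(sample, top_n=2):
        print(f"{ms:>6} ms  {method} {path}")
```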

Recommended Latency Alarm

Create a CloudWatch alarm on the p99 Latency metric:

  • Namespace: AWS/ApiGateway
  • Metric: Latency (p99 extended statistic)
  • Dimensions: ApiName + Stage
  • Threshold: > 5000 ms for 3 consecutive 1-minute periods
  • Action: Notify an SNS topic
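The same alarm can also be defined in code. The sketch below only builds the keyword arguments for the boto3 CloudWatch client's put_metric_alarm call, so it runs without AWS credentials; the API name, stage, alarm name, SNS topic ARN, and the choice to treat missing data as not breaching are placeholders and assumptions. You would pass the result to boto3.client("cloudwatch").put_metric_alarm(**params).

```python
import json


def p99_latency_alarm(api_name: str, stage: str, sns_topic_arn: str,
                      threshold_ms: float = 5000.0) -> dict:
    """Build put_metric_alarm kwargs for a p99 Latency alarm on one stage.

    Mirrors the settings above: p99 extended statistic, 1-minute periods,
    3 of 3 breaching datapoints, notification to an SNS topic.
    """
    return {
        "AlarmName": f"apigw-{stage}-latency-p99-high",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
        ],
        "ExtendedStatistic": "p99",      # percentiles use ExtendedStatistic, not Statistic
        "Period": 60,
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 3,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumption: idle periods are not an alarm
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = p99_latency_alarm(
        api_name="orders-api",                                      # placeholder
        stage="prod",                                               # placeholder
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:oncall",  # placeholder
    )
    print(json.dumps(params, indent=2))
```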

Monitoring API Gateway Errors (4XX and 5XX)

The 4XXError and 5XXError metrics in CloudWatch are your primary signals for broken API behavior. However, the aggregated counts hide important distinctions that matter in production.

4XX Errors: Client-Side Problems

A 4XX response means the client sent an invalid or unauthorized request. Common causes include:

  • 400 Bad Request: Missing or malformed request parameters or body.
  • 401 Unauthorized: Missing, expired, or invalid token. Common when OAuth scopes change or Cognito client configurations are modified.
  • 403 Forbidden: The caller is authenticated but lacks permission. Check IAM policies, resource policies, and API key configuration.
  • 404 Not Found: The requested path does not exist in the API definition.
  • 429 Too Many Requests: The client exceeded the rate limit. This appears as a 4XX error but is a throttling event, not a client bug.

The problem with aggregated 4XX metrics: A flat 4XX count masks what is actually wrong. As demonstrated in a detailed case study by Javier Mendoza on Medium, a team disabled OAuth scopes on a Cognito app client and created a flood of 401 errors that were completely invisible in the aggregated 4XX dashboard. No alarm fired because the total count stayed within its threshold.

Getting Granular 4XX Visibility with Access Logs and Metric Filters

The solution is to enable Access Logs and create CloudWatch Metric Filters for specific status codes.

Enable Access Logging: In API Gateway console, go to Stages > [Stage] > Logs and Tracing > Edit. Set an access log destination (a CloudWatch Log Group) and include the following in your log format:

{ "requestId":"$context.requestId", "method":"$context.httpMethod", "path":"$context.resourcePath", "status":"$context.status", "latency":"$context.responseLatency", "errorMessage":"$context.errorMessage" }

Create a Metric Filter for 401 errors: In CloudWatch Logs, go to your log group, select Metric Filters > Create filter. Use the filter pattern:

{ $.status = "401" }
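
Create an alarm: Create a CloudWatch Alarm on this custom metric. For a low-noise API, set the threshold to alert you when even a single 401 occurs; where occasional expired tokens are normal, set it just above your usual baseline. Repeat for 403 and 429.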
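Before relying on these filters, it can help to replay a sample of exported access log lines locally and see what each per-status filter would have counted. The Python sketch below is a local approximation, not the CloudWatch filter engine; it assumes the JSON access log format shown above (status logged as a string), and the watched codes and sample lines are illustrative.

```python
import json
from collections import Counter
from typing import Iterable

WATCHED_STATUSES = ("401", "403", "429")  # one metric filter per code


def count_watched_statuses(lines: Iterable[str]) -> Counter:
    """Count access log entries per watched status code.

    Status is compared as a string; str() also tolerates a status logged as a
    number. Lines that are not JSON objects are skipped, as a filter would skip them.
    """
    counts = Counter({code: 0 for code in WATCHED_STATUSES})
    for line in lines:
        try:
            status = str(json.loads(line).get("status"))
        except (json.JSONDecodeError, AttributeError):
            continue
        if status in counts:
            counts[status] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"requestId":"r1","status":"200","latency":"42"}',
        '{"requestId":"r2","status":"401","latency":"12"}',
        '{"requestId":"r3","status":"429","latency":"3"}',
    ]
    print(dict(count_watched_statuses(sample)))
```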

5XX Errors: Server-Side and Integration Problems

5XX errors indicate problems in the API Gateway integration layer or your backend:

  • 502 Bad Gateway: API Gateway received an invalid response from the backend. Common causes include a Lambda proxy response in the wrong shape (for example, a body that is not a string), Lambda returning malformed JSON, wrong content type headers, or unhandled exceptions that produce unexpected payloads.
  • 503 Service Unavailable: The integration endpoint is unreachable or rate-limited at the AWS service level.
  • 504 Gateway Timeout: The backend did not respond within the 29-second API Gateway timeout. Usually caused by Lambda timeouts, slow DynamoDB queries, or external HTTP calls without proper timeouts.
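For the 502 case, a defensive handler shape helps. The Python sketch below assumes a REST API with a Lambda proxy integration, where API Gateway expects an object with an integer statusCode, optional headers, and a string body. It always returns that shape and converts exceptions into well-formed 400 or 500 responses instead of letting them escape; process_order is a hypothetical stand-in for your business logic.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def _response(status_code: int, payload: dict) -> dict:
    """Shape a Lambda proxy integration response that API Gateway can parse."""
    return {
        "statusCode": status_code,                        # must be an integer
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),                      # must be a string, not a dict
    }


def process_order(order: dict) -> dict:
    """Hypothetical business logic; replace with your own."""
    if "id" not in order:
        raise ValueError("missing order id")
    return {"orderId": order["id"], "status": "accepted"}


def handler(event, context):
    try:
        order = json.loads(event.get("body") or "{}")     # body may be absent or None
        return _response(200, process_order(order))
    except ValueError as exc:                             # includes JSONDecodeError: bad input
        return _response(400, {"message": str(exc)})
    except Exception:                                     # log it, but still return valid JSON
        logger.exception("Unhandled error")
        return _response(500, {"message": "Internal server error"})


if __name__ == "__main__":
    print(handler({"body": '{"id": "42"}'}, None))
```

Returning an explicit 500 keeps the failure visible in 5XXError while the execution logs still carry the stack trace.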

To troubleshoot 5XX spikes:

  • Check IntegrationLatency for values approaching 29,000ms, which signals a timeout scenario.
  • Look at Lambda Errors and Throttles metrics in the AWS/Lambda namespace.
  • Enable X-Ray tracing to see which segment of the request chain is failing.
  • Check CloudWatch execution logs for integration error messages. These logs include the actual error response from the backend.

Monitoring API Gateway Throttling

Throttling is a feature, not a bug, but it looks exactly like an outage to your users if you have not set up proper monitoring and retry logic. API Gateway enforces throttling at multiple levels:

| Throttle Type | Default Limit | Notes |
| --- | --- | --- |
| Account-level steady-state | 10,000 requests/second (rps) per region | Applies across all APIs in the account |
| Account-level burst | 5,000 requests | Token bucket capacity; allows brief bursts above the steady-state rate |
| Stage-level throttling | Configurable per stage | Override the default to protect specific APIs |
| Usage plan throttling | Configurable per API key | Per-consumer rate and quota limits |
| Lambda concurrency | 1,000 concurrent executions per region by default | Lambda throttles translate to 502 or 429 from API Gateway |

How Throttling Appears in Metrics

When API Gateway throttles a request, it returns a 429 Too Many Requests response. This appears in the 4XXError metric. Because 429 and other 4XX codes are lumped together in the default metric, it is easy to miss a throttling event unless you:

  • Enable access logs and create a Metric Filter for status code 429.
  • Monitor the Count metric for sudden drops, which can indicate that requests are being rejected before they are even processed.
  • Use CloudWatch Usage metrics in the AWS/Usage namespace to see API call rates relative to your account-level Service Quotas.
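Because API Gateway throttles with a token bucket, the interplay between the steady-state rate and the burst capacity is easier to see in a small simulation. The Python sketch below is a simplified model, not API Gateway's actual implementation: the bucket holds at most burst tokens, refills at rate tokens per second, and each request either consumes a token or is rejected with 429. The small numbers keep the output readable; the account defaults correspond to rate=10,000 and burst=5,000.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` tokens/s, capped at `burst` tokens."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time (seconds) of the previous refill

    def allow(self, now: float) -> int:
        """Return 200 if a request arriving at `now` is admitted, otherwise 429."""
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200
        return 429


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, burst=5)
    # Eight simultaneous requests: the first five drain the burst capacity.
    print([bucket.allow(0.0) for _ in range(8)])  # [200, 200, 200, 200, 200, 429, 429, 429]
    # 0.2 s later, 0.2 * 10 = 2 tokens have refilled.
    print([bucket.allow(0.2) for _ in range(3)])  # [200, 200, 429]
```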

Implementing Retry Logic to Handle Throttling Gracefully

Your clients should implement exponential backoff with jitter when they receive a 429 response. The AWS SDKs implement this automatically for AWS service calls, but for your own API clients you need to add it explicitly:

  • Retry: On receiving a 429 or a 5XX, wait and retry.
  • Exponential backoff: Double the wait time with each successive retry. Set a maximum wait interval (for example, 30 seconds) to avoid very long delays.
  • Jitter: Add a small random amount to the backoff time so that multiple clients do not all retry at the same moment and cause another burst.
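A minimal Python sketch of these three rules, using the "full jitter" variant: each delay is drawn uniformly between zero and the capped exponential backoff. The send callable, the 0.1-second base delay, and the six-attempt limit are illustrative assumptions; the 30-second cap matches the example above.

```python
import random
import time
from typing import Callable, Optional

# Throttling (429) and server-side errors (5XX) are treated as retryable.
RETRYABLE_STATUSES = {429} | set(range(500, 600))


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay for retry number `attempt` (0-based).

    The exponential ceiling base * 2**attempt is capped at `cap` seconds, and the
    delay is drawn uniformly from [0, ceiling] so clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0.0, ceiling)


def call_with_retries(send: Callable[[], int], max_attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> int:
    """Call `send` (which returns an HTTP status code), retrying retryable statuses.

    At most `max_attempts` calls are made; the last status seen is returned.
    """
    status = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(backoff_delay(attempt, rng=rng))
        status = send()
    return status


if __name__ == "__main__":
    # Hypothetical client call that is throttled twice before succeeding.
    responses = iter([429, 429, 200])
    print(call_with_retries(lambda: next(responses)))  # prints 200
```

Retrying on 5XX is safest for idempotent requests; for operations that must not run twice, consider retrying only on 429.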

Requesting a Throttle Limit Increase

If your application consistently needs more than 10,000 rps in a region, you can request a quota increase via the Service Quotas console. Go to AWS Console > Service Quotas > Amazon API Gateway, locate the throttle-related quotas, and submit a limit increase request. AWS typically processes increases within one to two business days.

Setting Up CloudWatch Alarms for API Gateway

Dashboards show you what happened. Alarms tell you when to act. Here are the minimum alarms every production API Gateway should have:

Essential Alarms

  • High 5XX error rate: Trigger when the Average of 5XXError over 5 minutes exceeds 1% (0.01). This catches backend failures before they escalate.
  • p99 Latency threshold: Trigger when the p99 (extended statistic) of Latency exceeds your SLA threshold, for example 5,000ms for three consecutive minutes.
  • Throttling events (429): Create a Metric Filter on access logs for { $.status = "429" } and alarm immediately on any occurrence. This fires before users start complaining about rate limit errors.
  • Request count anomaly: Use CloudWatch Anomaly Detection on the Count metric. This automatically models your normal traffic pattern and alerts you when traffic drops or spikes beyond expected bounds, catching both traffic surges and outages where requests stop arriving entirely.
  • Integration timeout proximity: Alarm when IntegrationLatency exceeds 25,000ms. This gives you a 4-second warning window before the 29-second timeout kicks in.
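The request count anomaly alarm is the least obvious of these to configure, because it compares Count with a model-generated band rather than a fixed number. The sketch below builds put_metric_alarm keyword arguments that use the ANOMALY_DETECTION_BAND metric math function; the band width of 2, the 5-minute period, the names, and the choice to treat missing data as breaching (so a total outage still fires) are assumptions to adjust. You would pass the result to boto3.client("cloudwatch").put_metric_alarm(**params).

```python
import json


def count_anomaly_alarm(api_name: str, stage: str, sns_topic_arn: str,
                        band_width: float = 2) -> dict:
    """Build put_metric_alarm kwargs that fire when Count leaves its expected band."""
    return {
        "AlarmName": f"apigw-{stage}-count-anomaly",
        # Fires on drops below the band (outage) and on spikes above it (surge).
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 3,
        "ThresholdMetricId": "band",  # the threshold is the band, not a fixed number
        "TreatMissingData": "breaching",  # assumption: no data at all means an outage
        "Metrics": [
            {
                "Id": "requests",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ApiGateway",
                        "MetricName": "Count",
                        "Dimensions": [
                            {"Name": "ApiName", "Value": api_name},
                            {"Name": "Stage", "Value": stage},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Sum",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(requests, {band_width})",
                "Label": "Count (expected range)",
                "ReturnData": True,
            },
        ],
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = count_anomaly_alarm("orders-api", "prod",                 # placeholders
                                 "arn:aws:sns:us-east-1:123456789012:oncall")
    print(json.dumps(params, indent=2))
```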

Creating an Alarm: Step-by-Step

  • Open the CloudWatch console and choose All alarms > Create alarm.
  • Choose Select metric, then navigate to AWS/ApiGateway > By Api Name and select Latency.
  • Set the statistic to p99 and the period to 1 minute.
  • In Conditions, set the threshold to Greater than 5000 (milliseconds).
  • Set Datapoints to alarm to 3 out of 3 to avoid false positives from transient spikes.
  • In Notification, select your SNS topic (or create one pointing to your PagerDuty or Slack integration).
  • Give the alarm a name like apigw-prod-latency-p99-high and choose Create alarm.

Enabling and Using AWS X-Ray for End-to-End Tracing

CloudWatch metrics tell you that something went wrong. X-Ray shows you exactly where in the request chain it happened. When X-Ray tracing is enabled on an API Gateway stage, it generates a trace for every sampled request, including time spent in:

  • API Gateway authorization and routing
  • Lambda initialization (cold start indicator)
  • Lambda execution
  • Downstream calls from Lambda to DynamoDB, S3, RDS, or external HTTP services

Enabling X-Ray on an API Gateway Stage

  • Go to API Gateway Console > Stages > [Your Stage] > Logs and Tracing.
  • Toggle Enable X-Ray Tracing to on.
  • In your Lambda function, add the AWS X-Ray SDK and instrument your code to create subsegments for each downstream call.
  • In the X-Ray console, open Service Map to see the topology of your API Gateway plus connected services, with latency heatmaps on each connection.

X-Ray also integrates with Dynatrace and third-party APM tools for deeper analysis. Dynatrace, for example, can correlate API Gateway access logs with X-Ray traces to build a unified view of API health.

Common API Gateway Monitoring Mistakes to Avoid

  • Relying only on aggregated error metrics: You will miss specific error patterns like auth failures (401) and throttling (429) that are buried inside the overall error count. Always add granular access log metric filters.
  • Watching Latency without IntegrationLatency: If you only watch Latency, you cannot tell whether the problem is in your backend or in API Gateway itself. Track both metrics side by side.
  • Keeping logs for too short a time: Without adequate retention, diagnosing intermittent issues that happened two weeks ago becomes impossible. Set CloudWatch Log Group retention to at least 30 days for production APIs.
  • Using static alarm thresholds: Traffic volumes change. A static threshold set during low-traffic periods generates constant false positives during high-traffic periods. Use CloudWatch Anomaly Detection instead.
  • Skipping detailed metrics: The default stage-level metrics may look fine even when a single endpoint is broken. Enabling detailed metrics gives you per-resource and per-method visibility, which is worth the additional CloudWatch cost for production APIs.
  • Ignoring Lambda throttles: When Lambda runs out of concurrent execution capacity, API Gateway receives an error and returns 502. Monitor the Throttles metric in the AWS/Lambda namespace alongside the API Gateway 5XXError metric.

Quick Reference: Monitoring Checklist

  • CloudWatch metrics enabled: Confirm the AWS/ApiGateway namespace is receiving Latency, 4XXError, 5XXError, Count, IntegrationLatency.
  • Detailed metrics enabled: Turn on per-resource and per-method metrics for production stages.
  • Access logs configured: Log group set, JSON format including status, latency, errorMessage, and requestId.
  • Metric Filters created: Separate filters for HTTP status 401, 403, 429, 502, 504.
  • CloudWatch Alarms active: 5XX error rate, p99 Latency, IntegrationLatency proximity alarm, Count anomaly.
  • X-Ray tracing enabled: On for production stages; integrated with Lambda instrumentation.
  • SNS notifications routed: Alarms connected to an SNS topic that posts to PagerDuty, Slack, or your on-call system.
  • Dashboard created: CloudWatch dashboard with all key metrics on a single screen.
  • Retry logic in clients: Exponential backoff with jitter on 429 and 5XX responses.
  • Quota increase requested: If steady-state traffic approaches 10,000 rps in any region, submit a Service Quotas increase request proactively.

Conclusion

Monitoring AWS API Gateway effectively comes down to four fundamentals: knowing which CloudWatch metrics matter (Latency, IntegrationLatency, 4XXError, 5XXError, and Count), enabling access logs to get granular visibility below the aggregated metrics, setting up proactive alarms before issues escalate, and using X-Ray to trace requests end-to-end when you need to find the root cause.

The difference between an hour-long outage and a five-minute incident is almost always whether the right alarm fired at the right time. Start with the five essential alarms described in the Setting Up CloudWatch Alarms section, enable access logs with granular metric filters for 401, 403, and 429, and add X-Ray tracing to your production stages. These three steps alone will dramatically improve your ability to detect and resolve API Gateway issues before your users notice.

Disclaimer

This article is intended for informational purposes only. AWS services, pricing, quota limits, and console interfaces may change. Always refer to the official Amazon API Gateway documentation for the most current and accurate information. Third-party tools mentioned are for illustrative purposes and do not constitute an endorsement.

FAQs

1. What is the default throttle limit for AWS API Gateway?

The default account-level throttle limit is 10,000 requests per second (rps) per region, with a burst capacity of 5,000 requests. You can request increases via the Service Quotas console. Individual stages and usage plans can have lower limits configured to protect specific APIs.

2. How do I tell whether high latency is caused by Lambda or by API Gateway itself?

Compare the IntegrationLatency metric with the Latency metric in CloudWatch. IntegrationLatency measures only the time from when API Gateway forwards the request to your backend to when it receives the response. If IntegrationLatency is high, the problem is in your backend (Lambda cold start, slow DynamoDB query, etc.). If IntegrationLatency is low but Latency is high, the overhead is inside API Gateway (authorizer, mapping template, caching).

3. Why are my 4XX errors not triggering an alarm even though authentication is failing?

The default 4XXError CloudWatch metric aggregates all 4XX status codes (400, 401, 403, 404, 429) into a single count. A specific surge in 401 (Unauthorized) errors can be completely hidden if overall 4XX traffic is steady. The fix is to enable access logs with JSON format and create a CloudWatch Metric Filter specifically for { $.status = "401" }, then create a separate alarm on that custom metric.

4. What does a 502 Bad Gateway error from API Gateway mean?

A 502 error means API Gateway received an invalid or unexpected response from the backend integration. The most common causes are: Lambda returning malformed JSON, Lambda returning a response with an incorrect content-type header, an unhandled exception in Lambda that produces a non-standard error format, or Lambda being throttled and returning an error to API Gateway. Check CloudWatch execution logs and X-Ray traces to identify which cause applies to your situation.

5. Does enabling detailed CloudWatch metrics for API Gateway cost extra?

Yes. Detailed metrics add per-method and per-resource granularity by sending additional metric data points to CloudWatch. Each unique metric combination (ApiName + Method + Resource + Stage) is billed as a separate CloudWatch custom metric. For a large API with many routes, this can add up. For most production APIs the additional cost is small compared to the diagnostic value, but for cost-sensitive environments you can selectively enable detailed metrics only on high-traffic or high-criticality routes.
