You monitor Lambda function performance and latency primarily through the Duration metric in CloudWatch, using percentile statistics – p95 and p99 – rather than average. Average duration is almost always misleading: a function that completes in 50ms 99% of the time but takes 4 seconds on cold starts will show a deceptively healthy average. Percentiles expose what your slowest users are actually experiencing.
Latency in Lambda comes from two distinct sources – your code’s execution time and the cold start initialization that precedes it on the first invocation in a new execution environment – and they need to be tracked separately.
Key Takeaways
- Monitor Duration with p99, not average – average hides the tail latency that your slowest users feel
- Cold start Init Duration is not part of the Duration metric – it is reported separately in the REPORT log line and requires Lambda Insights or a log metric filter to track as a proper metric
- The right alert threshold for latency is 80% of your configured timeout – this catches degradation before invocations start failing
- For API Gateway-fronted Lambda functions, also monitor API Gateway’s Latency and IntegrationLatency metrics – they capture end-to-end latency, including API Gateway’s own overhead, which Lambda Duration alone does not
- Memory and latency are directly linked in Lambda – CPU is allocated proportionally to memory, so an underpowered function is often a slow function
The Two Types of Lambda Latency
Before setting up monitoring, it helps to separate the two latency buckets Lambda gives you.
Execution latency is the time from when your handler starts to when it returns. This is what the Duration metric captures. It includes your code logic, SDK calls, and any downstream calls to DynamoDB, RDS, HTTP APIs, or other services.
Initialization latency (cold start) is the time Lambda spends setting up the execution environment before your handler runs – downloading your code, starting the runtime, and running your module-level initialization code. This is the Init Duration value in the REPORT log line. It has been included in Billed Duration since August 2025, but it is not included in the Duration metric.
If you only watch Duration, you are only watching half the latency picture.
What to Monitor and Where to Find It
Duration (Execution Latency)
The Duration metric lives in the AWS/Lambda namespace and is emitted automatically for every invocation. Four statistics are worth understanding:
| Statistic | What it tells you | When to use it |
| --- | --- | --- |
| Average | Mean execution time | Useful as a rough trend indicator only |
| p99 | Maximum duration for 99% of invocations | Primary latency SLO signal |
| p95 | Maximum duration for 95% of invocations | Good for day-to-day alerting |
| Max | Single slowest invocation in the period | Useful for finding outliers, not for alarms |
Practical note: Use p95 for routine alerting and p99 for SLO tracking. Max is often driven by a single anomalous invocation – cold start, transient downstream spike – and firing alarms on Max leads to noise. Average masks tail latency problems entirely.
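To spot-check these percentiles from the CLI before wiring up alarms – a sketch, with a placeholder function name and an arbitrary time window; note that get-metric-statistics accepts either --statistics or --extended-statistics, not both in one call:
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Duration \
--dimensions Name=FunctionName,Value=your-function-name \
--start-time 2025-01-01T00:00:00Z \
--end-time 2025-01-02T00:00:00Z \
--period 3600 \
--extended-statistics p95 p99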
Init Duration (Cold Start Latency)
Init Duration is written to CloudWatch Logs in the REPORT line of every cold start invocation:
REPORT RequestId: abc-123 Duration: 240.55 ms Billed Duration: 241 ms Memory Size: 512 MB Max Memory Used: 198 MB Init Duration: 412.33 ms
It does not appear in the AWS/Lambda metrics namespace by default. To monitor it as a proper metric, you have two options:
- Enable Lambda Insights – surfaces init_duration as an alarmable metric in the LambdaInsights namespace
- Create a CloudWatch Metric Filter on the log group parsing Init Duration from REPORT lines
Without one of these in place, cold start latency is invisible to your alarms.
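The metric filter route can look like the sketch below. The space-delimited pattern enumerates every field of the default REPORT line, so it matches only cold starts (warm invocations have no Init Duration fields); the field names are arbitrary, and the pattern will need adjusting if your REPORT lines carry extra fields such as X-Ray trace IDs – verify it against your own log group before relying on it:
aws logs put-metric-filter \
--log-group-name /aws/lambda/your-function-name \
--filter-name ColdStartInitDuration \
--filter-pattern '[report_label="REPORT", request_id_label="RequestId:", request_id, duration_label="Duration:", duration_ms, duration_unit="ms", billed_label1="Billed", billed_label2="Duration:", billed_ms, billed_unit="ms", memory_label1="Memory", memory_label2="Size:", memory_mb, memory_unit="MB", max_label1="Max", max_label2="Memory", max_label3="Used:", max_memory_mb, max_memory_unit="MB", init_label1="Init", init_label2="Duration:", init_duration_ms, init_unit="ms"]' \
--metric-transformations 'metricName=InitDuration,metricNamespace=LambdaColdStarts,metricValue=$init_duration_ms'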
API Gateway Latency (End-to-End)
If your Lambda function sits behind API Gateway, the Latency and IntegrationLatency metrics in the AWS/ApiGateway namespace give you a fuller picture:
- IntegrationLatency – time API Gateway spent waiting for Lambda to respond (includes Lambda execution + cold start if applicable)
- Latency – total time from request received to response sent, including API Gateway’s own processing overhead
For user-facing APIs, alarms on API Gateway p95/p99 Latency are often more meaningful than Lambda Duration alone – they reflect what your users experience, not just what your function took.
Setting Up Latency Alerts
Alert 1: p99 Duration (Proactive Latency Warning)
aws cloudwatch put-metric-alarm \
--alarm-name "LambdaLatency-p99-YourFunction" \
--metric-name Duration \
--namespace AWS/Lambda \
--extended-statistic p99 \
--period 300 \
--evaluation-periods 3 \
--threshold 4000 \
--comparison-operator GreaterThanOrEqualToThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
--dimensions Name=FunctionName,Value=your-function-name
Threshold to set: 80% of your configured timeout. If your timeout is 30 seconds, set the threshold at 24,000ms. For a 5-second timeout, alarm at 4,000ms.
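To confirm the timeout you are deriving that threshold from (Timeout is returned in seconds):
aws lambda get-function-configuration \
--function-name your-function-name \
--query Timeout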
Alert 2: Init Duration (Cold Start Latency)
If you have Lambda Insights enabled:
aws cloudwatch put-metric-alarm \
--alarm-name "LambdaColdStart-YourFunction" \
--metric-name init_duration \
--namespace LambdaInsights \
--statistic Average \
--period 300 \
--evaluation-periods 2 \
--threshold 1000 \
--comparison-operator GreaterThanOrEqualToThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
--dimensions Name=function_name,Value=your-function-name
Threshold to set: 1,000ms average init duration is a practical trigger for investigation on most runtimes. For Java functions, adjust to 2,000ms given typical JVM startup times.
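If Insights is not yet enabled, attaching the extension layer is the main step – a sketch; the layer-owner account 580247275435 is the AWS-published one, but the current version number varies by region and architecture, so look it up in the Lambda Insights docs. Note that --layers replaces the function's entire layer list, and the execution role needs the CloudWatchLambdaInsightsExecutionRolePolicy managed policy:
aws lambda update-function-configuration \
--function-name your-function-name \
--layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:<version>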
Alert 3: API Gateway Latency (End-to-End, if applicable)
aws cloudwatch put-metric-alarm \
--alarm-name "APIGatewayLatency-p95-YourAPI" \
--metric-name Latency \
--namespace AWS/ApiGateway \
--extended-statistic p95 \
--period 300 \
--evaluation-periods 2 \
--threshold 3000 \
--comparison-operator GreaterThanOrEqualToThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
--dimensions Name=ApiName,Value=your-api-name
The Memory-Latency Relationship
Lambda allocates CPU proportionally to memory. A function configured at 256MB receives half the CPU of the same function at 512MB. This means that for compute-bound workloads, increasing memory allocation directly reduces execution time – often to a degree that more than offsets the higher per-ms cost.
This is not intuitive, and it is one of the most commonly overlooked levers for latency improvement.
AWS Lambda Power Tuning (an open-source Step Functions tool maintained at github.com/alexcasalboni/aws-lambda-power-tuning) runs your function at multiple memory configurations and returns the optimal cost-performance balance. It is worth running on any function where latency matters.
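A run amounts to a single Step Functions execution against the state machine the tool deploys – a sketch, where the state machine ARN is hypothetical and lambdaARN, powerValues, num, and payload are the input keys documented in the project's README:
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789:stateMachine:powerTuningStateMachine \
--input '{"lambdaARN": "arn:aws:lambda:us-east-1:123456789:function:your-function-name", "powerValues": [128, 256, 512, 1024, 1792, 3008], "num": 50, "payload": {}}'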
Key memory thresholds to know:
- At roughly 1,769MB (the figure in current AWS documentation), Lambda allocates the equivalent of one full vCPU. Going above this only helps multi-threaded functions.
- For single-threaded functions, that 1,769MB mark is the ceiling of meaningful CPU gain.
- Going below the memory your function actually needs risks OOM kills and increases duration simultaneously.
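Adjusting memory is a one-line change, which makes it cheap to test the effect on duration (a sketch – use the value Power Tuning recommends):
aws lambda update-function-configuration \
--function-name your-function-name \
--memory-size 1024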
Common Latency Root Causes
When a latency alarm fires, Duration alone will not tell you why. The most frequent causes:
- Slow downstream dependency: A Lambda function waiting on a sluggish DynamoDB query, RDS connection, or external HTTP call will show high duration with nothing obviously wrong in your function code. The only way to see this is with distributed tracing – a trace that shows the full span breakdown inside the invocation.
- Cold start on user-facing path: If p99 Duration is high but average is normal, cold starts are likely the cause. Check Init Duration alongside Duration. If init is 800ms and your timeout is 3 seconds, a cold-start invocation has very little headroom left for actual execution.
- Undersized memory: If Lambda Insights shows memory utilization consistently above 70-80% and duration is high, the function may be CPU-constrained. Increase memory and retest.
- VPC cold start overhead: Lambda functions attached to a VPC have historically incurred longer cold starts due to elastic network interface (ENI) setup. If your function is VPC-attached and cold starts are disproportionately slow, this is a likely factor. Use VPC endpoints for AWS service calls rather than routing through the internet from inside the VPC.
- Connection setup on every invocation: SDK clients and database connections initialized inside the handler – rather than at module level outside the handler – are recreated on every invocation. Move them outside the handler so they are reused across warm invocations.
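As an illustration of that last point, a minimal Python sketch – boto3 against DynamoDB, with hypothetical table and key names:
import boto3

# Module scope: runs once per execution environment, during init.
# The client and its connection pool are reused across warm invocations.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("your-table-name")  # hypothetical table name

def handler(event, context):
    # Handler scope: runs on every invocation. Creating the boto3
    # client here instead would pay connection setup on each request.
    response = table.get_item(Key={"pk": event["id"]})  # hypothetical key
    return response.get("Item")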
Logs Insights Queries for Latency Investigation
When an alarm fires, these queries help you drill into what is slow:
Find the slowest invocations in the last hour:
filter @type = "REPORT"
| fields @requestId, @duration, @initDuration, @memorySize, @maxMemoryUsed
| sort @duration desc
| limit 25
Compare cold start vs warm p99 duration:
filter @type = "REPORT"
| fields ispresent(@initDuration) as is_cold_start
| stats count(*) as invocations, percentile(@duration, 99) as p99 by is_cold_start
Find invocations approaching timeout (the 20,000ms filter below assumes a ~25-second timeout – set it to roughly 80% of your own, in milliseconds):
filter @type = "REPORT"
| filter @duration > 20000
| fields @requestId, @duration, @initDuration
| sort @duration desc
What Duration Monitoring Misses
CloudWatch Duration tells you how long your function ran. It does not tell you what it was doing during that time.
A 3-second invocation could mean your function code took 3 seconds, or it could mean your function spent 2.8 seconds waiting on a DynamoDB query that returned 10,000 items it did not need. Duration looks identical in both cases.
CubeAPM instruments Lambda via the OpenTelemetry layer and breaks duration into its component spans: how long each downstream call took, which service responded slowly, and how this invocation fits into the broader request chain that triggered it. When a latency alarm fires and CloudWatch shows you a high p99 but no obvious error, the span breakdown in CubeAPM is where the actual diagnosis happens – without switching tools or writing custom Logs Insights queries. It runs self-hosted in your own AWS account.
Summary
| What to monitor | Metric | Threshold |
| --- | --- | --- |
| Execution latency | AWS/Lambda Duration p99 | 80% of configured timeout |
| Cold start latency | LambdaInsights init_duration | 1,000ms average (adjust per runtime) |
| End-to-end API latency | AWS/ApiGateway Latency p95 | Per your SLO |
| Memory pressure affecting latency | LambdaInsights memory_utilization | Alert above 85% |
Use p95 and p99 for Duration – not average, not max. Track Init Duration as a separate metric – it is not included in Duration. And when you need to understand why Duration is high, not just that it is high, you need distributed traces alongside your metrics.
Disclaimer: Configurations, thresholds, and code examples are for guidance only. Verify against the current AWS and OpenTelemetry documentation before applying to production. AWS service details change frequently. CubeAPM references reflect genuine use cases; evaluate all tools against your own requirements.