CubeAPM

How to Monitor AWS Lambda Function Performance and Latency

You monitor Lambda function performance and latency primarily through the Duration metric in CloudWatch, using percentile statistics – p95 and p99 – rather than average. Average duration is almost always misleading: a function that completes in 50ms 99% of the time but takes 4 seconds on cold starts will show a deceptively healthy average. Percentiles expose what your slowest users are actually experiencing.

Latency in Lambda comes from two distinct sources – your code’s execution time and the cold start initialization that precedes it on the first invocation in a new execution environment – and they need to be tracked separately.

Key Takeaways

  • Monitor Duration with p99, not average – average hides the tail latency that your slowest users feel
  • Cold start Init Duration is not part of the Duration metric – it is reported separately in the REPORT log line and requires Lambda Insights or a log metric filter to track as a proper metric
  • The right alert threshold for latency is 80% of your configured timeout – this catches degradation before invocations start failing
  • For API Gateway-fronted Lambda functions, also monitor API Gateway’s Latency and IntegrationLatency metrics – they capture end-to-end latency, including API Gateway’s own overhead, which Lambda Duration alone does not
  • Memory and latency are directly linked in Lambda – CPU is allocated proportionally to memory, so an underpowered function is often a slow function

The Two Types of Lambda Latency

Before setting up monitoring, it helps to separate the two latency buckets Lambda gives you.

Execution latency is the time from when your handler starts to when it returns. This is what the Duration metric captures. It includes your code logic, SDK calls, and any downstream calls to DynamoDB, RDS, HTTP APIs, or other services.

Initialization latency (cold start) is the time Lambda spends setting up the execution environment before your handler runs – downloading your code, starting the runtime, and running your module-level initialization code. This is the Init Duration value in the REPORT log line. It is billed separately from Duration (since August 2025) and is not included in the Duration metric.

If you only watch Duration, you are only watching half the latency picture.

What to Monitor and Where to Find It

Duration (Execution Latency)

The Duration metric lives in the AWS/Lambda namespace and is emitted automatically for every invocation. The four statistics to know:

| Statistic | What it tells you | When to use it |
| --- | --- | --- |
| Average | Mean execution time | Useful as a rough trend indicator only |
| p99 | Maximum duration for 99% of invocations | Primary latency SLO signal |
| p95 | Maximum duration for 95% of invocations | Good for day-to-day alerting |
| Max | Single slowest invocation in the period | Useful for finding outliers, not for alarms |

Practical note: Use p95 for routine alerting and p99 for SLO tracking. Max is often driven by a single anomalous invocation – cold start, transient downstream spike – and firing alarms on Max leads to noise. Average masks tail latency problems entirely.
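To see why average masks the tail, here is a small sketch in Python using hypothetical numbers – 98 warm invocations at 50ms and 2 cold starts at 4,000ms – with a simple nearest-rank percentile:

```python
import math

# Hypothetical distribution: 98 warm invocations at 50ms, 2 cold starts at 4000ms.
durations_ms = [50.0] * 98 + [4000.0] * 2

def percentile(values, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ranked = sorted(values)
    k = math.ceil(p * len(ranked) / 100)
    return ranked[k - 1]

average = sum(durations_ms) / len(durations_ms)   # 129.0 ms - looks healthy
p99 = percentile(durations_ms, 99)                # 4000.0 ms - the real tail
```

An average of 129ms would clear almost any alarm threshold, while the p99 of 4,000ms shows exactly what the slowest 2% of users experience.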

Init Duration (Cold Start Latency)

Init Duration is written to CloudWatch Logs in the REPORT line of every cold start invocation:

REPORT RequestId: abc-123  Duration: 240.55 ms  Billed Duration: 241 ms  Memory Size: 512 MB  Max Memory Used: 198 MB  Init Duration: 412.33 ms

It does not appear in the AWS/Lambda metrics namespace by default. To monitor it as a proper metric, you have two options:

  1. Enable Lambda Insights – surfaces init_duration as an alarmable metric in the LambdaInsights namespace
  2. Create a CloudWatch Metric Filter on the log group parsing Init Duration from REPORT lines

Without one of these in place, cold start latency is invisible to your alarms.
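The extraction a metric filter performs on that REPORT line can be sketched in Python – useful for one-off analysis of exported logs, or to sanity-check a filter pattern before deploying it:

```python
import re

# The REPORT line format shown above; Init Duration only appears on cold starts.
report_line = (
    "REPORT RequestId: abc-123  Duration: 240.55 ms  Billed Duration: 241 ms  "
    "Memory Size: 512 MB  Max Memory Used: 198 MB  Init Duration: 412.33 ms"
)

match = re.search(r"Init Duration:\s*([\d.]+)\s*ms", report_line)
# None on warm invocations, since the field is absent from their REPORT lines.
init_duration_ms = float(match.group(1)) if match else None
```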

API Gateway Latency (End-to-End)

If your Lambda function sits behind API Gateway, the Latency and IntegrationLatency metrics in the AWS/ApiGateway namespace give you a fuller picture:

  • IntegrationLatency – time API Gateway spent waiting for Lambda to respond (includes Lambda execution + cold start if applicable)
  • Latency – total time from request received to response sent, including API Gateway’s own processing overhead

For user-facing APIs, alarms on API Gateway p95/p99 Latency are often more meaningful than Lambda Duration alone – they reflect what your users experience, not just what your function took.

Setting Up Latency Alerts

Alert 1: p99 Duration (Proactive Latency Warning)

aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaLatency-p99-YourFunction" \
  --metric-name Duration \
  --namespace AWS/Lambda \
  --extended-statistic p99 \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 4000 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --dimensions Name=FunctionName,Value=your-function-name

Threshold to set: 80% of your configured timeout. If your timeout is 30 seconds, set the threshold at 24,000ms. For a 5-second timeout, alarm at 4,000ms.

Alert 2: Init Duration (Cold Start Latency)

If you have Lambda Insights enabled:

aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaColdStart-YourFunction" \
  --metric-name init_duration \
  --namespace LambdaInsights \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 1000 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --dimensions Name=function_name,Value=your-function-name

Threshold to set: 1,000ms average init duration is a practical trigger for investigation on most runtimes. For Java functions, adjust to 2,000ms given typical JVM startup times.

Alert 3: API Gateway Latency (End-to-End, if applicable)

aws cloudwatch put-metric-alarm \
  --alarm-name "APIGatewayLatency-p95-YourAPI" \
  --metric-name Latency \
  --namespace AWS/ApiGateway \
  --extended-statistic p95 \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 3000 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789:your-alert-topic \
  --dimensions Name=ApiName,Value=your-api-name

The Memory-Latency Relationship

Lambda allocates CPU proportionally to memory. A function configured at 256MB receives half the CPU of the same function at 512MB. This means that for compute-bound workloads, increasing memory allocation directly reduces execution time – often to a degree that more than offsets the higher per-ms cost.

This is not intuitive, and it is one of the most commonly overlooked levers for latency improvement.

AWS Lambda Power Tuning (an open-source Step Functions tool maintained at github.com/alexcasalboni/aws-lambda-power-tuning) runs your function at multiple memory configurations and returns the optimal cost-performance balance. It is worth running on any function where latency matters.

Key memory thresholds to know:

  • At 1,792MB, Lambda allocates the equivalent of one full vCPU. Above that, additional memory only helps multi-threaded functions – for single-threaded code, 1,792MB is the ceiling of meaningful CPU gain.
  • Going below the memory your function actually needs risks out-of-memory failures and longer durations at the same time.
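As a rough mental model for CPU-bound work, duration scales inversely with allocated memory up to the full-vCPU ceiling. The sketch below is an idealized back-of-envelope estimator (the function name and ceiling constant are assumptions), not a substitute for profiling with AWS Lambda Power Tuning:

```python
# Single-threaded, CPU-bound model: CPU share grows with memory up to one
# full vCPU at ~1792MB, so duration shrinks proportionally until that ceiling.
FULL_VCPU_MB = 1792

def estimated_duration_ms(measured_ms: float, current_mb: int, proposed_mb: int) -> float:
    effective_current = min(current_mb, FULL_VCPU_MB)
    effective_proposed = min(proposed_mb, FULL_VCPU_MB)
    return measured_ms * effective_current / effective_proposed

# Doubling 256MB -> 512MB roughly halves a CPU-bound duration;
# going from 1792MB -> 3008MB changes nothing for single-threaded code.
```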

Common Latency Root Causes

When a latency alarm fires, Duration alone will not tell you why. The most frequent causes:

  • Slow downstream dependency: A Lambda function waiting on a sluggish DynamoDB query, RDS connection, or external HTTP call will show high duration with nothing obviously wrong in your function code. The only way to see this is with distributed tracing – a trace that shows the full span breakdown inside the invocation.
  • Cold start on user-facing path: If p99 Duration is high but average is normal, cold starts are likely the cause. Check Init Duration alongside Duration. If init is 800ms and your timeout is 3 seconds, a cold-start invocation has very little headroom left for actual execution.
  • Undersized memory: If Lambda Insights shows memory utilization consistently above 70-80% and duration is high, the function may be CPU-constrained. Increase memory and retest.
  • VPC cold start overhead: Lambda functions attached to a VPC have historically incurred longer cold starts due to elastic network interface (ENI) setup. If your function is VPC-attached and cold starts are disproportionately slow, this is a likely factor. Use VPC endpoints for AWS service calls rather than routing through the internet from inside the VPC.
  • Connection setup on every invocation: SDK clients and database connections initialized inside the handler – rather than at module level outside the handler – are recreated on every invocation. Move them outside the handler so they are reused across warm invocations.
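The reuse pattern from the last bullet, as a minimal sketch – `create_expensive_client` is a hypothetical stand-in for `boto3.client(...)` or a database connection, and the counter exists only to make the reuse observable:

```python
# Stand-in for expensive setup (SDK client, DB connection). The counter
# tracks how many times setup actually runs.
CREATED = {"count": 0}

def create_expensive_client():
    CREATED["count"] += 1          # expensive setup happens here
    return object()

# Module scope: runs once per execution environment, during the cold start...
client = create_expensive_client()

def handler(event, context):
    # ...and is reused on every warm invocation - no per-invocation setup.
    return {"clients_created": CREATED["count"]}
```

Calling the handler repeatedly leaves the counter at 1: setup cost is paid once per execution environment, not once per invocation.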

Logs Insights Queries for Latency Investigation

When an alarm fires, these queries help you drill into what is slow:

Find the slowest invocations in the last hour:

filter @type = "REPORT"
| fields @requestId, @duration, @initDuration, @memorySize, @maxMemoryUsed
| sort @duration desc
| limit 25

Compare cold start vs warm p99 duration:

filter @type = "REPORT"
| stats percentile(@duration, 99) as p99 by ispresent(@initDuration) as is_cold_start

Find invocations approaching timeout (the 20,000ms filter assumes a 30-second timeout – adjust to your own):

filter @type = "REPORT"
| filter @duration > 20000
| fields @requestId, @duration, @initDuration
| sort @duration desc
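The cold-vs-warm comparison can also be reproduced locally on exported REPORT data. A sketch with illustrative values, where a missing init duration marks a warm invocation:

```python
import math

# Illustrative parsed REPORT records; init_ms is None on warm invocations.
records = [
    {"duration_ms": 55.0, "init_ms": None},
    {"duration_ms": 60.0, "init_ms": None},
    {"duration_ms": 900.0, "init_ms": 410.0},
    {"duration_ms": 950.0, "init_ms": 430.0},
]

def p99(values):
    ranked = sorted(values)
    return ranked[math.ceil(99 * len(ranked) / 100) - 1]

warm = [r["duration_ms"] for r in records if r["init_ms"] is None]
cold = [r["duration_ms"] for r in records if r["init_ms"] is not None]
p99_warm, p99_cold = p99(warm), p99(cold)
```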

What Duration Monitoring Misses

CloudWatch Duration tells you how long your function ran. It does not tell you what it was doing during that time.

A 3-second invocation could mean your function code took 3 seconds, or it could mean your function spent 2.8 seconds waiting on a DynamoDB query that returned 10,000 items it did not need. Duration looks identical in both cases.

CubeAPM instruments Lambda via the OpenTelemetry layer and breaks duration into its component spans: how long each downstream call took, which service responded slowly, and how the invocation fits into the broader request chain that triggered it. When a latency alarm fires and CloudWatch shows a high p99 but no obvious error, the span breakdown in CubeAPM is where the actual diagnosis happens – without switching tools or writing custom Logs Insights queries – and it runs self-hosted in your own AWS account.

Summary

| What to monitor | Metric | Threshold |
| --- | --- | --- |
| Execution latency | AWS/Lambda Duration p99 | 80% of configured timeout |
| Cold start latency | LambdaInsights init_duration | 1,000ms average (adjust per runtime) |
| End-to-end API latency | AWS/ApiGateway Latency p95 | Per your SLO |
| Memory pressure affecting latency | LambdaInsights memory_utilization | Alert above 85% |

Use p95 and p99 for Duration – not average, not max. Track Init Duration as a separate metric – it is not included in Duration. And when you need to understand why Duration is high, not just that it is high, you need distributed traces alongside your metrics.

Disclaimer: Configurations, thresholds, and code examples are for guidance only. Verify against the current AWS and OpenTelemetry documentation before applying to production. AWS service details change frequently. CubeAPM references reflect genuine use cases; evaluate all tools against your own requirements. 
