You trace AWS Lambda cold starts with OpenTelemetry by setting the faas.coldstart boolean attribute on the root invocation span. When it’s true, the invocation triggered a cold start. When it’s false, the execution environment is already warm. Every OTel-compatible backend can then filter, compare, and alert on cold start latency separately from warm invocations, something CloudWatch alone cannot do.
Key Takeaways
- faas.coldstart is the OTel FaaS semantic convention attribute that identifies cold start invocations. Set it on the root span, not a child span
- You have two setup paths: the AWS-managed ADOT Lambda layer (no code changes) or manual OTel instrumentation (more control)
- Never use BatchSpanProcessor in Lambda: spans will be silently lost when the execution environment freezes after your handler returns
- Cold starts typically affect under 1% of Lambda invocations, but since AWS began billing the INIT phase separately in August 2025, cold start duration is now a direct cost item, not just a latency concern
- OTel auto-instrumentation can itself add 200–800ms to cold start time; keep instrumentation scope tight
Option 1: The ADOT Lambda Layer (Fastest Path)
The AWS Distro for OpenTelemetry (ADOT) Lambda layer is an AWS-managed layer that bundles the OTel SDK and a Collector extension. It sets faas.coldstart automatically — no code changes required.
What to do:
Step 1. Add the ADOT layer ARN for your runtime and region. Find the current ARNs at aws-otel.github.io.
Step 2. Set the exec wrapper environment variable:
AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument
Step 3. Point it at your OTLP endpoint:
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-apm-or-collector:4318
OTEL_SERVICE_NAME=my-lambda-function
What you get automatically: AWS SDK calls (DynamoDB, S3, SQS, SNS), outbound HTTP requests, root invocation spans with faas.coldstart set correctly, and W3C TraceContext propagation for distributed tracing.
The tradeoff: The layer itself adds 50–150ms to cold start time. For most functions, that’s acceptable. For extremely latency-sensitive functions, manual instrumentation lets you control exactly what gets loaded during init.
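One way to quantify that overhead for your own function is to compare the Init Duration field in the CloudWatch REPORT log line before and after attaching the layer. A minimal sketch of extracting it; the REPORT lines below are made-up examples:

```python
import re

# Lambda writes one REPORT line per invocation; Init Duration appears only on cold starts.
REPORT_PATTERN = re.compile(r"Init Duration: ([\d.]+) ms")

def init_duration_ms(report_line: str):
    """Return the Init Duration from a REPORT log line, or None for warm invocations."""
    match = REPORT_PATTERN.search(report_line)
    return float(match.group(1)) if match else None

cold = ("REPORT RequestId: abc-123 Duration: 12.3 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB Init Duration: 345.67 ms")
warm = ("REPORT RequestId: abc-124 Duration: 11.9 ms Billed Duration: 12 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB")

print(init_duration_ms(cold))  # 345.67
print(init_duration_ms(warm))  # None
```

Run this over a CloudWatch Logs export from deployments with and without the layer, and the difference in the Init Duration distribution is the layer's cost.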
Option 2: Manual Instrumentation
Manual instrumentation gives you the most flexibility, particularly the ability to measure exactly how long your own initialization code took, broken down by phase if you want it.
The pattern: Record a module-level timestamp and a boolean at file load time (which runs during cold start). On the first handler invocation, compute the elapsed time and set faas.coldstart = true. On every subsequent invocation in the same container, faas.coldstart = false.
Python:
```python
# handler.py
import os
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor  # not Batch — see below
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace import SpanKind

# Module-level: runs once per execution environment (i.e., on every cold start)
_INIT_TIME = time.time()
_IS_COLD_START = True

provider = TracerProvider(resource=Resource.create({
    "service.name": os.environ["AWS_LAMBDA_FUNCTION_NAME"],
    "faas.name": os.environ["AWS_LAMBDA_FUNCTION_NAME"],
    "faas.version": os.environ["AWS_LAMBDA_FUNCTION_VERSION"],
    "cloud.provider": "aws",
    "cloud.platform": "aws_lambda",
    "cloud.region": os.environ["AWS_REGION"],
}))
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(
        endpoint=os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] + "/v1/traces"
    ))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def handler(event, context):
    global _IS_COLD_START
    with tracer.start_as_current_span("lambda.invoke", kind=SpanKind.SERVER) as span:
        span.set_attributes({
            "faas.coldstart": _IS_COLD_START,
            "faas.invocation_id": context.aws_request_id,
            "faas.trigger": "http",  # adjust per trigger: http | pubsub | datasource | timer
        })
        if _IS_COLD_START:
            init_ms = int((time.time() - _INIT_TIME) * 1000)
            span.set_attribute("faas.init_duration_ms", init_ms)
            span.add_event("cold_start", {"init_duration_ms": init_ms})
            _IS_COLD_START = False
        # your handler logic here
```
Node.js:
```javascript
// handler.js
const { trace, SpanKind } = require('@opentelemetry/api');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');

// Module-level: runs once on cold start
const initTime = Date.now();
let isColdStart = true;

const provider = new NodeTracerProvider({
  resource: new Resource({
    'service.name': process.env.AWS_LAMBDA_FUNCTION_NAME,
    'faas.name': process.env.AWS_LAMBDA_FUNCTION_NAME,
    'faas.version': process.env.AWS_LAMBDA_FUNCTION_VERSION,
    'cloud.provider': 'aws',
    'cloud.platform': 'aws_lambda',
    'cloud.region': process.env.AWS_REGION,
  }),
});
provider.addSpanProcessor(
  new SimpleSpanProcessor(
    new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/traces' })
  )
);
provider.register();
const tracer = trace.getTracer('lambda-tracer');

exports.handler = async (event, context) => {
  return tracer.startActiveSpan('lambda.invoke', { kind: SpanKind.SERVER }, async (span) => {
    span.setAttribute('faas.coldstart', isColdStart);
    span.setAttribute('faas.invocation_id', context.awsRequestId);
    if (isColdStart) {
      const initMs = Date.now() - initTime;
      span.setAttribute('faas.init_duration_ms', initMs);
      span.addEvent('cold_start', { init_duration_ms: initMs });
      isColdStart = false;
    }
    // your handler logic here
    span.end();
  });
};
```

Practical note: faas.init_duration_ms here measures the time between module load and first handler invocation – that is, your code’s initialization time. It does not include container provisioning, which happens before your code runs and is invisible to instrumentation.
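The cold start flag logic itself is easy to verify locally. A stripped-down sketch of the same pattern with the OTel SDK removed, simulating two invocations in one execution environment (the returned dict stands in for the span attributes):

```python
import time

# Module-level state: re-created only when the "execution environment" (process) restarts.
_INIT_TIME = time.time()
_IS_COLD_START = True

def handler(event, context):
    """Return the attributes the real handler would set on its root span."""
    global _IS_COLD_START
    attrs = {"faas.coldstart": _IS_COLD_START}
    if _IS_COLD_START:
        # Only the first invocation in this process reports an init duration.
        attrs["faas.init_duration_ms"] = int((time.time() - _INIT_TIME) * 1000)
        _IS_COLD_START = False
    return attrs

first = handler({}, None)
second = handler({}, None)
print(first["faas.coldstart"], second["faas.coldstart"])  # True False
```

The first call reports faas.coldstart = true with an init duration; every later call in the same process reports false, which is exactly the behavior the instrumented handler produces per execution environment.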
The BatchSpanProcessor Mistake
This is the most common OTel + Lambda error, and it causes silent data loss with no error message.
BatchSpanProcessor queues spans in memory and flushes them on a background timer. In a long-running service, this is the right choice. In Lambda, when your handler returns, the execution environment freezes immediately. Background threads pause. The flush timer never fires. Any spans still in the queue are lost.
Always use SimpleSpanProcessor in Lambda:
```python
# Correct
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
provider.add_span_processor(SimpleSpanProcessor(exporter))

# Wrong: spans will be silently dropped on freeze
from opentelemetry.sdk.trace.export import BatchSpanProcessor
provider.add_span_processor(BatchSpanProcessor(exporter))
```
The exception: If you’re using the ADOT layer’s Collector extension, the extension runs in the Lambda extension lifecycle and receives a shutdown signal before the environment freezes, so it can flush correctly. But any instrumentation you add yourself should still use SimpleSpanProcessor.
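To see the failure mode concretely, here is a toy model (not the real SDK classes) of the two processors. The batch variant queues spans for a flush that, in Lambda, never happens; the simple variant exports synchronously inside the handler, before the freeze:

```python
class ToyBatchProcessor:
    """Queues spans; exports only when flush() runs (the background timer in the real SDK)."""
    def __init__(self, exported):
        self.queue, self.exported = [], exported

    def on_end(self, span):
        self.queue.append(span)           # span sits in memory

    def flush(self):
        self.exported.extend(self.queue)  # the timer-driven flush Lambda never reaches
        self.queue.clear()

class ToySimpleProcessor:
    """Exports each span synchronously, before the handler returns."""
    def __init__(self, exported):
        self.exported = exported

    def on_end(self, span):
        self.exported.append(span)

batch_out, simple_out = [], []
batch, simple = ToyBatchProcessor(batch_out), ToySimpleProcessor(simple_out)
for span in ["invoke-1", "invoke-2"]:
    batch.on_end(span)
    simple.on_end(span)
# Handler returns; environment freezes; the batch flush timer never fires.
print(len(batch_out), len(simple_out))  # 0 2
```

If you have a reason to batch anyway, the real SDK's provider.force_flush() can be called at the end of the handler to drain the queue before returning, at the cost of blocking the response on the export.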
Querying Cold Start Traces in Your APM
Once faas.coldstart is flowing through your traces, you can filter by it in any OTel-compatible backend:
| Backend | Query |
| --- | --- |
| Jaeger | Tag filter: faas.coldstart = true |
| Grafana Tempo / TraceQL | { span.faas.coldstart = true } |
| CubeAPM | Span attribute filter: faas.coldstart = true in trace explorer |
What this unlocks: Comparing p99 duration where faas.coldstart = true vs. false shows your true warm performance floor separate from cold start overhead, and reveals whether slow p99 is a cold start problem or a warm execution problem. These look the same in aggregate CloudWatch metrics.
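The split comparison is straightforward once spans carry the attribute. A sketch over hypothetical exported span records, using a nearest-rank percentile:

```python
def percentile(values, p):
    """Nearest-rank percentile over a non-empty list of numbers."""
    ordered = sorted(values)
    index = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[index]

# Hypothetical span records, shaped like an OTel backend's query result.
spans = [
    {"duration_ms": 40,   "faas.coldstart": False},
    {"duration_ms": 55,   "faas.coldstart": False},
    {"duration_ms": 60,   "faas.coldstart": False},
    {"duration_ms": 900,  "faas.coldstart": True},
    {"duration_ms": 1200, "faas.coldstart": True},
]

warm = [s["duration_ms"] for s in spans if not s["faas.coldstart"]]
cold = [s["duration_ms"] for s in spans if s["faas.coldstart"]]
print(percentile(warm, 99), percentile(cold, 99))  # 60 1200
```

In aggregate these five invocations would show a p99 dominated by cold starts; the split reveals a warm performance floor an order of magnitude lower.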
Does OTel Itself Add to Cold Start Time?
Yes, and it’s worth accounting for. OpenTelemetry auto-instrumentation can add 200–800ms of initialization overhead, depending on how many libraries are instrumented and which exporter you use.
What to do:
- Instrument only what you need. Every auto-instrumented library is a library that the OTel SDK patches at startup. Don’t include instrumentation libraries that your function doesn’t actually use.
- Initialize OTel before other imports. If AWS SDK is imported before OTel has patched it, those calls won’t be instrumented.
- Use the ADOT layer instead of bundling OTel in your package. A larger deployment package means a slower cold start; keep your bundle lean and let the layer provide the SDK.
- For latency-critical functions, use Provisioned Concurrency. The execution environment stays warm, OTel initializes once at pre-warm time, and user-facing invocations never hit a cold start. Combine with sampling (OTEL_TRACES_SAMPLER=traceidratio with OTEL_TRACES_SAMPLER_ARG=0.1 for 10%) to control per-invocation export overhead.
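The ratio sampler keeps a trace based on a threshold comparison against the trace ID, so the decision is deterministic per trace. A simplified model of that comparison (the real SDK's TraceIdRatioBased sampler works along these lines, though its exact bound calculation may differ):

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # the sampler considers the low 64 bits of the trace ID

def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio sampling: keep a trace iff its low bits fall under the bound."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound

# Roughly 10% of uniformly distributed trace IDs are kept.
random.seed(7)
kept = sum(should_sample(random.getrandbits(128), 0.10) for _ in range(10_000))
print(kept)
```

Because the decision depends only on the trace ID, every service in a distributed trace that uses the same ratio makes the same keep/drop decision, so traces stay complete end to end.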
When Manual OTel Instrumentation Isn’t Enough
Wiring up faas.coldstart spans yourself works well for one or two functions. It starts to break down when:
- You have cold starts happening across a fleet of Lambda functions, and no single place to see which functions are the worst offenders
- A cold start triggered a cascade, slow initialization led to a downstream timeout, and you’re jumping between CloudWatch Logs and a separate trace viewer to reconstruct what happened
- You want to compare cold start duration before and after a dependency upgrade, but building that view in CloudWatch requires custom metric filters and manual dashboard work
- Your team is spending time on OTel plumbing instead of on the application itself
CubeAPM picks up where manual setup leaves off. It uses the OpenTelemetry Lambda layer, so there’s no proprietary agent and no code to maintain, and gives you cold start traces, init duration trends, and correlated logs in one place, self-hosted inside your own AWS account.
Summary
| What to do | Why |
| --- | --- |
| Set faas.coldstart on the root invocation span | Enables filtering cold vs. warm traces in any OTel backend |
| Use SimpleSpanProcessor | Prevents silent span loss on Lambda freeze |
| Record a module-level timestamp | Lets you measure your own init code duration, not just the AWS REPORT line |
| Use the ADOT layer for quick setup | Auto-instruments AWS SDK calls and sets the cold start attribute with no code changes |
| Keep the OTel instrumentation scope tight | OTel itself adds to cold start time; instrument only what you need |