You trace AWS Lambda cold starts with OpenTelemetry by setting the faas.coldstart boolean attribute on the root invocation span. When it’s true, the invocation triggered a cold start. When it’s false, the execution environment is already warm. Every OTel-compatible backend can then filter, compare, and alert on cold start latency separately from warm invocations, something CloudWatch alone cannot do.
Key Takeaways
- faas.coldstart is the OTel FaaS semantic convention attribute that identifies cold start invocations. Set it on the root span, not a child span
- You have two setup paths: the AWS-managed ADOT Lambda layer (no code changes) or manual OTel instrumentation (more control)
- Never use BatchSpanProcessor in Lambda: spans will be silently lost when the execution environment freezes after your handler returns
- Cold starts typically affect under 1% of Lambda invocations, but since AWS began billing the INIT phase separately in August 2025, cold start duration is now a direct cost item, not just a latency concern
- OTel auto-instrumentation can itself add 200–800ms to cold start time; keep instrumentation scope tight
Option 1: The ADOT Lambda Layer (Fastest Path)
The AWS Distro for OpenTelemetry (ADOT) Lambda layer is an AWS-managed layer that bundles the OTel SDK and a Collector extension. It sets faas.coldstart automatically — no code changes required.
What to do:
Step 1. Add the ADOT layer ARN for your runtime and region. Find the current ARNs at aws-otel.github.io.
Step 2. Set the exec wrapper environment variable:
AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument
Step 3. Point it at your OTLP endpoint:
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-apm-or-collector:4318
OTEL_SERVICE_NAME=my-lambda-function
What you get automatically: AWS SDK calls (DynamoDB, S3, SQS, SNS), outbound HTTP requests, root invocation spans with faas.coldstart set correctly, and W3C TraceContext propagation for distributed tracing.
The tradeoff: The layer itself adds 50–150ms to cold start time. For most functions, that’s acceptable. For extremely latency-sensitive functions, manual instrumentation lets you control exactly what gets loaded during init.
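One way to quantify that overhead for your own function is to compare the Init Duration field in the CloudWatch REPORT log line before and after attaching the layer. A minimal sketch of extracting it; the REPORT lines below are made-up examples:

```python
import re

# Lambda writes one REPORT line per invocation; Init Duration appears only on cold starts.
REPORT_PATTERN = re.compile(r"Init Duration: ([\d.]+) ms")

def init_duration_ms(report_line: str):
    """Return the Init Duration from a REPORT log line, or None for warm invocations."""
    match = REPORT_PATTERN.search(report_line)
    return float(match.group(1)) if match else None

cold = ("REPORT RequestId: abc-123 Duration: 12.3 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB Init Duration: 345.67 ms")
warm = ("REPORT RequestId: abc-124 Duration: 11.9 ms Billed Duration: 12 ms "
        "Memory Size: 128 MB Max Memory Used: 60 MB")

print(init_duration_ms(cold))  # 345.67
print(init_duration_ms(warm))  # None
```

Run this over a CloudWatch Logs export from deployments with and without the layer, and the difference in the Init Duration distribution is the layer's cost.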
Option 2: Manual Instrumentation
Manual instrumentation gives you the most flexibility, particularly the ability to measure exactly how long your own initialization code took, broken down by phase if you want it.
The pattern: Record a module-level timestamp and a boolean at file load time (which runs during cold start). On the first handler invocation, compute the elapsed time and set faas.coldstart = true. On every subsequent invocation in the same container, faas.coldstart = false.
Python:
```python
# handler.py
import os
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor  # not Batch — see below
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace import SpanKind

# Module-level: runs once per execution environment (i.e., on every cold start)
_INIT_TIME = time.time()
_IS_COLD_START = True

provider = TracerProvider(resource=Resource.create({
    "service.name": os.environ["AWS_LAMBDA_FUNCTION_NAME"],
    "faas.name": os.environ["AWS_LAMBDA_FUNCTION_NAME"],
    "faas.version": os.environ["AWS_LAMBDA_FUNCTION_VERSION"],
    "cloud.provider": "aws",
    "cloud.platform": "aws_lambda",
    "cloud.region": os.environ["AWS_REGION"],
}))
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(
        endpoint=os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] + "/v1/traces"
    ))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def handler(event, context):
    global _IS_COLD_START
    with tracer.start_as_current_span("lambda.invoke", kind=SpanKind.SERVER) as span:
        span.set_attributes({
            "faas.coldstart": _IS_COLD_START,
            "faas.invocation_id": context.aws_request_id,
            "faas.trigger": "http",  # adjust per trigger: http | pubsub | datasource | timer
        })
        if _IS_COLD_START:
            init_ms = int((time.time() - _INIT_TIME) * 1000)
            span.set_attribute("faas.init_duration_ms", init_ms)
            span.add_event("cold_start", {"init_duration_ms": init_ms})
            _IS_COLD_START = False
        # your handler logic here
```
Node.js:
```javascript
// handler.js
const { trace, SpanKind } = require('@opentelemetry/api');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');

// Module-level: runs once on cold start
const initTime = Date.now();
let isColdStart = true;

const provider = new NodeTracerProvider({
  resource: new Resource({
    'service.name': process.env.AWS_LAMBDA_FUNCTION_NAME,
    'faas.name': process.env.AWS_LAMBDA_FUNCTION_NAME,
    'faas.version': process.env.AWS_LAMBDA_FUNCTION_VERSION,
    'cloud.provider': 'aws',
    'cloud.platform': 'aws_lambda',
    'cloud.region': process.env.AWS_REGION,
  }),
});
provider.addSpanProcessor(
  new SimpleSpanProcessor(
    new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/traces' })
  )
);
provider.register();
const tracer = trace.getTracer('lambda-tracer');

exports.handler = async (event, context) => {
  return tracer.startActiveSpan('lambda.invoke', { kind: SpanKind.SERVER }, async (span) => {
    span.setAttribute('faas.coldstart', isColdStart);
    span.setAttribute('faas.invocation_id', context.awsRequestId);
    if (isColdStart) {
      const initMs = Date.now() - initTime;
      span.setAttribute('faas.init_duration_ms', initMs);
      span.addEvent('cold_start', { init_duration_ms: initMs });
      isColdStart = false;
    }
    // your handler logic here
    span.end();
  });
};
```

Practical note: faas.init_duration_ms here measures the time between module load and first handler invocation – that is, your code’s initialization time. It does not include container provisioning, which happens before your code runs and is invisible to instrumentation.
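The cold start flag logic itself is easy to verify locally. A stripped-down sketch of the same pattern with the OTel SDK removed, simulating two invocations in one execution environment (the returned dict stands in for the span attributes):

```python
import time

# Module-level state: re-created only when the "execution environment" (process) restarts.
_INIT_TIME = time.time()
_IS_COLD_START = True

def handler(event, context):
    """Return the attributes the real handler would set on its root span."""
    global _IS_COLD_START
    attrs = {"faas.coldstart": _IS_COLD_START}
    if _IS_COLD_START:
        # Only the first invocation in this process reports an init duration.
        attrs["faas.init_duration_ms"] = int((time.time() - _INIT_TIME) * 1000)
        _IS_COLD_START = False
    return attrs

first = handler({}, None)
second = handler({}, None)
print(first["faas.coldstart"], second["faas.coldstart"])  # True False
```

The first call reports faas.coldstart = true with an init duration; every later call in the same process reports false, which is exactly the behavior the instrumented handler produces per execution environment.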
The BatchSpanProcessor Mistake
This is the most common OTel + Lambda error, and it causes silent data loss with no error message.
BatchSpanProcessor queues spans in memory and flushes them on a background timer. In a long-running service, this is the right choice. In Lambda, when your handler returns, the execution environment freezes immediately. Background threads pause. The flush timer never fires. Any spans still in the queue are lost.
Always use SimpleSpanProcessor in Lambda:
```python
# Correct
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
provider.add_span_processor(SimpleSpanProcessor(exporter))

# Wrong: spans will be silently dropped on freeze
from opentelemetry.sdk.trace.export import BatchSpanProcessor
provider.add_span_processor(BatchSpanProcessor(exporter))
```
The exception: If you’re using the ADOT layer’s Collector extension, the extension runs in the Lambda extension lifecycle and receives a shutdown signal before the environment freezes, so it can flush correctly. But any instrumentation you add yourself should still use SimpleSpanProcessor.
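To see the failure mode concretely, here is a toy model (not the real SDK classes) of the two processors. The batch variant queues spans for a flush that, in Lambda, never happens; the simple variant exports synchronously inside the handler, before the freeze:

```python
class ToyBatchProcessor:
    """Queues spans; exports only when flush() runs (the background timer in the real SDK)."""
    def __init__(self, exported):
        self.queue, self.exported = [], exported

    def on_end(self, span):
        self.queue.append(span)           # span sits in memory

    def flush(self):
        self.exported.extend(self.queue)  # the timer-driven flush Lambda never reaches
        self.queue.clear()

class ToySimpleProcessor:
    """Exports each span synchronously, before the handler returns."""
    def __init__(self, exported):
        self.exported = exported

    def on_end(self, span):
        self.exported.append(span)

batch_out, simple_out = [], []
batch, simple = ToyBatchProcessor(batch_out), ToySimpleProcessor(simple_out)
for span in ["invoke-1", "invoke-2"]:
    batch.on_end(span)
    simple.on_end(span)
# Handler returns; environment freezes; the batch flush timer never fires.
print(len(batch_out), len(simple_out))  # 0 2
```

If you have a reason to batch anyway, the real SDK's provider.force_flush() can be called at the end of the handler to drain the queue before returning, at the cost of blocking the response on the export.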
Querying Cold Start Traces in Your APM
Once faas.coldstart is flowing through your traces, you can filter by it in any OTel-compatible backend:
| Backend | Query |
| --- | --- |
| Jaeger | Tag filter: faas.coldstart = true |
| Grafana Tempo / TraceQL | { span.faas.coldstart = true } |
| CubeAPM | Span attribute filter: faas.coldstart = true in trace explorer |
What this unlocks: Comparing p99 duration where faas.coldstart = true vs. false shows your true warm performance floor separate from cold start overhead, and reveals whether slow p99 is a cold start problem or a warm execution problem. These look the same in aggregate CloudWatch metrics.
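The split comparison is straightforward once spans carry the attribute. A sketch over hypothetical exported span records, using a nearest-rank percentile:

```python
def percentile(values, p):
    """Nearest-rank percentile over a non-empty list of numbers."""
    ordered = sorted(values)
    index = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[index]

# Hypothetical span records, shaped like an OTel backend's query result.
spans = [
    {"duration_ms": 40,   "faas.coldstart": False},
    {"duration_ms": 55,   "faas.coldstart": False},
    {"duration_ms": 60,   "faas.coldstart": False},
    {"duration_ms": 900,  "faas.coldstart": True},
    {"duration_ms": 1200, "faas.coldstart": True},
]

warm = [s["duration_ms"] for s in spans if not s["faas.coldstart"]]
cold = [s["duration_ms"] for s in spans if s["faas.coldstart"]]
print(percentile(warm, 99), percentile(cold, 99))  # 60 1200
```

In aggregate these five invocations would show a p99 dominated by cold starts; the split reveals a warm performance floor an order of magnitude lower.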
Does OTel Itself Add to Cold Start Time?
Yes, and it’s worth accounting for. OpenTelemetry auto-instrumentation can add 200–800ms of initialization overhead, depending on how many libraries are instrumented and which exporter you use.
What to do:
- Instrument only what you need. Every auto-instrumented library is a library that the OTel SDK patches at startup. Don’t include instrumentation libraries that your function doesn’t actually use.
- Initialize OTel before other imports. If AWS SDK is imported before OTel has patched it, those calls won’t be instrumented.
- Use the ADOT layer instead of bundling OTel in your package. A larger deployment package means a slower cold start; keep your bundle lean and let the layer provide the SDK.
- For latency-critical functions, use Provisioned Concurrency. The execution environment stays warm, OTel initializes once at pre-warm time, and user-facing invocations never hit a cold start. Combine with sampling (OTEL_TRACES_SAMPLER=traceidratio with OTEL_TRACES_SAMPLER_ARG=0.1 for 10%) to control per-invocation export overhead.
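The ratio sampler keeps a trace based on a threshold comparison against the trace ID, so the decision is deterministic per trace. A simplified model of that comparison (the real SDK's TraceIdRatioBased sampler works along these lines, though its exact bound calculation may differ):

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # the sampler considers the low 64 bits of the trace ID

def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio sampling: keep a trace iff its low bits fall under the bound."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound

# Roughly 10% of uniformly distributed trace IDs are kept.
random.seed(7)
kept = sum(should_sample(random.getrandbits(128), 0.10) for _ in range(10_000))
print(kept)
```

Because the decision depends only on the trace ID, every service in a distributed trace that uses the same ratio makes the same keep/drop decision, so traces stay complete end to end.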
When Manual OTel Instrumentation Isn’t Enough
Wiring up faas.coldstart spans yourself works well for one or two functions. It starts to break down when:
- You have cold starts happening across a fleet of Lambda functions, and no single place to see which functions are the worst offenders
- A cold start triggered a cascade, slow initialization led to a downstream timeout, and you’re jumping between CloudWatch Logs and a separate trace viewer to reconstruct what happened
- You want to compare cold start duration before and after a dependency upgrade, but building that view in CloudWatch requires custom metric filters and manual dashboard work
- Your team is spending time on OTel plumbing instead of on the application itself
CubeAPM picks up where manual setup leaves off. It uses the OpenTelemetry Lambda layer, so there’s no proprietary agent and no code to maintain, and gives you cold start traces, init duration trends, and correlated logs in one place, self-hosted inside your own AWS account.
Summary
| What to do | Why |
| --- | --- |
| Set faas.coldstart on the root invocation span | Enables filtering cold vs. warm traces in any OTel backend |
| Use SimpleSpanProcessor | Prevents silent span loss on Lambda freeze |
| Record a module-level timestamp | Lets you measure your own init code duration, not just the AWS REPORT line |
| Use the ADOT layer for quick setup | Auto-instruments AWS SDK calls and sets the cold start attribute with no code changes |
| Keep the OTel instrumentation scope tight | OTel itself adds to cold start time; instrument only what you need |