CubeAPM

What Is OpenTelemetry and How Does It Work? 

OpenTelemetry (OTel) is an open-source, vendor-neutral framework for collecting observability data from distributed applications. It is a graduated CNCF project and, as of 2026, the second most active project in the CNCF ecosystem after Kubernetes. It standardizes how applications produce, collect, and export three types of telemetry data, called signals: traces, metrics, and logs. A fourth signal, continuous profiling, is in release candidate status as of Q1 2026 and is targeting general availability in Q3 2026.

OpenTelemetry is not a monitoring backend, a database, or a dashboard tool. It is the pipeline layer that sits between your applications and whatever observability platform you use. You instrument your code once using OTel APIs and SDKs, and then route the data to any backend you choose without touching your application code again.

Key Takeaways

  • OpenTelemetry provides a single standard for generating, collecting, and exporting telemetry data across any language, infrastructure, or cloud.
  • The three core signals are traces, metrics, and logs. All three are stable across every major language SDK as of early 2026. Profiling is the fourth signal, currently in release candidate status and targeting GA in Q3 2026.
  • The main components are the API, the SDK, the OpenTelemetry Protocol (OTLP), and the Collector. Each has a distinct role and can be adopted independently.
  • The API and SDK are intentionally separate. Applications depend only on the API. The SDK implements the API and handles processing, sampling, and export.
  • The Collector is optional but strongly recommended for production. It acts as a vendor-neutral proxy that receives, processes, and routes telemetry to one or more backends.
  • OpenTelemetry does not store or visualize data. Storage and visualization are handled by the backend you choose: Prometheus, Jaeger, Grafana, Datadog, New Relic, or any other compatible platform.

The Problem OpenTelemetry Solves

Before OpenTelemetry, observability in distributed systems had two fundamental problems.

  • Vendor lock-in at the instrumentation layer: Every observability vendor shipped its own proprietary agent, SDK, and wire protocol. If you wanted traces in Datadog, you installed the Datadog SDK. If you then wanted to evaluate Honeycomb, you had to re-instrument your entire application from scratch. Teams ended up locked into whichever vendor they chose first, not because that vendor’s analysis capabilities were best, but because the cost of switching instrumentation was too high.
  • Data fragmentation across signals and languages: A trace from a Java service, metrics from a Go service, and logs from a Python service all used different attribute names, different conventions, and different formats. Correlating them required custom glue code that each team maintained independently. There was no shared standard for what a “service name” or “HTTP status code” attribute should be called.

OpenTelemetry solved both problems at once. It separates instrumentation from the backend, so you instrument your code once using the OTel API, and where the data goes becomes a configuration concern handled by the Collector or an exporter. Switching backends no longer requires touching application code.

It also introduced semantic conventions, a shared standard for attribute names across all languages and all signals, making cross-service and cross-language correlation possible without glue code.
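To make the "glue code" point concrete, here is a minimal sketch of the kind of per-team mapping that shared attribute names make unnecessary. The legacy keys (`svc`, `serviceName`, `http_status`, `statusCode`, `method`) and the `normalize` helper are hypothetical and chosen only for illustration; the target keys follow OpenTelemetry's semantic conventions.

```python
# Hypothetical legacy attribute names, mapped onto semantic convention keys.
LEGACY_TO_SEMCONV = {
    "svc": "service.name",
    "serviceName": "service.name",
    "http_status": "http.response.status_code",
    "statusCode": "http.response.status_code",
    "method": "http.request.method",
}


def normalize(attributes):
    """Rename known legacy keys; leave unknown keys untouched."""
    return {LEGACY_TO_SEMCONV.get(key, key): value for key, value in attributes.items()}


# Two services that describe the same facts with different names.
go_service = {"svc": "cart", "http_status": 200}
java_service = {"serviceName": "cart", "statusCode": 200}

print(normalize(go_service))
print(normalize(java_service))
```

When every SDK already emits `service.name` and `http.response.status_code`, a translation table like this never has to be written or maintained.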

The Four Components

1. The API

The API defines the interfaces your application code calls to create spans, record metrics, and emit logs. It is language-specific and the only layer your application code directly depends on.

A key design decision: if no SDK is configured, API calls are no-ops. This means a library author can instrument their library with the OTel API without forcing any specific SDK or backend on the people using the library. The instrumentation is there when the SDK is present and invisible when it is not.
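The no-op behavior can be sketched in a few lines of plain Python. This is an illustration of the design pattern only, under the assumption of a single global tracer; every class and function here (`NoOpTracer`, `SdkTracer`, `set_tracer`, `library_function`) is a hypothetical stand-in, not the actual opentelemetry-api code.

```python
from contextlib import contextmanager


class NoOpSpan:
    """Span handed out when no SDK is registered: accepts calls, records nothing."""

    def set_attribute(self, key, value):
        pass

    def is_recording(self):
        return False


class RecordingSpan:
    """Span handed out by the (hypothetical) SDK: keeps what it is given."""

    def __init__(self, name, sink):
        self.name = name
        self.attributes = {}
        sink.append(self)

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def is_recording(self):
        return True


class NoOpTracer:
    @contextmanager
    def start_span(self, name):
        yield NoOpSpan()


class SdkTracer:
    def __init__(self):
        self.spans = []  # spans captured by this SDK instance

    @contextmanager
    def start_span(self, name):
        yield RecordingSpan(name, self.spans)


_tracer = NoOpTracer()  # API default: nothing installed


def set_tracer(tracer):
    """Called once by whoever runs the application, never by library code."""
    global _tracer
    _tracer = tracer


def get_tracer():
    return _tracer


def library_function(order_id):
    """Library code depends only on the API-level functions above."""
    with get_tracer().start_span("library.work") as span:
        span.set_attribute("order.id", order_id)
        return span.is_recording()
```

Calling `library_function` before `set_tracer(SdkTracer())` does nothing observable; calling it afterwards records a span, and the library code is identical in both cases.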

2. The SDK

The SDK implements the API. It handles sampling decisions, batching, processing, and exporting telemetry data. It is configured by the operator, not the application developer.

The SDK is where you configure:

  • Which sampler to use (always-on, parent-based, or probabilistic; tail-based sampling is configured in the Collector, not the SDK)
  • Which exporter to use (OTLP to the Collector, direct to a backend, or a combination)
  • Resource attributes that identify the service (service name, version, environment)
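As a rough illustration of what a probabilistic sampler does, the sketch below makes the keep/drop decision from the trace ID alone. It is a simplified model in the spirit of trace-ID ratio sampling, not the SDK's implementation; `should_sample` and `new_trace_id` are hypothetical helper names.

```python
import secrets

MAX_64 = 1 << 64


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace when the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = int(ratio * MAX_64)
    return (trace_id & (MAX_64 - 1)) < bound


def new_trace_id() -> int:
    """Random 128-bit trace ID."""
    return secrets.randbits(128)


# Roughly a quarter of randomly generated traces are kept at ratio 0.25.
kept = sum(should_sample(new_trace_id(), 0.25) for _ in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Because the decision depends only on the trace ID, every service that applies the same ratio reaches the same answer for a given trace, so sampled traces stay complete.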

3. OTLP (OpenTelemetry Protocol)

OTLP is the wire protocol that carries telemetry data between components. It runs over gRPC (port 4317) or HTTP/protobuf (port 4318). It is the standard language all OTel components use to communicate with each other.

When your application SDK exports data, it sends OTLP. When the Collector receives data, it receives OTLP. When the Collector forwards data to a backend, it can speak OTLP or translate to the backend’s native protocol (Prometheus remote write, Jaeger Thrift, etc.).
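To give a feel for what travels over the wire, the sketch below builds a trace export request in OTLP's JSON encoding, which OTLP/HTTP accepts alongside protobuf and which is easier to read. It only constructs the payload and does not send it; in a real setup the body would be POSTed to the receiver's `/v1/traces` path on port 4318. The function names, the scope name `example-instrumentation`, and the sample IDs are assumptions made for illustration.

```python
import json


def attr(key, value):
    """Encode one string attribute in OTLP's key/value form."""
    return {"key": key, "value": {"stringValue": value}}


def build_trace_request(service_name, span_name, trace_id, span_id, start_ns, end_ns):
    """Build a single-span export request in OTLP JSON shape."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [attr("service.name", service_name)]},
            "scopeSpans": [{
                "scope": {"name": "example-instrumentation"},
                "spans": [{
                    "traceId": trace_id,   # 32 hex characters
                    "spanId": span_id,     # 16 hex characters
                    "name": span_name,
                    "kind": 2,             # SPAN_KIND_SERVER
                    # 64-bit integers are carried as strings in the JSON encoding
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


request = build_trace_request(
    "payment-service", "payment.process",
    "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7",
    1_700_000_000_000_000_000, 1_700_000_000_250_000_000,
)
print(json.dumps(request, indent=2))
```

The nesting mirrors the data model: a resource (the service) owns instrumentation scopes, and each scope owns the spans it produced.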

4. The Collector

The Collector is a standalone binary, deployed separately from your application, that receives, processes, and exports telemetry data. It is optional but strongly recommended for production environments.

A Collector pipeline has three composable stages:

Receivers accept incoming data. The most common is the OTLP receiver, but the Collector also supports Prometheus scraping, Jaeger, Zipkin, Fluent Bit, and host metrics collection.

Processors transform data in transit. Common processors include:

  • batch: groups telemetry into batches before export to reduce network overhead
  • filter: drops spans, metrics, or logs matching specific conditions
  • attributes: adds, removes, or modifies attributes on telemetry data
  • tail_sampling: defers sampling decisions until the full trace is assembled, allowing you to keep 100% of error traces while sampling only a fraction of successful ones
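The tail-sampling idea can be sketched as follows: group spans by trace ID, then decide per trace once all of its spans are available. This is a simplified model of the policy described above, not the Collector's processor; the span dictionaries and the `tail_sample` function are assumptions for illustration.

```python
import random
from collections import defaultdict


def group_by_trace(spans):
    """Collect spans into lists keyed by their trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return traces


def tail_sample(spans, keep_ratio, rng=random.random):
    """Keep every trace that contains an error span; keep the rest with probability keep_ratio."""
    kept = []
    for trace_spans in group_by_trace(spans).values():
        has_error = any(span["status"] == "ERROR" for span in trace_spans)
        if has_error or rng() < keep_ratio:
            kept.extend(trace_spans)
    return kept


spans = [
    {"trace_id": "a", "name": "GET /cart", "status": "OK"},
    {"trace_id": "a", "name": "db.query", "status": "OK"},
    {"trace_id": "b", "name": "POST /pay", "status": "OK"},
    {"trace_id": "b", "name": "card.charge", "status": "ERROR"},
]

# With keep_ratio=0.0 only the trace containing an error survives.
print([span["name"] for span in tail_sample(spans, keep_ratio=0.0)])
```

The decision has to wait for the whole trace because the error may occur in the last span to arrive, which is why this kind of sampling lives in the Collector rather than in each application.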

Exporters send processed data to one or more backends. A single Collector can fan out to multiple destinations simultaneously. You can send traces to Jaeger and to Datadog at the same time without changing your application code.
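The three stages can be modeled in a short sketch: items are received, buffered into batches, and each batch is handed to every exporter. This is a toy model under simplifying assumptions (size-based flushing only, no timer, no retries); `BatchProcessor`, `ListExporter`, and `Pipeline` are hypothetical classes, not Collector components.

```python
class BatchProcessor:
    """Buffers items and releases them in fixed-size batches."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self._buffer = []

    def add(self, item):
        """Return a full batch when one is ready, otherwise None."""
        self._buffer.append(item)
        if len(self._buffer) >= self.max_batch_size:
            return self.flush()
        return None

    def flush(self):
        """Return whatever is buffered and start a new buffer."""
        batch, self._buffer = self._buffer, []
        return batch


class ListExporter:
    """Stand-in for a backend: remembers every batch it was sent."""

    def __init__(self, name):
        self.name = name
        self.batches = []

    def export(self, batch):
        self.batches.append(list(batch))


class Pipeline:
    """receive -> batch -> fan out to every exporter."""

    def __init__(self, processor, exporters):
        self.processor = processor
        self.exporters = exporters

    def receive(self, item):
        batch = self.processor.add(item)
        if batch:
            self._fan_out(batch)

    def shutdown(self):
        """Flush the partial batch so nothing is lost on exit."""
        batch = self.processor.flush()
        if batch:
            self._fan_out(batch)

    def _fan_out(self, batch):
        for exporter in self.exporters:
            exporter.export(batch)


backend_a = ListExporter("backend-a")
backend_b = ListExporter("backend-b")
pipeline = Pipeline(BatchProcessor(max_batch_size=2), [backend_a, backend_b])
for span_number in range(5):
    pipeline.receive(span_number)
pipeline.shutdown()
print(backend_a.batches, backend_b.batches)
```

Both exporters receive identical batches, and the application that produced the data never learns how many destinations exist.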

A minimal Collector configuration:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp:
    endpoint: your-backend:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

The Three Signals

Traces

A trace represents the journey of a single request as it flows through a distributed system. It is made up of spans. Each span represents one operation: a database query, an HTTP call, a queue publish, a function execution.

Spans carry:

  • A trace ID shared across all spans in the request
  • A span ID unique to that operation
  • A parent span ID linking child spans to their parent
  • A start time and duration
  • Attributes describing the operation (HTTP method, database name, status code)
  • Events marking specific moments within the span
  • A status (OK, Error, Unset)
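The fields listed above map naturally onto a small data structure. The sketch below is a simplified model for illustration, not the SDK's span class; the `Span` dataclass, its method names, and `start_trace` are assumptions.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                  # shared by every span in the request
    span_id: str                   # unique to this operation
    parent_span_id: Optional[str]  # None for the root span
    start_time_ns: int
    end_time_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    status: str = "UNSET"

    def child(self, name):
        """Start a child span: same trace ID, new span ID, parent set to this span."""
        return Span(
            name=name,
            trace_id=self.trace_id,
            span_id=secrets.token_hex(8),
            parent_span_id=self.span_id,
            start_time_ns=time.time_ns(),
        )

    def add_event(self, name):
        """Mark a specific moment within the span."""
        self.events.append((time.time_ns(), name))

    def end(self, status="OK"):
        self.end_time_ns = time.time_ns()
        self.status = status

    @property
    def duration_ns(self):
        """Duration in nanoseconds, or None while the span is still open."""
        if self.end_time_ns is None:
            return None
        return self.end_time_ns - self.start_time_ns


def start_trace(name):
    """Create a root span with a fresh 128-bit trace ID and 64-bit span ID."""
    return Span(name, secrets.token_hex(16), secrets.token_hex(8), None, time.time_ns())


root = start_trace("GET /checkout")
query = root.child("db.query")
query.attributes["db.name"] = "orders"
query.add_event("rows.fetched")
query.end()
root.end()
print(query.trace_id == root.trace_id, query.parent_span_id == root.span_id)
```

A backend rebuilds the request tree purely from these IDs: spans with the same trace ID belong together, and parent span IDs give the nesting.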

The trace ID is what enables cross-service correlation. When a request enters Service A and Service A calls Service B, the trace ID is propagated in the request headers. Service B creates a child span under the same trace. This is called context propagation, and it is the mechanism that makes distributed tracing work. W3C TraceContext is the standard propagation format OpenTelemetry uses by default.
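As a sketch of what propagation looks like on the wire, the W3C Trace Context `traceparent` header packs a version, the trace ID, the parent span ID, and trace flags into one dash-separated value. The `inject` and `extract` helpers below are hypothetical names and handle only version `00`; production propagators also deal with `tracestate` and future versions.

```python
import re

# version - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Service A: write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers):
    """Service B: read the caller's context, or return None if absent or invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs and version ff are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


outgoing = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(outgoing["traceparent"])
print(extract(outgoing))
```

Service B uses the extracted trace ID for its own spans and sets the extracted span ID as their parent, which is what stitches the two services into one trace.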

Tracing is the most mature OTel signal and reached stable status in 2021.

Metrics

Metrics are numerical measurements aggregated over time. They answer questions about system behavior at a population level: how many requests per second, what percentage errored, what the p99 latency was over the last five minutes.

OpenTelemetry's main metric instrument types are:

  • Counter: a value that only increases (total requests, total errors)
  • UpDownCounter: an additive value that can go up or down (current queue depth, active connections)
  • Histogram: a distribution of values (request duration, response size)
  • Gauge: a non-additive point-in-time value (CPU temperature, memory utilization)
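The difference between these instruments is easiest to see in how each one aggregates. The sketch below is a toy model, not the SDK's metrics API; the class names mirror the instrument names, and the bucket boundaries are arbitrary values picked for the example.

```python
from bisect import bisect_left


class Counter:
    """Monotonic sum: only non-negative increments are allowed."""

    def __init__(self):
        self.value = 0

    def add(self, amount=1):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Last value wins: each reading replaces the previous one."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value


class Histogram:
    """Explicit-bucket histogram: counts readings per bucket, plus count and sum."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One bucket per boundary (upper bound inclusive) plus an overflow bucket.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.sum = 0.0

    def record(self, value):
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.sum += value


requests_total = Counter()
temperature = Gauge()
duration_ms = Histogram(boundaries=[100, 250, 500])

for latency in (40, 100, 120, 600):
    requests_total.add()
    duration_ms.record(latency)
temperature.set(71.5)
temperature.set(68.0)

print(requests_total.value, temperature.value, duration_ms.bucket_counts)
```

Percentiles such as p99 are estimated from the bucket counts, which is why a histogram costs a fixed amount of storage no matter how many requests it summarizes.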

Metrics are far cheaper to store and query than traces. They are the right signal for dashboards, alerting thresholds, and SLO tracking. The OTel metrics specification is stable across all major language SDKs as of 2025.

Logs

Logs are timestamped text records with structured fields. OpenTelemetry’s approach to logs is intentionally different from traces and metrics: rather than replacing existing logging frameworks, OTel bridges them. You keep using your existing logger (Logback, log4j, zap, structlog) and the OTel SDK attaches as a bridge that adds trace context (trace ID, span ID) to each log entry automatically.

This means logs emitted during a traced request carry the same trace ID as the spans from that request. In a backend that supports log-trace correlation, you can jump from a slow span directly to the logs emitted during that span. This is the practical value of OTel’s unified context model.

Log SDK maturity reached stable at different times in different languages, but all major languages have stable log support as of early 2026.

How Instrumentation Works

There are two ways to instrument an application with OpenTelemetry.

Auto-instrumentation uses a language-specific agent or wrapper that instruments supported frameworks without touching application code. You add it at startup time.

For Java:

java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=payment-service \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://collector:4317 \
  -jar your-application.jar

For Python:

OTEL_SERVICE_NAME=payment-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317 \
opentelemetry-instrument python app.py

Auto-instrumentation covers HTTP clients and servers, database drivers, message queue clients, gRPC, and many popular frameworks. Java and .NET have the most complete coverage. Go currently relies more on manual instrumentation because the Go runtime does not support the bytecode manipulation that makes Java auto-instrumentation possible.

Manual instrumentation lets you add custom spans, attributes, and events to capture business-specific context that auto-instrumentation cannot know about:

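from opentelemetry import trace

# The tracer name identifies the instrumentation scope that produced the spans.
tracer = trace.get_tracer("payment-service")

def process_payment(order_id, amount):
    # The span is the current span for the duration of the with block.
    with tracer.start_as_current_span("payment.process") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount", amount)
        # your business logic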

Both approaches can be combined. Use auto-instrumentation for framework-level visibility and add manual spans for the business logic that matters most.

Signal Maturity Status (as of May 2026)

| Signal | Specification | SDK maturity (major languages) | Production-ready |
|---|---|---|---|
| Traces | Stable (since 2021) | Stable | Yes |
| Metrics | Stable | Stable across all major languages | Yes |
| Logs | Stable | Stable across major languages | Yes |
| Profiling | Release candidate (Q1 2026) | Early stage | Pilot only; GA targeted Q3 2026 |

How OpenTelemetry Fits a Production Stack

A typical production deployment looks like this:

[Diagram: How OpenTelemetry Fits a Production Stack]

In Kubernetes, the Collector is typically deployed as a DaemonSet (one Collector pod per node, receiving from all applications on that node) or as a sidecar (one Collector container per application pod). The Kubernetes Operator for OpenTelemetry manages Collector configuration via CRDs and handles auto-instrumentation injection at the namespace level without modifying application deployments.

OpenTelemetry vs Prometheus, Jaeger, and Other Tools

OpenTelemetry is frequently confused with the tools it works alongside.

| Tool | What it does | Relationship to OTel |
|---|---|---|
| Prometheus | Stores metrics, evaluates alert rules | OTel Collector can scrape Prometheus endpoints and export in Prometheus remote_write format |
| Jaeger | Stores and visualizes traces | OTel Collector can export traces to Jaeger. Jaeger natively accepts OTLP |
| Grafana | Visualizes metrics, logs, and traces | Queries Prometheus, Tempo, Loki. Works well with OTel-collected data |
| Datadog / New Relic | Full-stack observability platforms | Accept OTLP natively. OTel instrumentation sends data to these platforms without proprietary agents |
| Zipkin | Distributed tracing backend | Accepts OTLP directly; the OTel Zipkin propagator was deprecated in February 2026 in favor of OTLP ingestion |

OpenTelemetry is the collection and transport layer. The tools above are the storage and analysis layer. They are complementary, not competing.

OpenTelemetry Gets Data Out, CubeAPM Makes It Useful

OpenTelemetry solves the instrumentation and transport problem well. Once your telemetry is flowing via OTLP, the remaining question is where it lands and how well it is correlated once it gets there.

CubeAPM is built natively on OpenTelemetry and accepts OTLP directly from your existing OTel SDK or Collector without any proprietary agent or instrumentation change. If you are already instrumenting with OpenTelemetry, pointing your existing Collector’s OTLP exporter at CubeAPM is the only configuration change required. It correlates traces, metrics, logs, and infrastructure data using the shared context that OpenTelemetry provides, meaning you can jump from a slow span to its logs to the infrastructure metrics of the host that ran it, all within a single view. It runs self-hosted inside your own infrastructure at $0.15/GB ingestion pricing, so your telemetry data never leaves your environment.

Summary

OpenTelemetry is the open standard that separates observability instrumentation from observability backends. You instrument once with the OTel API and SDK, route data through the Collector using OTLP, and send it to any backend without changing application code. The three core signals (traces, metrics, and logs) are all stable and production-ready. Profiling, the fourth signal, is in release candidate status and targeting GA in Q3 2026. The framework does not store or visualize data; that is the responsibility of the backend you choose.

| Component | What it is | What it does |
|---|---|---|
| API | Language-specific interfaces | What application code calls to create spans, record metrics, and emit logs |
| SDK | API implementation | Handles sampling, batching, processing, and export |
| OTLP | Wire protocol (gRPC port 4317, HTTP port 4318) | Carries telemetry between SDK, Collector, and backends |
| Collector | Standalone binary (optional but recommended) | Receives, processes, and routes telemetry to one or more backends |
| Semantic conventions | Shared attribute naming standard | Ensures consistent attribute names across languages and signals |
| Context propagation | W3C TraceContext headers | Carries trace IDs across service boundaries to link spans into traces |

Disclaimer: Signal maturity status, component descriptions, and project details are verified against OpenTelemetry official documentation (opentelemetry.io, last modified April 6, 2026), the OpenTelemetry changelog, and CNCF project status as of May 2026.
