Datadog LLM Observability: Examples, Pricing, and Alternatives in 2026

As LLM-powered features and AI agents move into production, teams need visibility into more than whether the service is up. They need to know how long each LLM call takes, how many tokens it consumed, what it cost, whether the output was accurate, and whether a prompt injection was attempted. Datadog LLM Observability was built to answer these questions, and it is now one of the most widely deployed enterprise solutions for monitoring AI applications in production.

This guide covers how Datadog LLM Observability works and what it monitors, with examples. It also covers the best Datadog alternatives for teams that want open-source, self-hosted, or more cost-predictable options.

Key Takeaways

Datadog markets its LLM observability product under two names: LLM Observability in product documentation and navigation, and Agent Observability on a newer product page covering AI agent workflows. Both refer to the same product.
Pricing confirmed from Datadog’s LLM Observability product page: Free plan includes 40,000 LLM spans/month at no cost; Pro plan starts at $160/month for 100,000 LLM spans, with additional on-demand usage billed beyond that. Only LLM spans are billed; tool spans, embedding spans, retrieval spans, and agent spans are not charged.
Both Free and Pro plans retain trace and span data for 15 days by default. Retention add-ons are billed per 10,000 LLM spans.
Datadog supports estimated cost tracking for 800+ models from providers including OpenAI, Anthropic, Gemini, Hugging Face, and models served via OpenRouter.
Langfuse is the most widely adopted open-source LLM observability platform, now part of ClickHouse (acquired January 2026). Its Hobby tier is free with 50,000 observations/month and 30-day retention; Core starts at $29/month.
SigNoz is an OpenTelemetry-native observability platform with dedicated LLM observability support using OTel GenAI semantic conventions. The self-hosted Community Edition is free under the MIT Expat license.
CubeAPM is a self-hosted, OpenTelemetry-native APM platform that accepts LLM application telemetry via OTel GenAI instrumentation alongside the full application and infrastructure stack, at $0.15/GB ingested with no per-span or per-request fees.

What is Datadog LLM Observability?

Datadog LLM Observability, also marketed as Agent Observability, provides end-to-end monitoring, evaluation, and improvement tooling for LLM-powered applications and AI agents. Each request fulfilled by your application is represented as a trace. A trace captures every step: the initial prompt, retrieval steps, tool calls, model responses, and any postprocessing.

The product covers four distinct workflows:

Monitoring: Track latency, token usage, cost, errors, and quality metrics in production. Out-of-the-box evaluations surface hallucinations, prompt injection attempts, PII exposure, and sentiment drift automatically. The Insights view detects anomalies across key operational dimensions.
Evaluation: Run LLM-as-a-judge evaluators, heuristic evaluators, and human annotation workflows on production traces or datasets before release. Every plan includes the full evaluation workflow at no additional charge. If an eval run makes LLM calls, those calls count as LLM spans.
Experimentation: Build versioned datasets from production traces, run experiments comparing prompts, models, and agent configurations side by side, and validate changes with real production data before deploying.
Context unification: Correlate LLM agent behavior with backend APM services, infrastructure signals, and RUM sessions in the same platform.

How Datadog LLM Observability works: examples

Example 1: Tracing a chatbot request

A user submits a question to a customer support chatbot. Datadog traces the full execution:

Span 1 (LLM): The user’s message is sent to OpenAI gpt-4o. Datadog records the prompt, token counts (input: 312, output: 184), latency (1.2s), estimated cost calculated from OpenAI’s public pricing, and the model response.
Span 2 (Tool): The model calls a knowledge base retrieval tool. Tool spans are not billed.
Span 3 (LLM): The retrieved context is sent to the model for a second call. Datadog records this as a second LLM span.

The full trace appears in the LLM Observability page with a waterfall view of all spans, their latencies, token counts, and costs.
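The trace in Example 1 can be modeled as a small data structure to show which spans count toward the bill. This is a minimal sketch. The Span and Trace classes and their field names are hypothetical and do not reflect Datadog's actual span schema. The only rule encoded is the one stated in this article: LLM spans are billed, and tool, embedding, retrieval, and agent spans are not.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical span kinds; only "llm" is billable under the pricing described here.
BILLABLE_KINDS = {"llm"}


@dataclass
class Span:
    """Hypothetical, simplified span; not Datadog's schema."""
    name: str
    kind: str  # e.g. "llm", "tool", "retrieval", "embedding", "agent"
    latency_s: Optional[float] = None  # None where the example gives no value
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class Trace:
    spans: List[Span] = field(default_factory=list)

    def billable_spans(self) -> int:
        # Count only spans whose kind is billed (LLM calls).
        return sum(1 for s in self.spans if s.kind in BILLABLE_KINDS)

    def unbilled_spans(self) -> int:
        return len(self.spans) - self.billable_spans()


# Example 1: two LLM calls around one knowledge-base tool call.
chatbot_trace = Trace([
    Span("gpt-4o: answer question", "llm", latency_s=1.2, input_tokens=312, output_tokens=184),
    Span("knowledge_base.search", "tool"),
    Span("gpt-4o: answer with retrieved context", "llm"),  # token counts not given in Example 1
])

print(chatbot_trace.billable_spans(), chatbot_trace.unbilled_spans())  # 2 1
```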

Example 2: Automated cost tracking across models

As documented on Datadog’s LLM cost monitoring page, Datadog automatically calculates the estimated cost for each LLM request using providers’ public pricing models and token counts. It supports 800+ models across OpenAI, Anthropic, Gemini, Hugging Face, and models served via OpenRouter. Cost metrics ship with out-of-the-box tags including model_name, model_provider, and ml_app. Teams can break down LLM spend by custom tags such as team, customer tier, or feature.
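At its core, the cost calculation described above multiplies token counts by per-token prices and then groups the results by tag. The sketch below makes three assumptions:

- It uses a hypothetical price table. The model names and prices are placeholders, not any provider's actual rates.
- It groups by a hypothetical `team` tag.
- It only illustrates the idea. Datadog performs this calculation for you using providers' public pricing.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical prices in USD per million tokens. Placeholders only.
PRICES_PER_MTOK: Dict[str, Dict[str, float]] = {
    "example-large": {"input": 2.00, "output": 8.00},
    "example-small": {"input": 0.50, "output": 1.50},
}


def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one LLM call from its token counts."""
    price = PRICES_PER_MTOK[model_name]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def spend_by_tag(calls: List[dict], tag: str) -> Dict[str, float]:
    """Sum estimated cost per value of `tag`; calls without the tag go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        key = call["tags"].get(tag, "untagged")
        totals[key] += estimate_cost(call["model_name"], call["input_tokens"], call["output_tokens"])
    return dict(totals)


calls = [
    {"model_name": "example-large", "input_tokens": 312, "output_tokens": 184, "tags": {"team": "support"}},
    {"model_name": "example-small", "input_tokens": 1_000, "output_tokens": 200, "tags": {"team": "support"}},
    {"model_name": "example-small", "input_tokens": 2_000, "output_tokens": 500, "tags": {"team": "search"}},
]

for team, usd in spend_by_tag(calls, "team").items():
    print(f"{team}: ${usd:.6f}")
```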

Example 3: Quality evaluation with LLM-as-a-judge

Teams configure automated evaluators that run on every production trace. Out-of-the-box evaluators cover hallucination detection, prompt injection detection, PII exposure, and response quality. Custom LLM-as-a-judge evaluators, generally available as of late 2025, let teams define domain-specific quality criteria using any supported provider (OpenAI, Anthropic, Azure OpenAI, or Amazon Bedrock). Evaluation results appear alongside trace data.
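The LLM-as-a-judge pattern sends the original question and answer to a second model along with a grading rubric, then parses a structured verdict. The sketch below shows only the pattern. The prompt template, the JSON verdict format, and the `call_llm` callback are hypothetical and are not Datadog's evaluator API. The judge model is stubbed so the example runs without network access. In Datadog, any real LLM calls a judge makes would count as LLM spans.

```python
import json
from typing import Callable, Dict, Union

# Hypothetical rubric prompt; not Datadog's evaluator template.
JUDGE_TEMPLATE = (
    "You are grading a customer-support answer.\n"
    "Criterion: {criterion}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply only with JSON: {{"pass": true or false, "reason": "<one sentence>"}}'
)


def judge(question: str, answer: str, criterion: str,
          call_llm: Callable[[str], str]) -> Dict[str, Union[bool, str]]:
    """Ask a judge model for a pass/fail verdict and parse it defensively."""
    prompt = JUDGE_TEMPLATE.format(criterion=criterion, question=question, answer=answer)
    raw = call_llm(prompt)
    try:
        verdict = json.loads(raw)
        passed = verdict["pass"]
        if not isinstance(passed, bool):
            raise TypeError("'pass' must be a JSON boolean")
        return {"pass": passed, "reason": str(verdict.get("reason", ""))}
    except (json.JSONDecodeError, KeyError, TypeError):
        # Unparseable judge output is recorded as a failed evaluation, not a crash.
        return {"pass": False, "reason": "unparseable judge output"}


def fake_judge_llm(prompt: str) -> str:
    # Stub standing in for a real provider call (OpenAI, Anthropic, etc.).
    return '{"pass": true, "reason": "The answer matches the knowledge base article."}'


verdict = judge(
    question="How do I reset my password?",
    answer="Use the 'Forgot password' link on the sign-in page.",
    criterion="The answer must be grounded in the knowledge base.",
    call_llm=fake_judge_llm,
)
print(verdict)
```

Treating malformed judge output as a failure, rather than raising, keeps one bad judge response from breaking the whole evaluation run.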

Example 4: Sensitive data scanning

Datadog’s Sensitive Data Scanner is built into LLM Observability at no separate cost. For every 10,000 LLM requests, teams receive an allocation of 1 GB of Sensitive Data Scanner capacity. The scanner identifies and redacts PII, financial data, health records, and other sensitive content from prompts and responses.
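Two parts of this are easy to sketch: the allocation arithmetic, and the idea of redacting matches before a prompt is stored. Note the assumptions:

- The regex redaction below is a toy illustration of the concept, not how Datadog's Sensitive Data Scanner works.
- The patterns and placeholder tokens are hypothetical.
- The allocation function assumes the ratio of 1 GB per 10,000 requests scales linearly.

```python
import re


def sds_allocation_gb(llm_requests: int) -> float:
    """Scanner allocation at 1 GB per 10,000 LLM requests.

    Assumes linear scaling; check Datadog's docs for how partial blocks are counted.
    """
    return llm_requests / 10_000


# Toy patterns for illustration only; a real scanner uses far richer rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13-16 digits, optional separators


def redact(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub("[REDACTED_CARD]", text)


print(sds_allocation_gb(250_000))  # 25.0
print(redact("Contact jane.doe@example.com, card 4111 1111 1111 1111 today"))
```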

Datadog LLM Observability pricing

Confirmed from Datadog’s LLM Observability product page:

What counts as a billable LLM span: Each call to an LLM provider is captured as one LLM span. Tool spans, embedding spans, retrieval spans, and agent spans are not billed. Pricing scales on model calls only, not on the surrounding agent complexity.

Plan	Price	Included LLM spans	Retention	Evaluations
Free	$0/month	40,000/month	15 days	Full workflow included
Pro	$160/month	100,000/month	15 days	Full workflow included
Pro (additional)	On-demand beyond 100K	Per 10K spans	Retention add-ons available	No separate eval fee
M2M / Annual	Discounted	Custom	15 days traces	Full workflow included

Retention add-ons extend trace and span data beyond 15 days. M2M and annual commitment plans are discounted. Datadog notes that pricing varies by region.

Important: Datadog introduced new LLM Observability pricing effective May 1, 2026. Always verify current rates directly on Datadog’s pricing page before budgeting.
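A rough monthly estimate can be sketched from the figures above. Several assumptions need flagging:

- The on-demand rate per 10,000 spans is not stated here, so it is a parameter. The $20 used in the example is a made-up number.
- Overage is assumed to be billed in whole 10K-span blocks.
- Retention add-ons, regional pricing, and annual discounts are ignored.
- Any volume above 40,000 spans is assumed to require the Pro plan.

```python
FREE_PLAN_SPANS = 40_000
PRO_BASE_USD = 160.0
PRO_INCLUDED_SPANS = 100_000
OVERAGE_BLOCK = 10_000


def monthly_llm_spans(requests_per_day: int, llm_calls_per_request: int, days: int = 30) -> int:
    # Only LLM calls count; tool, embedding, retrieval, and agent spans are ignored.
    return requests_per_day * llm_calls_per_request * days


def datadog_llmobs_monthly_cost(llm_spans: int, overage_usd_per_10k: float) -> float:
    """Rough estimate. overage_usd_per_10k is a caller-supplied, HYPOTHETICAL rate."""
    if llm_spans <= FREE_PLAN_SPANS:
        return 0.0  # fits in the Free plan
    extra = max(0, llm_spans - PRO_INCLUDED_SPANS)
    blocks = -(-extra // OVERAGE_BLOCK)  # ceiling division: partial blocks rounded up (assumption)
    return PRO_BASE_USD + blocks * overage_usd_per_10k


# Example 1's request makes 2 LLM calls; 2,000 requests/day is a hypothetical volume.
spans = monthly_llm_spans(requests_per_day=2_000, llm_calls_per_request=2)
print(spans, datadog_llmobs_monthly_cost(spans, overage_usd_per_10k=20.0))  # 120000 200.0
```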

Instrumentation

Datadog supports LLM application instrumentation via:

Python SDK (ddtrace-run with DD_LLMOBS_ENABLED=1)
Node.js SDK (dd-trace with DD_LLMOBS_ENABLED=1)
Java agent (dd-java-agent.jar with -Ddd.llmobs.enabled=true)
OpenTelemetry via OTLP for teams using the OTel pipeline
HTTP API for languages without a native SDK
Auto-instrumentation for LangChain, CrewAI, Pydantic AI, Strands Agents, AWS Bedrock, LiteLLM, and others via the Python SDK

Limitations of Datadog LLM Observability

Span-based pricing at scale: High-volume AI applications, where a single user interaction triggers many model calls, will accumulate spans quickly. The 40,000 free spans per month can be consumed by a modest-traffic application in a day or two.
15-day default retention: Both Free and Pro plans retain trace and span data for only 15 days. Teams that need longer retention for compliance or trend analysis must purchase add-ons.
Data leaves your infrastructure: All LLM trace data, including prompts and responses, is sent to Datadog’s SaaS platform. Teams with strict data residency or sensitive prompt content requirements need to evaluate whether Datadog’s regional data options and Sensitive Data Scanner meet their compliance needs.
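Most valuable with an existing Datadog investment: LLM Observability’s correlation with APM services, infrastructure signals, and RUM sessions only works for data already sent to Datadog. Teams not otherwise using Datadog get the LLM-specific features without that cross-stack context.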
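The “day or two” figure in the span-based pricing limitation is easy to check. The traffic numbers in this sketch (5,000 interactions per day, four LLM calls each) are hypothetical. Only two inputs come from this article: the 40,000-span Free allowance, and the rule that evaluator LLM calls also count as LLM spans.

```python
def days_until_free_tier_exhausted(
    interactions_per_day: int,
    llm_calls_per_interaction: int,
    judge_calls_per_interaction: int = 0,
    free_spans: int = 40_000,
) -> float:
    """Days until the Free plan's monthly LLM span allowance is used up.

    Evaluator (LLM-as-a-judge) calls are counted too, since eval runs that
    call an LLM produce LLM spans.
    """
    spans_per_day = interactions_per_day * (llm_calls_per_interaction + judge_calls_per_interaction)
    return free_spans / spans_per_day


print(days_until_free_tier_exhausted(5_000, 4))                                 # 2.0
print(days_until_free_tier_exhausted(5_000, 4, judge_calls_per_interaction=1))  # 1.6
```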

Alternatives to Datadog LLM Observability

1. CubeAPM

CubeAPM is a self-hosted, OpenTelemetry-native, full-stack observability platform. It does not have a dedicated LLM observability product, but because it is built natively on OpenTelemetry and accepts all OTLP telemetry, it receives LLM application traces instrumented with OTel GenAI semantic conventions (gen_ai.* attributes) alongside the rest of your application stack.
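The sketch below shows roughly what the gen_ai.* attributes on an OTel chat span look like. It also shows how a backend without built-in LLM features could still aggregate token usage from them. Keep in mind:

- It is a stdlib-only sketch that uses plain dictionaries instead of the OpenTelemetry SDK.
- The attribute names follow the OTel GenAI semantic conventions, which are still evolving, so check the current spec. For example, newer versions replace gen_ai.system with gen_ai.provider.name.
- The second span's token counts are hypothetical.

```python
def genai_chat_span(provider: str, model: str, input_tokens: int, output_tokens: int) -> dict:
    """Plain-dict stand-in for an OTel span carrying GenAI semantic-convention attributes."""
    return {
        "name": f"chat {model}",  # conventions suggest "{operation} {model}" span names
        "attributes": {
            "gen_ai.operation.name": "chat",
            "gen_ai.system": provider,  # gen_ai.provider.name in newer convention versions
            "gen_ai.request.model": model,
            "gen_ai.usage.input_tokens": input_tokens,
            "gen_ai.usage.output_tokens": output_tokens,
        },
    }


def tokens_by_model(spans: list) -> dict:
    """Sum input/output tokens per requested model from span attributes."""
    totals: dict = {}
    for span in spans:
        attrs = span["attributes"]
        model = attrs.get("gen_ai.request.model", "unknown")
        bucket = totals.setdefault(model, {"input": 0, "output": 0})
        bucket["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        bucket["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return totals


spans = [
    genai_chat_span("openai", "gpt-4o", 312, 184),
    genai_chat_span("openai", "gpt-4o", 540, 220),  # hypothetical token counts
]
print(tokens_by_model(spans))  # {'gpt-4o': {'input': 852, 'output': 404}}
```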

Features

Accepts OTLP traces from LLM applications instrumented with OTel GenAI instrumentation libraries
LLM application spans appear alongside APM service traces: navigate from a slow LLM response to the upstream service that triggered it and the infrastructure it ran on
Full MELT observability (Metrics, Events, Logs, Traces) in a single platform; LLM trace data, application logs, and infrastructure metrics are correlated in one view
Smart sampling retains high-latency and error traces while reducing storage costs
Self-hosted inside your VPC; no prompt data, trace data, or LLM response content leaves your infrastructure
SOC 2 and ISO 27001 compliant
Unlimited retention

Pricing: $0.15/GB of data ingested. No per-span, per-request, per-LLM-call, or per-host fees.
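Because CubeAPM bills on ingested bytes, the cost of LLM telemetry depends on span size, which can be substantial when prompt and response content is captured. Here is a back-of-the-envelope sketch. It assumes decimal gigabytes and a hypothetical average span size. It ignores:

- the application logs, metrics, and infrastructure data that share the same ingest bill
- any reduction from sampling
- the cost of running the self-hosted infrastructure

```python
CUBEAPM_USD_PER_GB = 0.15
BYTES_PER_GB = 1_000_000_000  # assumes decimal GB


def ingest_cost_usd(span_count: int, avg_span_bytes: int) -> float:
    """Ingestion cost for LLM spans: scales with bytes, not with span or request count."""
    gigabytes = span_count * avg_span_bytes / BYTES_PER_GB
    return gigabytes * CUBEAPM_USD_PER_GB


# Hypothetical: 120,000 LLM spans/month averaging 4 KB each.
print(f"${ingest_cost_usd(120_000, 4_000):.3f}")  # $0.072
```

Doubling the average span size doubles the cost, while adding more tool or retrieval steps only matters through the bytes they add.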

Limitations: CubeAPM does not have dedicated LLM observability features such as built-in LLM-as-a-judge evaluations, prompt management, automatic model cost calculation, or hallucination detection. It is a general observability platform. Teams that need those LLM-specific workflows should pair CubeAPM with a dedicated LLM evaluation platform like Langfuse.

Best for: Teams that want LLM application traces correlated with their full application and infrastructure stack in a single self-hosted platform, with predictable ingestion-based pricing that does not scale with LLM request or span volume.

2. Langfuse


Langfuse is the most widely adopted open-source LLM engineering platform, now part of ClickHouse following its acquisition in January 2026. It has 29.5k+ GitHub stars and continues to be developed by the same team with additional ClickHouse resources.

Features

Hierarchical traces capturing every LLM call, tool invocation, and retrieval step with timing, inputs, outputs, and metadata
Session tracking for multi-turn conversations and agentic workflows; user tracking for per-user cost and usage breakdowns
Token and cost tracking: automatically infers cost from model and usage details for OpenAI, Anthropic, Google, and other supported models
Agent graph visualization for complex agentic workflows
Prompt management: version control, release management, one-click deployments and rollbacks, playground, composability, and caching
Evaluations: LLM-as-a-judge evaluators, heuristic code evaluators, human annotation queues, user feedback tracking, and dataset-based experiments via SDK and UI
Monitors and alerts: launched June 2026, providing production monitoring with configurable alert thresholds
OpenTelemetry-native: Langfuse SDK v4 is built on OpenTelemetry and supports gen_ai.* attributes and known LLM instrumentors; Java, Go, and custom OTel instrumentation can send data via the OpenTelemetry endpoint
Integration via Python/JS SDKs, 100+ library/framework integrations, OpenTelemetry, or HTTP API
Self-hostable via Docker Compose, Kubernetes (Helm), AWS, GCP, or Azure Terraform
MIT licensed; SOC 2 Type II and ISO 27001 certified on paid plans; HIPAA available on Enterprise; data regions in the US, EU, and Japan

Pricing:

Plan	Price	Included units/month	Data retention	Users
Hobby	Free	50,000	30 days	2
Core	$29/month	100,000	90 days	Unlimited
Pro	$199/month	100,000	3 years	Unlimited
Enterprise	$2,499/month	100,000	3 years	Unlimited
Additional usage (all paid plans)	$8/100,000 units	Volume discounts available

A “unit” is one observation (a span, generation, event, or score). Self-hosted Langfuse is priced separately.
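Because Langfuse counts every observation and score, the same request is measured differently than under Datadog's LLM-span pricing. Example 1's request is two billable spans in Datadog, but at least three observations in Langfuse if every step is instrumented, plus one unit per evaluation score. The rough estimator below rests on these assumptions:

- Overage is prorated linearly at $8 per 100,000 units.
- Volume discounts and self-hosting costs are ignored.
- The Hobby limit is treated as a hard cap.
- The traffic of 2,000 requests/day is hypothetical.

```python
PLAN_BASE_USD = {"core": 29.0, "pro": 199.0, "enterprise": 2_499.0}
HOBBY_UNITS = 50_000
PAID_INCLUDED_UNITS = 100_000
OVERAGE_USD_PER_100K = 8.0


def units_per_trace(observations: int, scores: int = 0) -> int:
    # Every observation (span, generation, event) and every score is one unit.
    return observations + scores


def langfuse_cloud_monthly_cost(units: int, plan: str = "core") -> float:
    """Rough estimate; assumes linear overage proration and no volume discounts."""
    if plan == "hobby":
        if units > HOBBY_UNITS:
            # Treating the Hobby allowance as a hard cap is an assumption.
            raise ValueError("Hobby includes 50,000 units/month; choose a paid plan")
        return 0.0
    extra = max(0, units - PAID_INCLUDED_UNITS)
    return PLAN_BASE_USD[plan] + extra / 100_000 * OVERAGE_USD_PER_100K


# Example 1's trace: 3 observations (2 LLM generations + 1 tool span) plus 1 evaluation score.
monthly_units = 2_000 * 30 * units_per_trace(observations=3, scores=1)
print(monthly_units, round(langfuse_cloud_monthly_cost(monthly_units, "core"), 2))
```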

Limitations: Langfuse is focused on LLM application observability and evaluation; it does not provide general APM, infrastructure monitoring, or backend service tracing. Teams that want LLM traces correlated with broader service health need to pair it with a general observability platform.

Best for: AI engineering teams that want the most mature open-source LLM observability platform covering tracing, prompt management, evaluations, and experiments, with a genuine free tier and self-hosting option.

3. SigNoz

SigNoz is an OpenTelemetry-native, open-source observability platform with dedicated LLM observability support. Its LLM observability docs cover integrations with LangChain, LlamaIndex, LiteLLM, OpenAI, Anthropic, Gemini, Vercel AI SDK, Pydantic AI, and others using OTel GenAI semantic conventions.

Features

Traces every LLM call and agentic step using OTel GenAI semantic conventions; prompt content captured when OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
Pre-built LLM dashboards for OpenAI, Gemini, LiteLLM, and others
Token usage and cost visibility from OTel span attributes
LLM traces correlated with full-stack APM: link a slow model response to the FastAPI route, database query, or Kubernetes pod that caused the delay
Distributed tracing, log management, metrics, and alerting in the same platform as LLM monitoring
Available as self-hosted Community Edition (free, MIT Expat) and managed SigNoz Cloud with a 30-day free trial

Pricing:

Community Edition: Free; self-hosted; MIT Expat license
SigNoz Cloud: $49/month base (includes usage up to $49); additional usage at $0.30/GB for logs and traces, $0.10 per million metric samples; no per-seat or per-LLM-span fees
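SigNoz Cloud pricing is ingestion-based, so the LLM-specific question is again how much data your spans add. Here is a rough sketch. It assumes the $49 base works as a monthly minimum that usage counts against, which is one reading of “includes usage up to $49”. The volumes in the example are hypothetical. The self-hosted Community Edition has no license fee but still carries infrastructure costs.

```python
SIGNOZ_BASE_USD = 49.0
USD_PER_GB_LOGS_TRACES = 0.30
USD_PER_MILLION_METRIC_SAMPLES = 0.10


def signoz_cloud_monthly_cost(logs_traces_gb: float, metric_samples_millions: float) -> float:
    """Assumes the $49 base acts as a monthly minimum that usage charges count against."""
    usage = (logs_traces_gb * USD_PER_GB_LOGS_TRACES
             + metric_samples_millions * USD_PER_MILLION_METRIC_SAMPLES)
    return max(SIGNOZ_BASE_USD, usage)


print(signoz_cloud_monthly_cost(10, 5))     # usage is $3.50, so the $49 base applies
print(signoz_cloud_monthly_cost(200, 300))  # usage is $60 for traces/logs + $30 for metrics
```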

Limitations: SigNoz does not have built-in LLM-as-a-judge evaluations, prompt management, or dedicated hallucination detection. LLM observability in SigNoz is delivered through the same trace and metrics pipeline as the rest of your application, which means you get correlation but not the LLM-specific workflows that Langfuse provides.

Best for: OpenTelemetry-first engineering teams that want LLM application traces correlated with full-stack APM, logs, and infrastructure in a single open-source platform, with no per-LLM-span charges.

Comparison table

Tool	LLM trace support	Built-in evaluations	Prompt management	Cost tracking	Self-hosted	Free tier	Pricing model
Datadog LLM Observability	Yes (800+ models)	Yes (LLM-as-a-judge, OOTB evals)	No	Yes (auto, 800+ models)	No	Yes (40K spans/mo)	Per LLM span ($160/mo Pro)
CubeAPM	Yes (via OTel GenAI)	No	No	No (raw span data only)	Yes	No	$0.15/GB ingested
Langfuse	Yes (100+ integrations)	Yes (LLM-as-a-judge, human annotation)	Yes	Yes (inferred from model)	Yes (MIT)	Yes (50K obs/mo)	Per observation ($29/mo Core)
SigNoz	Yes (OTel GenAI)	No	No	Via span attributes	Yes (MIT Expat)	Yes (Community Edition)	$49/mo Cloud base

Which LLM observability tool should you choose?

Choose Datadog LLM Observability if your team already uses Datadog for APM and infrastructure, and you need built-in LLM-as-a-judge evaluations, automatic cost tracking across 800+ models, prompt injection detection, and sensitive data scanning in a single managed platform. The Free plan with 40,000 spans/month is a reasonable starting point for smaller deployments.
Choose CubeAPM if you want LLM application traces visible alongside your full application and infrastructure stack in a single self-hosted platform, with predictable ingestion-based pricing that does not scale with LLM request volume, and where all prompt and response data stays inside your own VPC.
Choose Langfuse if you want the most mature open-source LLM engineering platform with tracing, prompt versioning, evaluations, and experiment management. It is MIT licensed, self-hostable, has the widest framework coverage, and now includes monitors and alerts. Its acquisition by ClickHouse in January 2026 adds engineering resources and long-term stability.
Choose SigNoz if you want LLM traces correlated with full-stack APM, logs, and infrastructure metrics in a single open-source, self-hosted platform using OTel GenAI standard instrumentation, with no per-LLM-span pricing.

Summary

Datadog LLM Observability is the most feature-complete managed platform for LLM application monitoring in production, covering tracing, cost estimation, evaluations, experimentation, and sensitive data scanning in one place. Its Free plan includes 40,000 LLM spans per month at no cost; the Pro plan starts at $160/month for 100,000 LLM spans with 15-day default retention. New pricing took effect May 1, 2026, so verify current rates directly with Datadog before budgeting.

For teams that want open-source or self-hosted alternatives, Langfuse is the most mature dedicated LLM observability platform with a genuine free tier, MIT license, and the broadest framework support. SigNoz provides LLM trace correlation alongside full-stack APM using OTel standards without per-span pricing. CubeAPM provides the most cost-predictable full-stack option for teams that want LLM traces alongside application and infrastructure data in a self-hosted platform at $0.15/GB with no per-span fees.

Tool	Best for	Free tier	Self-hosted
Datadog LLM Observability	Managed enterprise LLM monitoring, built-in evals and cost tracking	Yes (40K spans/mo)	No
CubeAPM	Full-stack LLM + APM correlation, self-hosted, ingestion-based pricing	No	Yes
Langfuse	Open-source LLM engineering platform, prompt management, evaluations	Yes (50K obs/mo)	Yes (MIT)
SigNoz	OTel-native full-stack + LLM observability, no per-span pricing	Yes (Community Edition)	Yes (MIT Expat)

Disclaimer: Datadog LLM Observability pricing is confirmed from Datadog’s LLM Observability product page and Datadog’s LLM cost documentation as of June 2026. New pricing effective May 1, 2026; verify current rates at Datadog’s pricing page. The pricing for Langfuse, SigNoz, and CubeAPM is also taken from their pricing pages. Always verify current pricing and features directly with each vendor before making decisions.
