As LLM-powered features and AI agents move into production, teams need visibility into more than whether the service is up. They need to know how long each LLM call takes, how many tokens it consumed, what it cost, whether the output was accurate, and whether a prompt injection was attempted. This is the problem Datadog LLM Observability was built to solve, and it is now one of the most widely deployed enterprise solutions for monitoring AI applications in production.
This guide covers how Datadog LLM Observability works, what it monitors (with examples), and the best Datadog alternatives for teams that want open-source, self-hosted, or more cost-predictable options.
Key Takeaways
- Datadog markets its LLM observability product under two names: LLM Observability in product documentation and navigation, and Agent Observability on a newer product page covering AI agent workflows. Both refer to the same product.
- Pricing confirmed from Datadog’s LLM Observability product page: Free plan includes 40,000 LLM spans/month at no cost; Pro plan starts at $160/month for 100,000 LLM spans, with additional on-demand usage billed beyond that. Only LLM spans are billed; tool spans, embedding spans, retrieval spans, and agent spans are not charged.
- Both Free and Pro plans retain trace and span data for 15 days by default. Retention add-ons are billed per 10,000 LLM spans.
- Datadog supports estimated cost tracking for 800+ models from providers including OpenAI, Anthropic, Gemini, Hugging Face, and models served via OpenRouter.
- Langfuse is the most widely adopted open-source LLM observability platform, now part of ClickHouse (acquired January 2026). Its Hobby tier is free with 50,000 observations/month and 30-day retention; Core starts at $29/month.
- SigNoz is an OpenTelemetry-native observability platform with dedicated LLM observability support using OTel GenAI semantic conventions. The self-hosted Community Edition is free under the MIT Expat license.
- CubeAPM is a self-hosted, OpenTelemetry-native APM platform that accepts LLM application telemetry via OTel GenAI instrumentation alongside the full application and infrastructure stack, at $0.15/GB ingested with no per-span or per-request fees.
What is Datadog LLM Observability?
Datadog LLM Observability, also marketed as Agent Observability, provides end-to-end monitoring, evaluation, and improvement tooling for LLM-powered applications and AI agents. Each request fulfilled by your application is represented as a trace. A trace captures every step: the initial prompt, retrieval steps, tool calls, model responses, and any postprocessing.
The product covers four distinct workflows:
- Monitoring: Track latency, token usage, cost, errors, and quality metrics in production. Out-of-the-box evaluations surface hallucinations, prompt injection attempts, PII exposure, and sentiment drift automatically. The Insights view detects anomalies across key operational dimensions.
- Evaluation: Run LLM-as-a-judge evaluators, heuristic evaluators, and human annotation workflows on production traces or datasets before release. Every plan includes the full evaluation workflow at no additional charge. If an eval run makes LLM calls, those calls count as LLM spans.
- Experimentation: Build versioned datasets from production traces, run experiments comparing prompts, models, and agent configurations side by side, and validate changes with real production data before deploying.
- Context unification: Correlate LLM agent behavior with backend APM services, infrastructure signals, and RUM sessions in the same platform.
How Datadog LLM Observability works: examples
Example 1: Tracing a chatbot request
A user submits a question to a customer support chatbot. Datadog traces the full execution:
- Span 1 (LLM): The user’s message is sent to OpenAI gpt-4o. Datadog records the prompt, token counts (input: 312, output: 184), latency (1.2s), estimated cost calculated from OpenAI’s public pricing, and the model response.
- Span 2 (Tool): The model calls a knowledge base retrieval tool. Tool spans are not billed.
- Span 3 (LLM): The retrieved context is sent to the model for a second call. Datadog records this as a second LLM span.
The full trace appears in the LLM Observability page with a waterfall view of all spans, their latencies, token counts, and costs.
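The trace structure above can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not Datadog's data model: the `Span` and `Trace` classes, the span names, and the latency and token values for the second and third spans are all hypothetical. It shows why this request produces two billable spans even though the trace has three steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One step in a trace. `kind` is 'llm', 'tool', 'retrieval', and so on."""
    name: str
    kind: str
    latency_s: float
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class Trace:
    """All spans recorded for a single user request."""
    spans: List[Span] = field(default_factory=list)

    def billable_llm_spans(self) -> int:
        # Per the pricing described in this article, only LLM spans are billed.
        return sum(1 for s in self.spans if s.kind == "llm")

    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)


trace = Trace(spans=[
    # Values for the first span come from Example 1.
    Span("first_model_call", "llm", latency_s=1.2, input_tokens=312, output_tokens=184),
    # Hypothetical values for the remaining spans.
    Span("kb_lookup", "tool", latency_s=0.3),
    Span("second_model_call", "llm", latency_s=0.9, input_tokens=650, output_tokens=120),
])

print(trace.billable_llm_spans())  # 2
print(trace.total_tokens())        # 1266
```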
Example 2: Automated cost tracking across models
As documented on Datadog’s LLM cost monitoring page, Datadog automatically calculates the estimated cost for each LLM request using providers’ public pricing models and token counts. It supports 800+ models across OpenAI, Anthropic, Gemini, Hugging Face, and models served via OpenRouter. Cost metrics ship with out-of-the-box tags including model_name, model_provider, and ml_app. Teams can break down LLM spend by custom tags such as team, customer tier, or feature.
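The underlying arithmetic is simple: token counts multiplied by a per-token price, then grouped by tag. The sketch below assumes a hypothetical model name and hypothetical prices per million tokens (they are not taken from any provider's price list), and the `team` tag values are invented for illustration. It is not Datadog's implementation.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens; check the provider's price list.
PRICES = {"example-model": {"input": 2.50, "output": 10.00}}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimated cost = tokens * price per million tokens / 1,000,000."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# The token counts from Example 1 (312 input, 184 output).
cost = estimate_cost_usd("example-model", 312, 184)
print(f"{cost:.6f}")  # 0.002620

# Break spend down by a custom tag such as team.
requests = [
    {"team": "support", "input_tokens": 312, "output_tokens": 184},
    {"team": "support", "input_tokens": 650, "output_tokens": 120},
    {"team": "search", "input_tokens": 1000, "output_tokens": 200},
]
spend_by_team = defaultdict(float)
for r in requests:
    spend_by_team[r["team"]] += estimate_cost_usd(
        "example-model", r["input_tokens"], r["output_tokens"]
    )

for team, usd in sorted(spend_by_team.items()):
    print(team, f"{usd:.6f}")
```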
Example 3: Quality evaluation with LLM-as-a-judge
Teams configure automated evaluators that run on every production trace. Out-of-the-box evaluators cover hallucination detection, prompt injection detection, PII exposure, and response quality. Custom LLM-as-a-judge evaluators, generally available as of late 2025, let teams define domain-specific quality criteria using any supported provider (OpenAI, Anthropic, Azure OpenAI, or Amazon Bedrock). Evaluation results appear alongside trace data.
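To make the LLM-as-a-judge idea concrete, here is a minimal, provider-agnostic sketch. It is not Datadog's evaluator API: the rubric text, the `judge_answer` function, and the `stub_model` stand-in are all hypothetical. In practice `call_model` would wrap a real provider call; the stub is there only so the example runs offline.

```python
from typing import Callable

# Hypothetical rubric; a real one would encode domain-specific criteria.
RUBRIC = (
    "You are grading a support answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with exactly PASS if the answer addresses the question, else FAIL."
)


def judge_answer(question: str, answer: str, call_model: Callable[[str], str]) -> bool:
    """Return True when the judge model's verdict starts with PASS."""
    verdict = call_model(RUBRIC.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")


def stub_model(prompt: str) -> str:
    # Stand-in for a real provider call: fails only when the answer is empty.
    return "FAIL" if "Answer: \n" in prompt else "PASS"


question = "How do I reset my password?"
good = judge_answer(question, "Use the reset link on the login page.", stub_model)
empty = judge_answer(question, "", stub_model)
print(good, empty)  # True False
```

Passing the model call in as a function keeps the evaluator independent of any one provider, which mirrors the article's point that custom evaluators can run against several supported providers.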
Example 4: Sensitive data scanning
Datadog’s Sensitive Data Scanner is built into LLM Observability at no separate cost. For every 10,000 LLM requests, teams receive an allocation of 1 GB of Sensitive Data Scanner capacity. The scanner identifies and redacts PII, financial data, health records, and other sensitive content from prompts and responses.
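For intuition about what redaction does to a prompt before it is stored, the sketch below applies two simple regular expressions. It is a deliberately naive illustration and not how Datadog's Sensitive Data Scanner works: the patterns, the placeholder labels, and the `redact` function are assumptions, and real scanners use far broader rule libraries and validation.

```python
import re

# Naive illustrative patterns; production scanners use many more rules.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


prompt = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
print(redact(prompt))  # Contact [EMAIL], card [CARD].
```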
Datadog LLM Observability pricing
Confirmed from Datadog’s LLM Observability product page:
What counts as a billable LLM span: Each call to an LLM provider is captured as one LLM span. Tool spans, embedding spans, retrieval spans, and agent spans are not billed. Pricing scales on model calls only, not on the surrounding agent complexity.
| Plan | Price | Included LLM spans | Retention | Evaluations |
| Free | $0/month | 40,000/month | 15 days | Full workflow included |
| Pro | $160/month | 100,000/month | 15 days | Full workflow included |
| Pro (additional usage) | On-demand, billed per 10K LLM spans | Beyond the included 100,000 | Retention add-ons available | No separate eval fee |
| M2M / Annual | Discounted | Custom | 15 days | Full workflow included |
Retention add-ons extend trace and span data beyond 15 days. Month-to-month (M2M) and annual commitment plans are discounted. Datadog notes that pricing varies by region.
Important: Datadog introduced new LLM Observability pricing effective May 1, 2026. Always verify current rates directly on Datadog’s pricing page before budgeting.
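Because only LLM spans are billed, a rough plan fit can be estimated from traffic and the number of model calls per interaction. The sketch below uses the included span counts from the table above; the traffic figures are hypothetical, and rounding overage up to whole 10K blocks is an assumption rather than a documented billing rule. No on-demand price is applied because the article does not state one.

```python
import math

# Included LLM spans per month, from the pricing table above.
INCLUDED_SPANS = {"Free": 40_000, "Pro": 100_000}


def monthly_llm_spans(interactions_per_day, llm_calls_per_interaction, days=30):
    """Billable spans = interactions * model calls per interaction * days."""
    return interactions_per_day * llm_calls_per_interaction * days


def days_until_exhausted(plan, interactions_per_day, llm_calls_per_interaction):
    """How many days of traffic the plan's included spans cover."""
    return INCLUDED_SPANS[plan] / (interactions_per_day * llm_calls_per_interaction)


def overage_blocks(plan, spans, block=10_000):
    """10K-span blocks beyond the included amount (assumes rounding up)."""
    return max(0, math.ceil((spans - INCLUDED_SPANS[plan]) / block))


# Hypothetical workload: 1,000 interactions/day, 4 model calls each.
spans = monthly_llm_spans(1_000, 4)
print(spans)                                   # 120000
print(days_until_exhausted("Free", 1_000, 4))  # 10.0
print(overage_blocks("Pro", spans))            # 2
```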
Instrumentation
Datadog supports LLM application instrumentation via:
- Python SDK (ddtrace-run with DD_LLMOBS_ENABLED=1)
- Node.js SDK (dd-trace with DD_LLMOBS_ENABLED=1)
- Java agent (dd-java-agent.jar with -Ddd.llmobs.enabled=true)
- OpenTelemetry via OTLP for teams using the OTel pipeline
- HTTP API for languages without a native SDK
- Auto-instrumentation for LangChain, CrewAI, Pydantic AI, Strands Agents, AWS Bedrock, LiteLLM, and others via the Python SDK
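As a rough illustration of the Python path, the sketch below assembles the environment for a `ddtrace-run` launch. `DD_LLMOBS_ENABLED=1` and `ddtrace-run` come from the list above; `DD_LLMOBS_ML_APP`, `DD_API_KEY`, `DD_SITE`, and `DD_LLMOBS_AGENTLESS_ENABLED` are recalled from Datadog's SDK setup documentation and should be verified there before use. The `llmobs_env` helper, the app name, and `app.py` are hypothetical.

```python
import os
import shlex


def llmobs_env(ml_app, api_key, site="datadoghq.com", agentless=True):
    """Build the environment variables for an LLM Observability launch."""
    env = {
        "DD_LLMOBS_ENABLED": "1",    # from the instrumentation list above
        "DD_LLMOBS_ML_APP": ml_app,  # assumed: names the app in the UI
        "DD_API_KEY": api_key,       # assumed: standard Datadog API key variable
        "DD_SITE": site,             # assumed: selects the Datadog region
    }
    if agentless:
        # Assumed: sends data directly, without a local Datadog Agent.
        env["DD_LLMOBS_AGENTLESS_ENABLED"] = "1"
    return env


env = llmobs_env("support-chatbot", os.environ.get("DD_API_KEY", "<your-api-key>"))
command = ["ddtrace-run", "python", "app.py"]  # app.py is a placeholder

# Print the launch line without exposing the API key.
shown = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items() if k != "DD_API_KEY")
print(shown, shlex.join(command))

# To actually launch (requires ddtrace to be installed):
#   import subprocess
#   subprocess.run(command, env={**os.environ, **env}, check=True)
```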
Limitations of Datadog LLM Observability
- Span-based pricing at scale: High-volume AI applications, where a single user interaction triggers many model calls, accumulate spans quickly. At five model calls per interaction, for example, the 40,000 free spans per month cover only 8,000 interactions, which a modest-traffic application can consume in a day or two.
- 15-day default retention: Both Free and Pro plans retain trace and span data for only 15 days. Teams that need longer retention for compliance or trend analysis must purchase add-ons.
- Data leaves your infrastructure: All LLM trace data, including prompts and responses, is sent to Datadog’s SaaS platform. Teams with strict data residency or sensitive prompt content requirements need to evaluate whether Datadog’s regional data options and Sensitive Data Scanner meet their compliance needs.
- Requires existing Datadog investment: LLM Observability works best for teams already using Datadog for APM, infrastructure, and logs.
Alternatives to Datadog LLM Observability
1. CubeAPM

CubeAPM is a self-hosted, OpenTelemetry-native, full-stack observability platform. It does not have a dedicated LLM observability product, but because it is built natively on OpenTelemetry and accepts all OTLP telemetry, it receives LLM application traces instrumented with OTel GenAI semantic conventions (gen_ai.* attributes) alongside the rest of your application stack.
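For readers unfamiliar with the gen_ai.* attributes, the sketch below builds the kind of attribute set an OTel GenAI instrumentation library attaches to an LLM span. It uses a plain dictionary so it runs without the OpenTelemetry SDK; the attribute names are recalled from the OpenTelemetry GenAI semantic conventions, which are still evolving and should be checked against the current spec. The `genai_span_attributes` helper and the model name are hypothetical.

```python
def genai_span_attributes(model, input_tokens, output_tokens, operation="chat"):
    """Return span attributes following the OTel GenAI naming scheme."""
    return {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


attrs = genai_span_attributes("example-model", input_tokens=312, output_tokens=184)

# With the OpenTelemetry API these would be set on an active span, e.g.:
#   for key, value in attrs.items():
#       span.set_attribute(key, value)
for key in sorted(attrs):
    print(key, "=", attrs[key])
```

Because these are ordinary span attributes, any OTLP-compatible backend can store and query them, which is what lets a general-purpose platform display LLM calls next to regular service traces.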
Features
- Accepts OTLP traces from LLM applications instrumented with OTel GenAI instrumentation libraries
- LLM application spans appear alongside APM service traces: navigate from a slow LLM response to the upstream service that triggered it and the infrastructure it ran on
- Full MELT observability (Metrics, Events, Logs, Traces) in a single platform; LLM trace data, application logs, and infrastructure metrics are correlated in one view
- Smart sampling retains high-latency and error traces while reducing storage costs
- Self-hosted inside your VPC; no prompt data, trace data, or LLM response content leaves your infrastructure
- SOC 2 and ISO 27001 compliant
- Unlimited retention
Pricing: $0.15/GB of data ingested. No per-span, per-request, per-LLM-call, or per-host fees.
Limitations: CubeAPM does not have dedicated LLM observability features such as built-in LLM-as-a-judge evaluations, prompt management, automatic model cost calculation, or hallucination detection. It is a general observability platform. Teams that need those LLM-specific workflows should pair CubeAPM with a dedicated LLM evaluation platform like Langfuse.
Best for: Teams that want LLM application traces correlated with their full application and infrastructure stack in a single self-hosted platform, with predictable ingestion-based pricing that does not scale with LLM request or span volume.
2. Langfuse

Langfuse is the most widely adopted open-source LLM engineering platform, now part of ClickHouse following its acquisition in January 2026. It has 29.5k+ GitHub stars and continues to be developed by the same team with additional ClickHouse resources.
Features
- Hierarchical traces capturing every LLM call, tool invocation, and retrieval step with timing, inputs, outputs, and metadata
- Session tracking for multi-turn conversations and agentic workflows; user tracking for per-user cost and usage breakdowns
- Token and cost tracking: automatically infers cost from model and usage details for OpenAI, Anthropic, Google, and other supported models
- Agent graph visualization for complex agentic workflows
- Prompt management: version control, release management, one-click deployments and rollbacks, playground, composability, and caching
- Evaluations: LLM-as-a-judge evaluators, heuristic code evaluators, human annotation queues, user feedback tracking, and dataset-based experiments via SDK and UI
- Monitors and alerts: launched June 2026, providing production monitoring with configurable alert thresholds
- OpenTelemetry-native: the OTel-native Langfuse SDK v4 supports gen_ai.* attributes and known LLM instrumentors; supports Java, Go, and custom OTel via the OpenTelemetry endpoint
- Integration via Python/JS SDKs, 100+ library/framework integrations, OpenTelemetry, or HTTP API
- Self-hostable via Docker Compose, Kubernetes (Helm), AWS, GCP, or Azure Terraform
- MIT licensed; SOC 2 Type II and ISO 27001 certified on paid plans; HIPAA available on Enterprise; data regions in the US, EU, and Japan
Pricing:
| Plan | Price | Included units/month | Data retention | Users |
| Hobby | Free | 50,000 | 30 days | 2 |
| Core | $29/month | 100,000 | 90 days | Unlimited |
| Pro | $199/month | 100,000 | 3 years | Unlimited |
| Enterprise | $2,499/month | 100,000 | 3 years | Unlimited |
| Additional usage (all paid plans) | $8 per 100,000 units (volume discounts available) | N/A | N/A | N/A |
A “unit” is one observation (a span, generation, event, or score). Self-hosted Langfuse is priced separately.
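A quick way to project a monthly bill from the table above is base price plus overage. The sketch below assumes overage is prorated linearly at $8 per 100,000 units and ignores volume discounts; both are simplifying assumptions, and the `langfuse_monthly_cost` function and the 500,000-unit workload are hypothetical.

```python
# Base prices and included units from the pricing table above.
PLANS = {"Core": 29.0, "Pro": 199.0}
INCLUDED_UNITS = 100_000
OVERAGE_PER_100K = 8.0


def langfuse_monthly_cost(plan, units):
    """Base price plus linear overage; ignores volume discounts (assumption)."""
    extra = max(0, units - INCLUDED_UNITS)
    return PLANS[plan] + extra / 100_000 * OVERAGE_PER_100K


# Hypothetical workload: 500,000 observations in a month.
print(langfuse_monthly_cost("Core", 500_000))  # 61.0
print(langfuse_monthly_cost("Pro", 500_000))   # 231.0
```

Note that a unit is any observation, so a trace with several spans and scores consumes several units, unlike Datadog's model where only LLM spans count.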
Limitations: Langfuse is focused on LLM application observability and evaluation; it does not provide general APM, infrastructure monitoring, or backend service tracing. Teams that want LLM traces correlated with broader service health need to pair it with a general observability platform.
Best for: AI engineering teams that want the most mature open-source LLM observability platform covering tracing, prompt management, evaluations, and experiments, with a genuine free tier and self-hosting option.
3. SigNoz

SigNoz is an OpenTelemetry-native, open-source observability platform with dedicated LLM observability support. Its LLM observability docs cover integrations with LangChain, LlamaIndex, LiteLLM, OpenAI, Anthropic, Gemini, Vercel AI SDK, Pydantic AI, and others using OTel GenAI semantic conventions.
Features
- Traces every LLM call and agentic step using OTel GenAI semantic conventions; prompt content captured when OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
- Pre-built LLM dashboards for OpenAI, Gemini, LiteLLM, and others
- Token usage and cost visibility from OTel span attributes
- LLM traces correlated with full-stack APM: link a slow model response to the FastAPI route, database query, or Kubernetes pod that caused the delay
- Distributed tracing, log management, metrics, and alerting in the same platform as LLM monitoring
- Available as self-hosted Community Edition (free, MIT Expat) and managed SigNoz Cloud with a 30-day free trial
Pricing:
- Community Edition: Free; self-hosted; MIT Expat license
- SigNoz Cloud: $49/month base (includes usage up to $49); additional usage at $0.30/GB for logs and traces, $0.10 per million metric samples; no per-seat or per-LLM-span fees
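The SigNoz Cloud rates above can be turned into a rough estimate. This sketch reads "includes usage up to $49" as meaning the bill is the larger of the $49 base and the metered usage; that interpretation is an assumption, as are the `signoz_cloud_monthly_cost` function and the example volumes.

```python
BASE_USD = 49.0
LOGS_TRACES_PER_GB = 0.30
METRICS_PER_MILLION_SAMPLES = 0.10


def signoz_cloud_monthly_cost(logs_traces_gb, metric_samples_millions):
    """Bill = max(base, metered usage), assuming the base covers the first $49."""
    usage = (
        logs_traces_gb * LOGS_TRACES_PER_GB
        + metric_samples_millions * METRICS_PER_MILLION_SAMPLES
    )
    return round(max(BASE_USD, usage), 2)


# Hypothetical small workload: usage ($35) stays under the base.
print(signoz_cloud_monthly_cost(100, 50))   # 49.0
# Hypothetical larger workload: usage exceeds the base.
print(signoz_cloud_monthly_cost(500, 200))  # 170.0
```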
Limitations: SigNoz does not have built-in LLM-as-a-judge evaluations, prompt management, or dedicated hallucination detection. LLM observability in SigNoz is delivered through the same trace and metrics pipeline as the rest of your application, which means you get correlation but not the LLM-specific workflows that Langfuse provides.
Best for: OpenTelemetry-first engineering teams that want LLM application traces correlated with full-stack APM, logs, and infrastructure in a single open-source platform, with no per-LLM-span charges.
Comparison table
| Tool | LLM trace support | Built-in evaluations | Prompt management | Cost tracking | Self-hosted | Free tier | Pricing model |
| Datadog LLM Observability | Yes (800+ models) | Yes (LLM-as-a-judge, OOTB evals) | No | Yes (auto, 800+ models) | No | Yes (40K spans/mo) | Per LLM span ($160/mo Pro) |
| CubeAPM | Yes (via OTel GenAI) | No | No | No (raw span data only) | Yes | No | $0.15/GB ingested |
| Langfuse | Yes (100+ integrations) | Yes (LLM-as-a-judge, human annotation) | Yes | Yes (inferred from model) | Yes (MIT) | Yes (50K obs/mo) | Per observation ($29/mo Core) |
| SigNoz | Yes (OTel GenAI) | No | No | Via span attributes | Yes (MIT Expat) | Yes (Community Edition) | $49/mo Cloud base |
Which LLM observability tool should you choose?
- Choose Datadog LLM Observability if your team already uses Datadog for APM and infrastructure, and you need built-in LLM-as-a-judge evaluations, automatic cost tracking across 800+ models, prompt injection detection, and sensitive data scanning in a single managed platform. The Free plan with 40,000 spans/month is a reasonable starting point for smaller deployments.
- Choose CubeAPM if you want LLM application traces visible alongside your full application and infrastructure stack in a single self-hosted platform, with predictable ingestion-based pricing that does not scale with LLM request volume, and where all prompt and response data stays inside your own VPC.
- Choose Langfuse if you want the most mature open-source LLM engineering platform with tracing, prompt versioning, evaluations, and experiment management. It is MIT licensed, self-hostable, has the widest framework coverage, and now includes monitors and alerts. Its acquisition by ClickHouse in January 2026 adds engineering resources and long-term stability.
- Choose SigNoz if you want LLM traces correlated with full-stack APM, logs, and infrastructure metrics in a single open-source, self-hosted platform using OTel GenAI standard instrumentation, with no per-LLM-span pricing.
Summary
Datadog LLM Observability is the most feature-complete managed platform for LLM application monitoring in production, covering tracing, cost estimation, evaluations, experimentation, and sensitive data scanning in one place. Its Free plan includes 40,000 LLM spans per month at no cost; the Pro plan starts at $160/month for 100,000 LLM spans with 15-day default retention. New pricing took effect May 1, 2026, so verify current rates directly with Datadog before budgeting.
For teams that want open-source or self-hosted alternatives, Langfuse is the most mature dedicated LLM observability platform with a genuine free tier, MIT license, and the broadest framework support. SigNoz provides LLM trace correlation alongside full-stack APM using OTel standards without per-span pricing. CubeAPM provides the most cost-predictable full-stack option for teams that want LLM traces alongside application and infrastructure data in a self-hosted platform at $0.15/GB with no per-span fees.
| Tool | Best for | Free tier | Self-hosted |
| Datadog LLM Observability | Managed enterprise LLM monitoring, built-in evals and cost tracking | Yes (40K spans/mo) | No |
| CubeAPM | Full-stack LLM + APM correlation, self-hosted, ingestion-based pricing | No | Yes |
| Langfuse | Open-source LLM engineering platform, prompt management, evaluations | Yes (50K obs/mo) | Yes (MIT) |
| SigNoz | OTel-native full-stack + LLM observability, no per-span pricing | Yes (Community Edition) | Yes (MIT Expat) |
Disclaimer: Datadog LLM Observability pricing is taken from Datadog’s LLM Observability product page and Datadog’s LLM cost documentation as of June 2026. New pricing took effect May 1, 2026; verify current rates on Datadog’s pricing page. Pricing for Langfuse, SigNoz, and CubeAPM is taken from their respective pricing pages. Always verify current pricing and features directly with each vendor before making decisions.