CubeAPM
Linkerd Monitoring: How to Track Service Mesh Health in 2026

Linkerd is a lightweight service mesh that adds reliability, security, and observability to Kubernetes applications without code changes. But once Linkerd is deployed, how do you actually monitor it? According to the CNCF’s 2024 Annual Survey, 56% of organizations now run service mesh in production, making service mesh observability a critical capability for teams at scale.

This guide covers what Linkerd monitoring is, how Linkerd’s telemetry pipeline works, what metrics you should track, and how to set up full-stack observability that connects service mesh data with application traces, logs, and infrastructure health. We’ll also compare native Linkerd tooling with third-party platforms and show what production teams actually monitor to catch issues before they reach users.

What Is Linkerd Monitoring?

Linkerd monitoring is the practice of tracking the runtime behavior of a Linkerd service mesh, including proxy health, traffic success rates, latencies, request volumes, and control plane stability, in order to detect performance degradation, misconfigurations, or security violations before they impact end users.

Unlike traditional application monitoring, which observes what happens inside your code, Linkerd monitoring observes what happens between your services. Every HTTP, HTTP/2, gRPC, and TCP connection that flows through Linkerd’s data plane proxy generates telemetry automatically. You get visibility into service-to-service communication patterns without instrumenting a single line of application code.

Linkerd surfaces this telemetry through the Prometheus instance that ships with its Viz extension, which scrapes metrics from every sidecar proxy deployed alongside your pods. These metrics cover request success rates, latencies, traffic volumes, and connection states at the service, route, and backend level. For most teams, this is the starting point for monitoring a service mesh in production.

How Linkerd Monitoring Works

Linkerd’s monitoring system is built into its architecture and works in three layers: the data plane, the control plane, and the observability extension called Viz.

When you deploy Linkerd, every pod that gets meshed receives a lightweight proxy sidecar. This proxy intercepts all inbound and outbound traffic for that pod. As traffic flows through the proxy, it emits real-time metrics about latency, success rate, request count, and connection state. These metrics are exposed in Prometheus format at each proxy’s metrics endpoint.
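
To make the Prometheus text format concrete, here is a minimal Python sketch that parses a few sample lines and totals a counter by label. The sample payload and the helper names are hypothetical, and the parser handles only simple lines without timestamps. On a default install the proxy's admin port is typically 4191 with metrics served at /metrics, but treat that as an assumption to verify against your version.

```python
import re
from collections import defaultdict

# Hypothetical sample of what a proxy metrics endpoint might return.
SAMPLE = '''\
# TYPE request_total counter
request_total{direction="inbound",tls="true"} 120
request_total{direction="outbound",tls="true"} 45
# TYPE response_total counter
response_total{direction="inbound",classification="success",status_code="200"} 118
response_total{direction="inbound",classification="failure",status_code="503"} 2
'''

# name{labels} value   (labels are optional; timestamps are not handled)
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        labels = dict(LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def total_by_label(text, metric, label):
    """Sum one metric's samples, grouped by the value of a single label."""
    totals = defaultdict(float)
    for name, labels, value in parse_metrics(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    print(total_by_label(SAMPLE, "request_total", "direction"))
```

In practice Prometheus does this parsing for you; the sketch is only meant to show that each proxy exposes plain counters and labels that any scraper can read.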

The Linkerd control plane runs services like the destination service, identity service, and proxy injector in the linkerd namespace. These components also expose metrics about their own health and operation.

To access Linkerd’s full monitoring stack, you install the Viz extension using linkerd viz install | kubectl apply -f -. This extension deploys its components into the linkerd-viz namespace, including a Prometheus instance to scrape and store metrics, a metrics-api service to query those metrics, a tap service (with its tap-injector) for live traffic inspection, and a web dashboard for visualization.

Prometheus scrapes metrics from all proxies and control plane components at a short interval (10 seconds in the default Viz configuration). The metrics-api service queries Prometheus and exposes the data through Linkerd’s CLI and dashboard. The tap service streams live request data for real-time debugging. The web component renders this data in the Linkerd dashboard UI.

This stack works out of the box and requires no configuration. Golden metrics are available immediately after installation. But the built-in Prometheus instance stores only 6 hours of metrics and does not persist data across pod restarts. For production use, teams typically export Linkerd metrics to a long-term observability platform.

What Linkerd Monitoring Measures

Linkerd tracks three categories of telemetry: golden metrics that measure the health of meshed traffic, TCP level metrics for non-HTTP protocols, and control plane health metrics that track the stability of Linkerd itself.

Golden Metrics

Golden metrics are the core observability signals for any service. Linkerd generates these automatically for all HTTP, HTTP/2, and gRPC traffic.

Success rate is the percentage of requests that returned a non-5xx response code over a rolling time window, typically one minute. Linkerd calculates success rate both before retries at the backend level and after retries at the route level. This distinction matters because a route configured with retries can mask backend instability. If your route shows 99.9% success but the backend shows 95% success, you know retries are absorbing failures that could indicate a deeper issue.
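
The before-and-after-retries comparison comes down to simple arithmetic on success and failure counts. The following Python sketch uses hypothetical one-minute counts that reproduce the 99.9% versus 95% example; the function names and the 1-point gap threshold are illustrative assumptions, not Linkerd settings.

```python
def success_rate(success, failure):
    """Return the fraction of successful requests, or None if there was no traffic."""
    total = success + failure
    if total == 0:
        return None
    return success / total


def retries_masking_failures(route_rate, backend_rate, gap=0.01):
    """Flag a route whose post-retry success rate hides a weaker backend.

    `gap` is a hypothetical threshold: how far the backend may lag the
    route before the difference is worth investigating.
    """
    if route_rate is None or backend_rate is None:
        return False
    return (route_rate - backend_rate) > gap


if __name__ == "__main__":
    backend = success_rate(success=950, failure=50)  # before retries
    route = success_rate(success=999, failure=1)     # after retries
    print(f"backend={backend:.1%} route={route:.1%} "
          f"masked={retries_masking_failures(route, backend)}")
```

A check like this is most useful as an alert condition alongside the plain success-rate alert, since the route-level number alone would look healthy.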

Request rate measures how much traffic each service receives, broken down by route and backend. This helps identify traffic patterns, detect unexpected spikes, and correlate load changes with performance degradation. If a service suddenly starts receiving 10x more requests than normal, you see it immediately in the request rate metric.

Latency percentiles show how long requests take to complete. Linkerd tracks the 50th, 95th, and 99th percentiles. The 50th percentile represents typical performance. The 95th and 99th percentiles surface tail latency, which often correlates with customer complaints even if average latency looks fine. A service with a 50ms p50 latency but a 2 second p99 latency is delivering a bad experience on roughly 1 in every 100 requests.
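
Percentiles like these are estimated from histogram buckets rather than from individual request timings. The Python sketch below shows one common way to do that, using linear interpolation inside the matching bucket in the style of Prometheus's histogram_quantile function. The bucket bounds and counts are hypothetical, and this is an approximation of the idea rather than Linkerd's exact implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q < 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_ms, cumulative_count) pairs sorted by
    bound and ending with (math.inf, total_count).
    """
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The quantile falls past the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts: 20 requests <= 10ms, 80 <= 50ms, 100 <= 100ms.
    sample = [(10.0, 20), (50.0, 80), (100.0, 100), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(q, histogram_quantile(q, sample))
```

Because the estimate depends on bucket boundaries, a reported p99 is only as precise as the bucket it lands in, which is worth remembering when setting tight latency thresholds.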

TCP Level Metrics

For non-HTTP traffic like database connections, message queues, or custom TCP protocols, Linkerd tracks bytes sent and received, open connections, and connection state changes. These metrics help you monitor services that don’t use HTTP but still flow through the mesh.

Control Plane Health

Linkerd’s control plane components also emit metrics. You can monitor the destination service’s routing decisions, the identity service’s certificate issuance rate, and the proxy injector’s webhook success rate. If any control plane component starts failing, your entire mesh can degrade. Monitoring these components is critical for production stability.

Viewing Linkerd Metrics

Linkerd provides three ways to access monitoring data: the CLI, the dashboard, and direct Prometheus queries.

Using the Linkerd CLI

The simplest way to view metrics is through the Linkerd CLI. After installing the Viz extension, you can run commands like linkerd viz stat deploy to see success rate, request rate, and latency for all deployments in the current namespace. The output looks like this:

NAME           MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
web              1/1   100.00%   2.5rps           5ms          12ms          18ms
emoji            1/1    98.50%   8.2rps          12ms          45ms         120ms
voting           1/1   100.00%   1.1rps           3ms           8ms          15ms

This gives an immediate snapshot of how each service is performing. If a service shows a dropping success rate or spiking latency, you see it in seconds.
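
If you want to act on this output in a script, a small parser is enough for a first pass. The Python sketch below parses the sample table shown above and lists services under a success-rate threshold. The helper names and the 99% threshold are hypothetical, and the parser assumes every column holds a value; if your CLI version supports a JSON output mode, that is likely a more robust input than the text table.

```python
SAMPLE_OUTPUT = """\
NAME           MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
web              1/1   100.00%   2.5rps           5ms          12ms          18ms
emoji            1/1    98.50%   8.2rps          12ms          45ms         120ms
voting           1/1   100.00%   1.1rps           3ms           8ms          15ms
"""


def _number(value, unit):
    """Strip a trailing unit such as '%', 'rps', or 'ms' and return a float."""
    if not value.endswith(unit):
        raise ValueError(f"expected {value!r} to end with {unit!r}")
    return float(value[: -len(unit)])


def parse_stat(text):
    """Turn the whitespace-aligned stat table into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        rows.append({
            "name": row["NAME"],
            "success": _number(row["SUCCESS"], "%") / 100,
            "rps": _number(row["RPS"], "rps"),
            "p99_ms": _number(row["LATENCY_P99"], "ms"),
        })
    return rows


def below_threshold(rows, min_success=0.99):
    """Return the names of services whose success rate is under the threshold."""
    return [row["name"] for row in rows if row["success"] < min_success]


if __name__ == "__main__":
    print(below_threshold(parse_stat(SAMPLE_OUTPUT)))
```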

You can also use linkerd viz top deploy to see which routes are receiving the most traffic. This helps identify hot paths in your application. And linkerd viz tap deploy/web streams live requests in real time, showing the source, destination, status code, and latency of each request as it happens.

Using the Linkerd Dashboard

The web dashboard provides a visual interface for the same data. Run linkerd viz dashboard to open it in your browser. The dashboard shows service graphs, deployment health, route-level metrics, and live tap streams. It’s useful for quick investigations but less practical for long-term monitoring or alerting.

Querying Prometheus Directly

For teams that want to build custom dashboards or alerts, you can query the built-in Prometheus instance directly. Linkerd’s Prometheus exposes metrics at http://prometheus.linkerd-viz.svc.cluster.local:9090. Key metrics include:

  • request_total: total request count, labeled by traffic direction and target
  • response_latency_ms_bucket: histogram of response latencies in milliseconds
  • response_total: total responses, labeled by status code and success or failure classification
  • tcp_open_connections: current open TCP connections
  • tcp_write_bytes_total: bytes written over TCP

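These metrics carry labels such as namespace, deployment, direction, dst_namespace, dst_deployment, status_code, and classification, so you can slice queries by source, destination, and outcome. Per-route series, available once Service Profiles are defined, are labeled with rt_route. Check the label set in your own Prometheus, since it can vary by Linkerd version.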
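
As an illustration of how these metrics and labels combine, the Python sketch below builds a PromQL expression for per-deployment success rate and the matching Prometheus HTTP API URL. It assumes response_total carries classification and direction labels, which is worth confirming in your own Prometheus; the function names are hypothetical, and the sketch only constructs strings rather than sending a request.

```python
from urllib.parse import urlencode

# Address of the Viz Prometheus service, as given in the text above.
PROMETHEUS = "http://prometheus.linkerd-viz.svc.cluster.local:9090"


def success_rate_query(namespace, window="1m"):
    """Build a PromQL expression for inbound success rate per deployment."""
    selector = f'namespace="{namespace}", direction="inbound"'
    ok = (
        f'sum(rate(response_total{{classification="success", {selector}}}'
        f'[{window}])) by (deployment)'
    )
    total = f'sum(rate(response_total{{{selector}}}[{window}])) by (deployment)'
    return f"{ok} / {total}"


def query_url(promql):
    """Return the instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{PROMETHEUS}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    promql = success_rate_query("your-namespace")
    print(promql)
    print(query_url(promql))
```

The same expression can be pasted into a Grafana panel or an alert rule; building it in code mainly helps when you generate dashboards or alerts for many namespaces.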

Monitoring Linkerd in Production

The built-in Viz extension works well for development and short-term investigations. But in production, its 6-hour retention window and lack of persistence make it insufficient. Most teams export Linkerd metrics to a long-term observability platform.

Exporting Metrics to External Systems

Linkerd’s Prometheus instance can be scraped by an external Prometheus server using federation. You configure your external Prometheus to scrape the /federate endpoint of Linkerd’s Prometheus. This pulls all Linkerd metrics into your centralized monitoring system.
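
For reference, a federation scrape is an HTTP GET against /federate with one or more match[] selectors. The Python sketch below builds such a URL. The job names linkerd-proxy and linkerd-controller are assumptions about how the Viz Prometheus labels its scrape jobs, so check the job label in your own instance before relying on them.

```python
from urllib.parse import urlencode

# Address of the Viz Prometheus service; adjust to match your cluster.
VIZ_PROMETHEUS = "http://prometheus.linkerd-viz.svc.cluster.local:9090"

# Assumed job names for proxy and control plane scrape targets.
DEFAULT_SELECTORS = ['{job="linkerd-proxy"}', '{job="linkerd-controller"}']


def federate_url(base=VIZ_PROMETHEUS, selectors=DEFAULT_SELECTORS):
    """Build a /federate URL with one match[] parameter per selector."""
    params = [("match[]", selector) for selector in selectors]
    return f"{base}/federate?{urlencode(params)}"


if __name__ == "__main__":
    print(federate_url())
```

In an external Prometheus, the same selectors would go under the scrape job's match[] parameters with the metrics path set to /federate.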

Alternatively, you can configure Prometheus itself (either the Viz instance or your own) to push metrics to a remote write endpoint. This is common for teams using managed Prometheus services like Grafana Cloud, AWS Managed Service for Prometheus, or platforms that support remote write like CubeAPM.

Setting Up Alerts

Production Linkerd monitoring requires alerting on key failure modes. Common alerts include:

  • Success rate drops below 99% for any service
  • P99 latency exceeds a defined threshold
  • Request rate spikes beyond expected traffic patterns
  • Control plane component becomes unhealthy
  • Proxy sidecar fails to start or crashes repeatedly

You configure these alerts in your observability platform’s alert manager. Each alert should include context about which service, route, or deployment triggered it so on-call engineers can investigate without guessing.
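
The alert conditions above can be expressed as a small rule check. The following Python sketch evaluates one service's stats against thresholds and returns messages that name the service, which is the context on-call engineers need. The dataclass, the function, and the default thresholds are all hypothetical; in practice these rules would live in your alerting system rather than in application code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceStats:
    name: str
    success_rate: float  # fraction between 0.0 and 1.0
    p99_ms: float
    rps: float


def evaluate(
    stats: ServiceStats,
    min_success: float = 0.99,
    max_p99_ms: float = 500.0,
    max_rps: Optional[float] = None,
) -> List[str]:
    """Return one message per breached threshold, each naming the service."""
    alerts = []
    if stats.success_rate < min_success:
        alerts.append(
            f"{stats.name}: success rate {stats.success_rate:.2%} "
            f"below {min_success:.2%}"
        )
    if stats.p99_ms > max_p99_ms:
        alerts.append(
            f"{stats.name}: p99 latency {stats.p99_ms:.0f}ms above {max_p99_ms:.0f}ms"
        )
    if max_rps is not None and stats.rps > max_rps:
        alerts.append(
            f"{stats.name}: request rate {stats.rps:.1f}rps above {max_rps:.1f}rps"
        )
    return alerts


if __name__ == "__main__":
    for message in evaluate(ServiceStats("emoji", 0.985, 120.0, 8.2)):
        print(message)
```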

Correlating Metrics with Logs and Traces

Metrics tell you that something is wrong. Logs and traces tell you why. The most effective Linkerd monitoring setups correlate service mesh metrics with application traces and logs. When a service’s success rate drops, you want to immediately see the error logs from that service and the distributed traces for failed requests.

Infrastructure monitoring tools that support OpenTelemetry can ingest Linkerd metrics alongside APM traces and logs, giving you a unified view of what’s happening across your entire stack. This eliminates the need to context switch between multiple dashboards during an incident.

Linkerd Service Profiles and Per-Route Metrics

By default, Linkerd aggregates metrics at the service level. But in production, you often need route-level visibility. A service might handle 50 different API endpoints, and a single slow route can cause customer complaints even if the service-level metrics look fine.

Linkerd Service Profiles enable per-route metrics. A Service Profile is a Kubernetes custom resource that defines the routes for a service. Once configured, Linkerd tracks success rate, latency, and request rate for each route individually.

For example, if your API service has routes like /api/users, /api/orders, and /api/payments, you can see which specific route is slow or failing. This dramatically reduces time to root cause during incidents.
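
Conceptually, per-route metrics come from matching each request against a list of route definitions (a method plus a path pattern) and counting outcomes per match. The Python sketch below illustrates that idea with hypothetical route patterns and request data. It is not how the proxy is implemented, and the "[DEFAULT]" bucket name for unmatched requests is an assumption modeled on how Linkerd's CLI labels them.

```python
import re
from collections import defaultdict

# Hypothetical route definitions: (route name, HTTP method, path pattern).
ROUTES = [
    ("GET /api/users/{id}", "GET", re.compile(r"/api/users/[^/]+")),
    ("POST /api/orders", "POST", re.compile(r"/api/orders")),
    ("POST /api/payments", "POST", re.compile(r"/api/payments")),
]


def classify(method, path):
    """Return the first route whose method and full path pattern match."""
    for name, route_method, pattern in ROUTES:
        if method == route_method and pattern.fullmatch(path):
            return name
    return "[DEFAULT]"


def per_route_success(requests):
    """Compute success rate per route from (method, path, status) tuples.

    A response counts as a success when its status code is below 500.
    """
    counts = defaultdict(lambda: [0, 0])  # route -> [successes, total]
    for method, path, status in requests:
        route = classify(method, path)
        counts[route][1] += 1
        if status < 500:
            counts[route][0] += 1
    return {route: ok / total for route, (ok, total) in counts.items()}


if __name__ == "__main__":
    traffic = [
        ("GET", "/api/users/42", 200),
        ("POST", "/api/payments", 503),
        ("POST", "/api/payments", 200),
        ("GET", "/healthz", 200),
    ]
    print(per_route_success(traffic))
```

Seen this way, a failing /api/payments route stands out immediately even though the service as a whole is answering three of four requests successfully.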

Service Profiles also let you configure retries and timeouts per route, and can split a service’s traffic across backends. Combined with per-route metrics, this gives you fine-grained control over how traffic behaves and how it’s monitored.

Best Practices for Linkerd Monitoring

Start with Golden Metrics

Focus first on success rate, request rate, and latency percentiles for every service. These three metrics catch most production issues. If a service’s success rate drops or latency spikes, you know something is wrong even if you don’t yet know the cause.

Monitor the Control Plane

Linkerd’s data plane proxies are stateless and self-healing. But the control plane is critical infrastructure. If the identity service stops issuing certificates, new pods can’t join the mesh. If the destination service fails, proxies lose routing information. Always monitor control plane health with alerts.

Use Service Profiles for Critical Paths

Not every service needs per-route metrics. But your most critical user-facing services do. Define Service Profiles for high-value paths like checkout flows, payment processing, and authentication. Route-level metrics make debugging these paths significantly faster.

Export Metrics to Long-Term Storage

The built-in Prometheus instance is sufficient for quick checks and development. For production, export metrics to a platform with unlimited retention. You’ll need historical data to establish baselines, detect trends, and investigate incidents that happened hours or days ago.

Correlate Service Mesh Metrics with Application Data

Linkerd tells you what’s happening between services. Application tracing tells you what’s happening inside services. The combination is far more powerful than either alone. Use an observability platform that can correlate service mesh metrics with APM traces and logs in a single view.

Tools for Linkerd Monitoring

Linkerd’s built-in stack provides everything you need for basic monitoring. But for production-scale environments, most teams integrate with third-party platforms to gain long-term retention, advanced alerting, and unified observability.

Grafana

Linkerd publishes official Grafana dashboards that visualize golden metrics, TCP metrics, and control plane health. These dashboards work with any Prometheus data source. Teams already using Grafana often import these dashboards and customize them for their environment.

Grafana is free and open source, but requires you to run your own Prometheus, configure federation, manage storage, and maintain dashboards. For teams with existing Grafana expertise, this works well. For teams without that expertise, it adds operational overhead.

Datadog

Datadog’s Linkerd integration collects metrics from Linkerd’s Prometheus endpoints and surfaces them in Datadog’s APM and infrastructure views. Datadog automatically tags metrics with Kubernetes labels, making it easy to correlate service mesh data with pod logs, node health, and application traces.

Datadog’s pricing is host-based and adds up quickly. A 50-node Kubernetes cluster running Linkerd costs $900 per month for infrastructure monitoring alone before logs, APM, or synthetics. For large environments, this can reach $5,000 to $10,000 per month.

CubeAPM

CubeAPM provides full-stack observability for Linkerd environments with native support for Prometheus metrics, OpenTelemetry traces, and logs in a single platform. It runs on-prem or in your VPC, so all telemetry stays within your infrastructure. This eliminates data egress costs and ensures compliance with data residency requirements.

CubeAPM’s pricing is $0.15 per GB ingested with unlimited retention and no per-host or per-user fees. For a 50-node Kubernetes cluster generating 10 TB of telemetry per month, CubeAPM costs $1,500 per month compared to $4,800 per month with Datadog. The platform includes pre-built dashboards for Linkerd, Kubernetes, and application-level metrics, with correlation between service mesh traffic and distributed traces built in.
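
The ingestion-based figure can be reproduced with straightforward arithmetic. The Python sketch below uses only numbers stated in this article: $0.15 per GB, 10 TB per month, and a per-host rate of $18 that is inferred from the $900 quoted for 50 nodes of Datadog infrastructure monitoring. It assumes 1 TB = 1,000 GB and does not attempt to derive the $4,800 comparison figure, which depends on product mix not broken down here.

```python
def ingestion_cost(tb_per_month, usd_per_gb=0.15, gb_per_tb=1000):
    """Monthly cost for volume-based pricing, assuming decimal terabytes."""
    return round(tb_per_month * gb_per_tb * usd_per_gb, 2)


def per_host_cost(hosts, usd_per_host):
    """Monthly cost for host-based pricing."""
    return round(hosts * usd_per_host, 2)


if __name__ == "__main__":
    print("Ingestion-based, 10 TB:", ingestion_cost(10))
    # $18/host is inferred from the article's $900 for 50 nodes.
    print("Host-based, 50 nodes:", per_host_cost(50, 18))
```

Swapping in your own telemetry volume and node count gives a quick first estimate before requesting vendor quotes.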

CubeAPM is compatible with Prometheus exporters, OpenTelemetry collectors, and Linkerd’s metrics API, so you can ingest Linkerd data without changing your existing instrumentation.

New Relic

New Relic can ingest Linkerd metrics through its Prometheus integration. You configure the New Relic Prometheus agent to scrape Linkerd’s metrics endpoints, and the data flows into New Relic’s observability platform. New Relic’s UI provides service maps, golden metric dashboards, and alerting.

New Relic’s pricing is based on data ingestion and user seats. Data ingestion costs $0.30 to $0.50 per GB depending on your plan. User seats cost $99 to $549 per user per month depending on role and access level. For teams already on New Relic, adding Linkerd metrics is straightforward. For teams evaluating platforms, New Relic’s cost structure makes it expensive at scale.

Dynatrace

Dynatrace offers automated service mesh monitoring with its Kubernetes integration. It automatically discovers Linkerd proxies, collects golden metrics, and applies AI-based anomaly detection to surface issues before they escalate. Dynatrace’s strength is its automation, but its pricing is among the highest in the market. Host-based pricing starts around $73 per host per month, making it suitable primarily for large enterprises with significant budgets.

Disclaimer: Pricing based on publicly available information as of June 2026. Enterprise discounts, custom contracts, and negotiated rates are not reflected here.

How to Set Up Linkerd Monitoring

Setting up basic Linkerd monitoring takes less than 10 minutes. Here’s the step-by-step process.

Step 1: Install Linkerd

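If you haven’t already, install the Linkerd CLI and then the control plane on your Kubernetes cluster. Recent releases install the CRDs in a separate step before the control plane:

curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
linkerd check --pre
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check

This checks that the cluster is ready, installs the Linkerd CRDs and control plane, and validates that everything is running correctly.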

Step 2: Inject Linkerd Proxies

Mesh the namespaces or deployments you want to monitor by adding the linkerd.io/inject: enabled annotation:

kubectl annotate namespace your-namespace linkerd.io/inject=enabled
kubectl rollout restart deployment -n your-namespace

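Once the restarted pods come back up, each one in the annotated namespace runs with a Linkerd sidecar proxy. Workloads that are not Deployments, such as StatefulSets or DaemonSets, need their own restart. Traffic flowing through meshed pods is automatically instrumented.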

Step 3: Install the Viz Extension

Install the Viz extension to access the dashboard, CLI tools, and built-in Prometheus:

linkerd viz install | kubectl apply -f -
linkerd viz check

This deploys the metrics stack to the linkerd-viz namespace.

Step 4: View Metrics

Open the dashboard with linkerd viz dashboard or query metrics using the CLI:

linkerd viz stat deploy -n your-namespace
linkerd viz top deploy -n your-namespace
linkerd viz tap deploy/your-deployment -n your-namespace

You now have full visibility into golden metrics for all meshed services.

Step 5: Export to Long-Term Storage

To send metrics to an external platform, configure Prometheus federation or use an OpenTelemetry Collector to scrape Linkerd’s metrics endpoints and forward them to your observability backend. Most platforms provide documentation for this setup.

Migrating from Other Service Meshes to Linkerd

Teams moving from Istio or other service meshes to Linkerd often ask how monitoring changes. Linkerd’s monitoring model is simpler than Istio’s. There’s no EnvoyFilter complexity and no multi-component telemetry pipeline to configure. Linkerd generates golden metrics by default with zero configuration.

If you’re already using Prometheus and Grafana, your existing dashboards can be adapted to query Linkerd metrics. The metric names are different from Istio’s Envoy metrics, but the concepts are the same. Linkerd’s documentation includes a migration guide that maps Istio concepts to Linkerd equivalents.

For teams using managed observability platforms like Datadog or New Relic, you simply point the integration at Linkerd’s Prometheus endpoint instead of Istio’s. The integration handles the rest.

Conclusion

Linkerd monitoring provides automatic visibility into service-to-service communication without requiring code changes or manual instrumentation. The built-in Viz extension gives you golden metrics, CLI tools, and a dashboard in minutes. For production use, exporting metrics to a long-term platform and setting up alerts on success rate, latency, and control plane health ensures you catch issues before they reach users.

The key is to start simple: install Viz, monitor golden metrics, and add per-route visibility with Service Profiles as needed. Then integrate with your broader observability stack to correlate service mesh metrics with application traces, logs, and infrastructure health.

Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.

Frequently Asked Questions

What is the difference between Linkerd and Linkerd2?

Linkerd2 is the current version of Linkerd, rewritten for Kubernetes-native environments with a Rust data plane proxy and a Go control plane. The original Linkerd (now called Linkerd 1.x) was a JVM-based proxy that ran on a range of platforms, not only Kubernetes, and is no longer actively developed. When people refer to Linkerd today, they mean Linkerd2.

Does Linkerd monitoring require code changes?

No. Linkerd generates telemetry automatically by intercepting traffic at the proxy layer. You don’t need to instrument your application code or add libraries. Just inject the Linkerd sidecar and metrics start flowing immediately.

How long does Linkerd store metrics?

The built-in Prometheus instance stores metrics for 6 hours. For production use, export metrics to a long-term observability platform that supports unlimited retention.

Can I use Linkerd without the Viz extension?

Yes. The Viz extension provides the dashboard, the linkerd viz CLI commands, and the bundled Prometheus. Without it, the proxies and control plane still expose Prometheus-format metrics, so you can scrape them directly with your own Prometheus server or an OpenTelemetry Collector and forward them to an external monitoring system.

What is the best tool for monitoring Linkerd in production?

The best tool depends on your environment. For teams that want self-hosted observability with predictable pricing, CubeAPM provides full-stack visibility at $0.15 per GB. For teams already using Grafana, the official Linkerd dashboards work well. For enterprises with large budgets, Datadog and Dynatrace offer managed solutions with broad integrations.

How do I monitor Linkerd’s control plane health?

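Start with linkerd check, which validates the control plane components and their certificates. For continuous monitoring, query the control plane metrics collected by Linkerd’s Prometheus instance. Useful signals include Prometheus’s own `up` series for the control plane scrape targets, the golden metrics (`request_total`, `response_total`) from the proxies attached to the destination, identity, and proxy injector pods, and `identity_cert_expiration_timestamp_seconds` to track certificate expiry. Exact metric names can vary by Linkerd version, so confirm them in your own Prometheus, then set up alerts on these metrics to detect control plane failures early.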

Does Linkerd support OpenTelemetry?

Linkerd exposes metrics in Prometheus format, which OpenTelemetry Collectors can scrape and export to any OpenTelemetry-compatible backend. Linkerd’s proxies can also emit trace spans when distributed tracing is configured, but only for requests where the application propagates trace context, and Linkerd does not export application logs. In practice, you combine Linkerd metrics with application-level OpenTelemetry instrumentation for full-stack observability.
