A Kubernetes Liveness Probe Failed error occurs when a container stops responding to health checks, prompting Kubernetes to restart it. This can lead to repeated restarts, degraded service availability, and deployment delays. With 93% of organizations running Kubernetes in production or actively evaluating it, even minor probe misconfigurations can cascade into major outages in real-world environments.
CubeAPM helps teams detect, correlate, and resolve liveness probe failures faster by unifying Events, Metrics, Logs, and Rollouts under a single observability layer. Its OpenTelemetry-native ingestion automatically links failed probes with container restarts and node health metrics — helping you trace issues back to the root cause in seconds.
In this guide, we’ll explain what the Kubernetes Liveness Probe Failed error means, why it happens, how to fix it, and how to monitor it effectively using CubeAPM.
What is Kubernetes Liveness Probe Failed Error

A Kubernetes Liveness Probe is a health check that determines if a container is still running as expected. When it fails, Kubernetes assumes the container is unhealthy and automatically restarts it to restore service availability. This behavior is managed by the kubelet, which periodically checks the probe endpoint — typically through an HTTP request, TCP socket, or command execution.
If the container doesn’t respond within the configured timeoutSeconds or fails repeatedly beyond the failureThreshold, Kubernetes marks the probe as failed. When this happens, the kubelet kills and restarts the container, and the Pod may show a CrashLoopBackOff or restarting status, depending on the underlying cause.
Common misconfigurations — such as incorrect probe paths, ports, or delays — can cause the liveness probe to fail even when the application is healthy, leading to unnecessary restarts that increase downtime, CPU usage, and cluster noise.
In short, the “Liveness Probe Failed” error means Kubernetes is trying to heal the application automatically but may be restarting containers too aggressively due to configuration or readiness timing issues.
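For reference, a minimal httpGet liveness probe in a container spec looks like this (the path, port, and timing values are illustrative and must match your application):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

With these settings, the kubelet sends a GET request to /healthz on port 8080 every 10 seconds and restarts the container after 3 consecutive failures.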
Why Kubernetes Liveness Probe Failed Error Happens
A failed liveness probe usually points to deeper issues in container health, startup behavior, or configuration. Here are the most frequent and impactful causes you’ll encounter in real-world clusters.
1. Incorrect Probe Path or Port
If the httpGet probe path or port doesn’t match the actual application endpoint, the kubelet receives 404 or connection refused responses. Even a minor mismatch — like using /health instead of /healthz — can trigger false failures. This is one of the most common reasons for unnecessary container restarts and degraded service availability.
Example:
Your app listens on /healthz but the probe targets /health.
This mismatch triggers probe failures, leading to unnecessary restarts.
Quick check:
kubectl describe pod <pod-name> | grep -A5 Liveness
If you see repeated “connection refused” or “HTTP 404” events, your probe path or port configuration likely doesn’t match the running application.
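You can also confirm the real health path from inside the container. The sketch below assumes the image ships wget (or curl) and that the app listens on port 8080, so adjust both to your setup:

kubectl exec -it <pod-name> -c <container> -- wget -qO- http://localhost:8080/healthz

If this returns a healthy response while the probe still fails, compare the probe’s path and port fields against the command you just ran.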
2. Insufficient Initial Delay
When probes begin checking before the application has fully initialized, they fail continuously during startup. This typically affects Java, Spring Boot, and database-heavy applications that require several seconds of boot time. The kubelet interprets these early probe failures as real crashes and restarts the Pod repeatedly.
Example:
A Spring Boot service with heavy dependency loading may take 20 seconds to boot, but the probe starts checking after 5 seconds.
Quick check:
kubectl get pod <pod-name> -o yaml | grep initialDelaySeconds
If you see a small delay value (e.g., 5 seconds) for a slow-starting app, increase it to allow proper initialization before the first check.
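A probe block tuned for a slow-starting service might look like the sketch below; the 30-second delay is an assumed value, so base it on the boot time you actually observe in the logs:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

On recent Kubernetes versions, a dedicated startupProbe is often a cleaner way to protect slow boots without loosening the liveness probe itself.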
3. Application-Level Timeouts
Applications performing CPU-heavy or I/O-bound operations during probe execution may exceed the timeoutSeconds value. In this case, the container might still be healthy, but Kubernetes will restart it because the response took too long. Over time, this creates unnecessary restarts and elevated latency.
Example:
A probe hitting a database query endpoint takes longer than the configured 1-second timeout.
Quick check:
kubectl describe pod <pod-name> | grep -A3 "Liveness probe"
If you see “probe failed due to timeout” messages, increase timeoutSeconds or optimize the health check endpoint for faster response.
4. Network or DNS Failures
Intermittent network drops, DNS latency, or misconfigured NetworkPolicies can cause otherwise healthy probes to fail. This is especially common when liveness checks reference service endpoints across namespaces or rely on DNS resolution under high load.
Example:
CoreDNS pods under CPU pressure cause slow lookups, making liveness probes fail intermittently.
Quick check:
kubectl get events --sort-by=.metadata.creationTimestamp | grep Liveness
If you see sporadic failures with “network unreachable” or “DNS lookup timeout” messages, verify CoreDNS health and review any restrictive network policies.
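To rule out DNS, resolve an in-cluster name from a short-lived debug pod (busybox is used here as an assumed utility image):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local

Slow or failed lookups point to CoreDNS pressure; you can inspect its logs with kubectl logs -n kube-system -l k8s-app=kube-dns.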
5. Resource Starvation (CPU or Memory Throttling)
When containers operate under tight resource limits, the kubelet may fail to process probe requests within the expected window. CPU throttling and memory pressure slow down application response times, leading Kubernetes to assume the container is unresponsive.
Example:
A container running close to its CPU limit delays probe responses, triggering false restarts.
Quick check:
kubectl top pod <pod-name>
If you see high CPU or memory usage close to the defined limits, raise resource requests or limits to stabilize probe responses.
6. Misconfigured Command or Exec Probes
Exec-based probes fail when their commands return a non-zero exit code — often due to a missing binary, incorrect script path, or misused shell syntax. These issues can make healthy containers appear unhealthy, triggering constant restarts.
Example:
A probe uses cat /tmp/health but the file doesn’t exist in the container.
Quick check:
kubectl logs <pod> -c <container> --previous
If you see errors like “exec: not found” or “permission denied,” review the probe’s command definition and confirm that required binaries exist in the container image.
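Running the probe command manually inside the container shows the exact exit code the kubelet sees. This sketch assumes a shell exists in the image and reuses the /tmp/health example above:

kubectl exec <pod-name> -c <container> -- /bin/sh -c 'cat /tmp/health; echo "exit code: $?"'

Any non-zero exit code here will be reported as a liveness probe failure.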
How to Fix Kubernetes Liveness Probe Failed Error
Fixing this issue requires validating each possible failure point — from probe configurations to resource limits and startup timings. Below are the most reliable fixes you can apply to prevent repetitive container restarts.
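The patches below apply targeted changes imperatively; the same settings can instead be kept in the Deployment manifest, where a reasonably tuned container probe block might look like this (all values are illustrative):

containers:
  - name: <container>
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 20
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

Applying it with kubectl apply -f deployment.yaml keeps the configuration version-controlled rather than patched ad hoc.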
1. Fix Incorrect Probe Path or Port
If your probe endpoint or port doesn’t align with your running container, Kubernetes will keep failing the health check.
Validate the correct endpoint by checking the container logs or testing the endpoint locally.
Check:
kubectl describe pod <pod-name> | grep -A5 Liveness
If you see “HTTP 404” or “connection refused,” update the probe’s path or port in the deployment YAML to match your app’s actual health endpoint.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"httpGet":{"path":"/healthz","port":8080}}}]}}}}'
2. Increase Initial Delay and Period Seconds
When an application starts slowly, the probe may fail before the container is ready.
Increase the probe’s initialDelaySeconds and periodSeconds so Kubernetes gives the container enough time to initialize before starting health checks.
Check:
kubectl get pod <pod-name> -o yaml | grep initialDelaySeconds
If you see a small value (like 5), raise it to at least 15–20 seconds for slow-starting apps such as Spring Boot or Node.js.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"initialDelaySeconds":20,"periodSeconds":10}}]}}}}'
3. Adjust Timeout and Failure Threshold
If your endpoint is healthy but responds slowly, Kubernetes may restart it unnecessarily.
Extend the timeoutSeconds and failureThreshold values so short response spikes don’t trigger restarts.
Check:
kubectl get pod <pod-name> -o yaml | grep timeoutSeconds
If the timeout is too low (e.g., 1 second), increase it to 5–10 seconds based on response time trends.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"timeoutSeconds":10,"failureThreshold":5}}]}}}}'
4. Validate Command or Exec Probes
If you’re using exec probes, verify the commands actually exist inside the image and execute properly.
A missing binary or incorrect file path will immediately fail the probe.
Check:
kubectl logs <pod> -c <container> --previous
If you see “exec: not found” or “permission denied,” fix the command or args in your Pod spec or Dockerfile.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"exec":{"command":["/bin/sh","-c","test -f /tmp/health || exit 1"]}}}]}}}}'
5. Resolve Network and DNS Failures
Unstable network paths or slow DNS resolution can cause false probe failures.
Ensure your DNS pods (CoreDNS) are healthy, and network policies aren’t blocking inter-service communication.
Check:
kubectl get pods -n kube-system -l k8s-app=kube-dns
If any CoreDNS pods are CrashLoopBackOff or Pending, restart them and check cluster-level network rules.
Fix:
kubectl rollout restart deployment coredns -n kube-system
6. Adjust Resource Limits and Requests
Containers starved of CPU or memory may delay responses and fail probes.
Audit resource settings and scale up if you notice throttling or memory pressure.
Check:
kubectl top pod <pod-name>
If usage is close to defined limits, increase resources.requests and resources.limits to stabilize probe behavior.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"requests":{"cpu":"500m","memory":"512Mi"},"limits":{"cpu":"1","memory":"1Gi"}}}]}}}}'
7. Add Graceful Startup Logic
If your app needs to finish setup tasks before serving health checks, integrate a startup delay script or readiness gating logic.
This ensures liveness checks don’t begin too early and avoids restarts during initialization.
Check:
kubectl logs <pod-name>
If you see probe failures immediately after deployment, add startup hooks or lifecycle delays so probes wait until the app is truly ready.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","lifecycle":{"postStart":{"exec":{"command":["/bin/sh","-c","sleep 10"]}}}}]}}}}'
Monitoring Kubernetes Liveness Probe Failed Error with CubeAPM
Fastest path to root cause: Correlate Events, Metrics, Logs, and Rollouts in a single view. CubeAPM automatically links Liveness probe failed events to container restarts, latency spikes, and recent deployments — giving you the full picture of why the probe failed and what changed before it happened.
Step 1 — Install CubeAPM (Helm)
Install the CubeAPM chart using Helm to enable full observability across your cluster.
helm repo add cubeapm https://charts.cubeapm.com && helm repo update cubeapm && helm show values cubeapm/cubeapm > values.yaml && helm install cubeapm cubeapm/cubeapm -f values.yaml
Upgrade when needed:
helm upgrade cubeapm cubeapm/cubeapm -f values.yaml
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
Run both modes for full coverage:
- DaemonSet → Collects node-level metrics, container logs, and kubelet signals.
- Deployment → Handles centralized event ingestion and aggregation.
Install both with Helm:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update open-telemetry && helm install otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml && helm install otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml
Upgrade:
helm upgrade otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml && helm upgrade otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml
Step 3 — Collector Configs Focused on Liveness Probe Failures
- A) DaemonSet (per-node: metrics, logs, kubelet data)
Use this lightweight configuration to capture node metrics and container logs related to probe failures.
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesAttributes:
    enabled: true
  hostMetrics:
    enabled: true
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true
    storeCheckpoints: true
config:
  processors:
    batch: {}
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers:
        Cube-Stream-Fields: k8s.namespace.name,k8s.deployment.name,k8s.statefulset.name
- kubeletMetrics: Tracks probe execution counts and failure rates.
- logsCollection: Captures “Liveness probe failed” messages from container logs.
- otlphttp exporters: Send metrics and logs directly to CubeAPM for correlation.
- B) Deployment (cluster-wide events + rollouts)
This configuration collects Liveness probe failed, Killing container, and BackOff events from the Kubernetes API server.
mode: deployment
image:
  repository: otel/opentelemetry-collector-contrib
config:
  receivers:
    k8s_events: {}
  processors:
    batch: {}
  exporters:
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
- k8s_events: Captures probe failure events and restart reasons from the API server.
- batch processor: Buffers and batches data efficiently.
- otlphttp/logs: Forwards structured events to CubeAPM’s central pipeline.
Step 4 — Supporting Components (Optional)
To enrich Kubernetes metrics, deploy kube-state-metrics for object-level insights.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update prometheus-community && helm install kube-state-metrics prometheus-community/kube-state-metrics
This adds metadata like container status, restart counts, and pod phase transitions to the CubeAPM dashboards.
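If you want the OpenTelemetry Collector Deployment to scrape these metrics itself, a prometheus receiver block along these lines can be added to its values file; the service name, namespace, and port below assume a default kube-state-metrics install, so adjust them to your cluster:

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: kube-state-metrics
            scrape_interval: 30s
            static_configs:
              - targets: ["kube-state-metrics.default.svc.cluster.local:8080"]

Remember to wire the receiver into a metrics pipeline that exports to CubeAPM, for example via an otlphttp metrics exporter like the one shown in the DaemonSet config.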
Step 5 — Verification: What You Should See in CubeAPM
Once setup completes, verify that telemetry is flowing correctly. In CubeAPM, you should see:
- Events: “Liveness probe failed” and “Killing container” events appearing under the Events view.
- Metrics: Sudden increases in kube_pod_container_status_restarts_total.
- Logs: Error lines showing Liveness probe failed: with timestamp correlation.
- Restarts: Container restart spikes linked with deployment rollouts.
- Rollouts: Recent image updates or config changes preceding probe failures.
These correlated views let you pinpoint whether the failure stems from misconfiguration, resource starvation, or application unresponsiveness — all within a single timeline.
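A quick sanity check that restart metrics are flowing is a PromQL query like the one below (the namespace filter is just an example placeholder):

increase(kube_pod_container_status_restarts_total{namespace="<namespace>"}[15m]) > 0

Pods returned by this query should line up with the Liveness probe failed entries in the Events view.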
Example Alert Rules for Kubernetes Liveness Probe Failed Error
Alerting on liveness probe failures ensures you catch unhealthy containers before they trigger cascading restarts or downtime. Below are practical PromQL-based alert examples you can use directly in your Kubernetes monitoring setup.
1. Liveness Probe Failures Spiking (Direct signal from kubelet)
This alert fires when Kubernetes reports a sustained increase in failed liveness probes, indicating potential misconfiguration or unresponsive containers.
groups:
  - name: liveness-probe-alerts
    rules:
      - alert: LivenessProbeFailuresHigh
        expr: rate(prober_probe_total{probe_type="Liveness",result="failed"}[5m]) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High liveness probe failure rate"
          description: "Kubelet reports a sustained liveness probe failure rate (more than 0.5 failures/sec averaged over 5m). Check probe path, port, and resource usage."
2. Container Restarts Spike (Effect of failing liveness checks)
Use this alert to detect frequent restarts caused by consecutive probe failures. It helps identify unstable pods before they degrade service reliability.
groups:
  - name: container-restart-alerts
    rules:
      - alert: ContainerRestartsSpike
        expr: increase(kube_pod_container_status_restarts_total[5m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container restarts spiking"
          description: "Container restarted more than 3 times in 5m. Likely caused by failing liveness probes. Correlate with Events and Rollouts in CubeAPM."
3. Persistent Probe Failures per Deployment
This alert focuses on sustained liveness probe issues within specific workloads or namespaces, helping teams locate misbehaving deployments faster.
groups:
  - name: workload-liveness-alerts
    rules:
      - alert: WorkloadLivenessProbeUnhealthy
        expr: sum by (namespace, pod) (rate(prober_probe_total{probe_type="Liveness",result="failed"}[10m])) > 0.2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Workload liveness probe unhealthy"
          description: "Continuous liveness probe failures for 10m in a specific workload. Check probe thresholds, initial delays, and readiness alignment."
Conclusion
Liveness probe failures usually trace back to misaligned health endpoints, aggressive timeouts, slow startups, or resource pressure. Left unchecked, they trigger restart loops, delay rollouts, and erode service SLOs. The fixes are straightforward—validate probe paths/ports, tune initialDelaySeconds, timeoutSeconds, and failureThreshold, stabilize CPU/memory, and remove brittle exec checks.
CubeAPM reduces mean time to root cause by correlating Events, Metrics, Logs, and Rollouts in one timeline. You see probe failure events alongside restart spikes, latency on the health endpoint, and the exact deployment that introduced the change—so you can fix configuration drift before it becomes an outage.
Adopt the alert rules above, instrument with the OTEL Collector (DaemonSet + Deployment), and verify dashboards for restarts, probe errors, and rollout context. When liveness probes fail, CubeAPM turns noisy signals into clear next steps. Ready to harden your cluster? Let’s ship this playbook across your environments.






