A Kubernetes Liveness Probe Failed error occurs when a container stops responding to health checks, prompting Kubernetes to restart it. This can lead to repeated restarts, degraded service availability, and deployment delays. With 93% of organizations running Kubernetes in production or actively evaluating it, even minor probe misconfigurations can cascade into major outages in real-world environments.
CubeAPM helps teams detect, correlate, and resolve liveness probe failures faster by unifying Events, Metrics, Logs, and Rollouts under a single observability layer. Its OpenTelemetry-native ingestion automatically links failed probes with container restarts and node health metrics — helping you trace issues back to the root cause in seconds.
In this guide, we’ll explain what the Kubernetes Liveness Probe Failed error means, why it happens, how to fix it, and how to monitor it effectively using CubeAPM.
What is Kubernetes Liveness Probe Failed Error

A Kubernetes Liveness Probe is a health check that determines if a container is still running as expected. When it fails, Kubernetes assumes the container is unhealthy and automatically restarts it to restore service availability. This behavior is managed by the kubelet, which periodically checks the probe endpoint — typically through an HTTP request, TCP socket, or command execution.
If the container doesn’t respond within the configured timeoutSeconds or fails repeatedly beyond the failureThreshold, Kubernetes marks the probe as failed. When this happens, the kubelet kills and restarts the container, and the Pod may show a CrashLoopBackOff or restarting status, depending on the underlying cause.
Common misconfigurations — such as incorrect probe paths, ports, or delays — can cause the liveness probe to fail even when the application is healthy, leading to unnecessary restarts that increase downtime, CPU usage, and cluster noise.
In short, the “Liveness Probe Failed” error means Kubernetes is trying to heal the application automatically but may be restarting containers too aggressively due to configuration or readiness timing issues.
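For reference, a minimal httpGet liveness probe in a container spec looks like this (the path, port, and timing values are illustrative and must match your application):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

With these settings, the kubelet sends a GET request to /healthz on port 8080 every 10 seconds and restarts the container after 3 consecutive failures.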
Why Kubernetes Liveness Probe Failed Error Happens
A failed liveness probe usually points to deeper issues in container health, startup behavior, or configuration. Here are the most frequent and impactful causes you’ll encounter in real-world clusters.
1. Incorrect Probe Path or Port
If the httpGet probe path or port doesn’t match the actual application endpoint, the kubelet receives 404 or connection refused responses. Even a minor mismatch — like using /health instead of /healthz — can trigger false failures. This is one of the most common reasons for unnecessary container restarts and degraded service availability.
Example:
Your app listens on /healthz but the probe targets /health.
This mismatch triggers probe failures, leading to unnecessary restarts.
Quick check:
kubectl describe pod <pod-name> | grep -A5 Liveness
If you see repeated “connection refused” or “HTTP 404” events, your probe path or port configuration likely doesn’t match the running application.
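You can also confirm the real health path from inside the container. The sketch below assumes the image ships wget (or curl) and that the app listens on port 8080, so adjust both to your setup:

kubectl exec -it <pod-name> -c <container> -- wget -qO- http://localhost:8080/healthz

If this returns a healthy response while the probe still fails, compare the probe’s path and port fields against the command you just ran.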
2. Insufficient Initial Delay
When probes begin checking before the application has fully initialized, they fail continuously during startup. This typically affects Java, Spring Boot, and database-heavy applications that require several seconds of boot time. The kubelet interprets these early probe failures as real crashes and restarts the Pod repeatedly.
Example:
A Spring Boot service with heavy dependency loading may take 20 seconds to boot, but the probe starts checking after 5 seconds.
Quick check:
kubectl get pod <pod-name> -o yaml | grep initialDelaySeconds
If you see a small delay value (e.g., 5 seconds) for a slow-starting app, increase it to allow proper initialization before the first check.
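A probe block tuned for a slow-starting service might look like the sketch below; the 30-second delay is an assumed value, so base it on the boot time you actually observe in the logs:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

On recent Kubernetes versions, a dedicated startupProbe is often a cleaner way to protect slow boots without loosening the liveness probe itself.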
3. Application-Level Timeouts
Applications performing CPU-heavy or I/O-bound operations during probe execution may exceed the timeoutSeconds value. In this case, the container might still be healthy, but Kubernetes will restart it because the response took too long. Over time, this creates unnecessary restarts and elevated latency.
Example:
A probe hitting a database query endpoint takes longer than the configured 1-second timeout.
Quick check:
kubectl describe pod <pod-name> | grep -A3 "Liveness probe"
If you see “probe failed due to timeout” messages, increase timeoutSeconds or optimize the health check endpoint for faster response.
4. Network or DNS Failures
Intermittent network drops, DNS latency, or misconfigured NetworkPolicies can cause otherwise healthy probes to fail. This is especially common when liveness checks reference service endpoints across namespaces or rely on DNS resolution under high load.
Example:
CoreDNS pods under CPU pressure cause slow lookups, making liveness probes fail intermittently.
Quick check:
kubectl get events --sort-by=.metadata.creationTimestamp | grep Liveness
If you see sporadic failures with “network unreachable” or “DNS lookup timeout” messages, verify CoreDNS health and review any restrictive network policies.
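To rule out DNS, resolve an in-cluster name from a short-lived debug pod (busybox is used here as an assumed utility image):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local

Slow or failed lookups point to CoreDNS pressure; you can inspect its logs with kubectl logs -n kube-system -l k8s-app=kube-dns.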
5. Resource Starvation (CPU or Memory Throttling)
When containers operate under tight resource limits, the kubelet may fail to process probe requests within the expected window. CPU throttling and memory pressure slow down application response times, leading Kubernetes to assume the container is unresponsive.
Example:
A container running close to its CPU limit delays probe responses, triggering false restarts.
Quick check:
kubectl top pod <pod-name>
If you see high CPU or memory usage close to the defined limits, raise resource requests or limits to stabilize probe responses.
6. Misconfigured Command or Exec Probes
Exec-based probes fail when their commands return a non-zero exit code — often due to a missing binary, incorrect script path, or misused shell syntax. These issues can make healthy containers appear unhealthy, triggering constant restarts.
Example:
A probe uses cat /tmp/health but the file doesn’t exist in the container.
Quick check:
kubectl logs <pod> -c <container> --previous
If you see errors like “exec: not found” or “permission denied,” review the probe’s command definition and confirm that required binaries exist in the container image.
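Running the probe command manually inside the container shows the exact exit code the kubelet sees. This sketch assumes a shell exists in the image and reuses the /tmp/health example above:

kubectl exec <pod-name> -c <container> -- /bin/sh -c 'cat /tmp/health; echo "exit code: $?"'

Any non-zero exit code here will be reported as a liveness probe failure.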
How to Fix Kubernetes Liveness Probe Failed Error
Fixing this issue requires validating each possible failure point — from probe configurations to resource limits and startup timings. Below are the most reliable fixes you can apply to prevent repetitive container restarts.
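The patches below apply targeted changes imperatively; the same settings can instead be kept in the Deployment manifest, where a reasonably tuned container probe block might look like this (all values are illustrative):

containers:
  - name: <container>
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 20
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

Applying it with kubectl apply -f deployment.yaml keeps the configuration version-controlled rather than patched ad hoc.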
1. Fix Incorrect Probe Path or Port
If your probe endpoint or port doesn’t align with your running container, Kubernetes will keep failing the health check.
Validate the correct endpoint by checking the container logs or testing the endpoint locally.
Check:
kubectl describe pod <pod-name> | grep -A5 Liveness
If you see “HTTP 404” or “connection refused,” update the probe’s path or port in the deployment YAML to match your app’s actual health endpoint.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"httpGet":{"path":"/healthz","port":8080}}}]}}}}'
2. Increase Initial Delay and Period Seconds
When an application starts slowly, the probe may fail before the container is ready.
Increase the probe’s initialDelaySeconds and periodSeconds so Kubernetes gives the container enough time to initialize before starting health checks.
Check:
kubectl get pod <pod-name> -o yaml | grep initialDelaySeconds
If you see a small value (like 5), raise it to at least 15–20 seconds for slow-starting apps such as Spring Boot or Node.js.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"initialDelaySeconds":20,"periodSeconds":10}}]}}}}'
3. Adjust Timeout and Failure Threshold
If your endpoint is healthy but responds slowly, Kubernetes may restart it unnecessarily.
Extend the timeoutSeconds and failureThreshold values so short response spikes don’t trigger restarts.
Check:
kubectl get pod <pod-name> -o yaml | grep timeoutSeconds
If the timeout is too low (e.g., 1 second), increase it to 5–10 seconds based on response time trends.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"timeoutSeconds":10,"failureThreshold":5}}]}}}}'
4. Validate Command or Exec Probes
If you’re using exec probes, verify the commands actually exist inside the image and execute properly.
A missing binary or incorrect file path will immediately fail the probe.
Check:
kubectl logs <pod> -c <container> --previous
If you see “exec: not found” or “permission denied,” fix the command or args in your Pod spec or Dockerfile.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","livenessProbe":{"exec":{"command":["/bin/sh","-c","test -f /tmp/health || exit 1"]}}}]}}}}'
5. Resolve Network and DNS Failures
Unstable network paths or slow DNS resolution can cause false probe failures.
Ensure your DNS pods (CoreDNS) are healthy, and network policies aren’t blocking inter-service communication.
Check:
kubectl get pods -n kube-system -l k8s-app=kube-dns
If any CoreDNS pods are CrashLoopBackOff or Pending, restart them and check cluster-level network rules.
Fix:
kubectl rollout restart deployment coredns -n kube-system
6. Adjust Resource Limits and Requests
Containers starved of CPU or memory may delay responses and fail probes.
Audit resource settings and scale up if you notice throttling or memory pressure.
Check:
kubectl top pod <pod-name>
If usage is close to defined limits, increase resources.requests and resources.limits to stabilize probe behavior.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"requests":{"cpu":"500m","memory":"512Mi"},"limits":{"cpu":"1","memory":"1Gi"}}}]}}}}'
7. Add Graceful Startup Logic
If your app needs to finish setup tasks before serving health checks, integrate a startup delay script or readiness gating logic.
This ensures liveness checks don’t begin too early and avoids restarts during initialization.
Check:
kubectl logs <pod-name>
If you see probe failures immediately after deployment, add startup hooks or lifecycle delays so probes wait until the app is truly ready.
Fix:
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","lifecycle":{"postStart":{"exec":{"command":["/bin/sh","-c","sleep 10"]}}}}]}}}}'
Monitoring Kubernetes Liveness Probe Failed Error with CubeAPM
Fastest path to root cause: Correlate Events, Metrics, Logs, and Rollouts in a single view. CubeAPM automatically links Liveness probe failed events to container restarts, latency spikes, and recent deployments — giving you the full picture of why the probe failed and what changed before it happened.
Step 1 — Install CubeAPM (Helm)
Install the CubeAPM chart using Helm to enable full observability across your cluster.
helm repo add cubeapm https://charts.cubeapm.com && helm repo update cubeapm && helm show values cubeapm/cubeapm > values.yaml && helm install cubeapm cubeapm/cubeapm -f values.yaml
Upgrade when needed:
helm upgrade cubeapm cubeapm/cubeapm -f values.yaml
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
Run both modes for full coverage:
- DaemonSet → Collects node-level metrics, container logs, and kubelet signals.
- Deployment → Handles centralized event ingestion and aggregation.
Install both with Helm:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update open-telemetry && helm install otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml && helm install otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml
Upgrade:
helm upgrade otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml && helm upgrade otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml
Step 3 — Collector Configs Focused on Liveness Probe Failures
- A) DaemonSet (per-node: metrics, logs, kubelet data)
Use this lightweight configuration to capture node metrics and container logs related to probe failures.
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesAttributes:
    enabled: true
  hostMetrics:
    enabled: true
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true
    storeCheckpoints: true
config:
  processors:
    batch: {}
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers:
        Cube-Stream-Fields: k8s.namespace.name,k8s.deployment.name,k8s.statefulset.name
- kubeletMetrics: Tracks probe execution counts and failure rates.
- logsCollection: Captures “Liveness probe failed” messages from container logs.
- otlphttp exporters: Send metrics and logs directly to CubeAPM for correlation.
- B) Deployment (cluster-wide events + rollouts)
This configuration collects Liveness probe failed, Killing container, and BackOff events from the Kubernetes API server.
mode: deployment
image:
  repository: otel/opentelemetry-collector-contrib
config:
  receivers:
    k8s_events: {}
  processors:
    batch: {}
  exporters:
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
- k8s_events: Captures probe failure events and restart reasons from the API server.
- batch processor: Buffers and batches data efficiently.
- otlphttp/logs: Forwards structured events to CubeAPM’s central pipeline.
Step 4 — Supporting Components (Optional)
To enrich Kubernetes metrics, deploy kube-state-metrics for object-level insights.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update prometheus-community && helm install kube-state-metrics prometheus-community/kube-state-metrics
This adds metadata like container status, restart counts, and pod phase transitions to the CubeAPM dashboards.
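If you want the OpenTelemetry Collector Deployment to scrape these metrics itself, a prometheus receiver block along these lines can be added to its values file; the service name, namespace, and port below assume a default kube-state-metrics install, so adjust them to your cluster:

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: kube-state-metrics
            scrape_interval: 30s
            static_configs:
              - targets: ["kube-state-metrics.default.svc.cluster.local:8080"]

Remember to wire the receiver into a metrics pipeline that exports to CubeAPM, for example via an otlphttp metrics exporter like the one shown in the DaemonSet config.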
Step 5 — Verification: What You Should See in CubeAPM
Once setup completes, verify that telemetry is flowing correctly. In CubeAPM, you should see:
- Events: “Liveness probe failed” and “Killing container” events appearing under the Events view.
- Metrics: Sudden increases in kube_pod_container_status_restarts_total.
- Logs: Error lines showing Liveness probe failed: with timestamp correlation.
- Restarts: Container restart spikes linked with deployment rollouts.
- Rollouts: Recent image updates or config changes preceding probe failures.
These correlated views let you pinpoint whether the failure stems from misconfiguration, resource starvation, or application unresponsiveness — all within a single timeline.
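A quick sanity check that restart metrics are flowing is a PromQL query like the one below (the namespace filter is just an example placeholder):

increase(kube_pod_container_status_restarts_total{namespace="<namespace>"}[15m]) > 0

Pods returned by this query should line up with the Liveness probe failed entries in the Events view.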
Example Alert Rules for Kubernetes Liveness Probe Failed Error
Alerting on liveness probe failures ensures you catch unhealthy containers before they trigger cascading restarts or downtime. Below are practical PromQL-based alert examples you can use directly in your Kubernetes monitoring setup.
1. Liveness Probe Failures Spiking (Direct signal from kubelet)
This alert fires when Kubernetes reports a sustained increase in failed liveness probes, indicating potential misconfiguration or unresponsive containers.
groups:
  - name: liveness-probe-alerts
    rules:
      - alert: LivenessProbeFailuresHigh
        expr: rate(prober_probe_total{probe_type="Liveness",result="failed"}[5m]) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High liveness probe failure rate"
          description: "Kubelet reports a sustained liveness probe failure rate (more than 0.5 failures/sec averaged over 5m). Check probe path, port, and resource usage."
2. Container Restarts Spike (Effect of failing liveness checks)
Use this alert to detect frequent restarts caused by consecutive probe failures. It helps identify unstable pods before they degrade service reliability.
groups:
  - name: container-restart-alerts
    rules:
      - alert: ContainerRestartsSpike
        expr: increase(kube_pod_container_status_restarts_total[5m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container restarts spiking"
          description: "Container restarted more than 3 times in 5m. Likely caused by failing liveness probes. Correlate with Events and Rollouts in CubeAPM."
3. Persistent Probe Failures per Deployment
This alert focuses on sustained liveness probe issues within specific workloads or namespaces, helping teams locate misbehaving deployments faster.
groups:
  - name: workload-liveness-alerts
    rules:
      - alert: WorkloadLivenessProbeUnhealthy
        expr: sum by (namespace, pod) (rate(prober_probe_total{probe_type="Liveness",result="failed"}[10m])) > 0.2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Workload liveness probe unhealthy"
          description: "Continuous liveness probe failures for 10m in a specific workload. Check probe thresholds, initial delays, and readiness alignment."
Conclusion
Liveness probe failures usually trace back to misaligned health endpoints, aggressive timeouts, slow startups, or resource pressure. Left unchecked, they trigger restart loops, delay rollouts, and erode service SLOs. The fixes are straightforward—validate probe paths/ports, tune initialDelaySeconds, timeoutSeconds, and failureThreshold, stabilize CPU/memory, and remove brittle exec checks.
CubeAPM reduces mean time to root cause by correlating Events, Metrics, Logs, and Rollouts in one timeline. You see probe failure events alongside restart spikes, latency on the health endpoint, and the exact deployment that introduced the change—so you can fix configuration drift before it becomes an outage.
Adopt the alert rules above, instrument with the OTEL Collector (DaemonSet + Deployment), and verify dashboards for restarts, probe errors, and rollout context. When liveness probes fail, CubeAPM turns noisy signals into clear next steps. Ready to harden your cluster? Let’s ship this playbook across your environments.






