The Kubernetes “Readiness Probe Failed” error occurs when a container fails its readiness check, signaling that it’s not ready to serve traffic. This mechanism helps Kubernetes ensure only healthy pods receive requests. With over 93% of organizations running Kubernetes in production, probe failures can quickly cause pods to drop out of load balancers, disrupt service availability, and trigger cascading timeouts across clusters.
CubeAPM enables teams to detect and resolve readiness probe issues faster by correlating container events, HTTP probe logs, and rollout data in real time. Powered by OpenTelemetry, it delivers full MELT observability—Metrics, Events, Logs, and Traces—to pinpoint failing endpoints, root causes, and service-level impact within seconds.
In this guide, we’ll explain what the Kubernetes Readiness Probe Failed error means, why it happens, how to fix it step by step, and how to monitor and prevent it effectively using CubeAPM.
What is Kubernetes Readiness Probe Failed Error

A Readiness Probe Failed error in Kubernetes indicates that a container inside a Pod has failed the configured readiness check. This means the application inside the container isn’t yet ready to handle requests — even though the container itself may be running. When the readiness probe fails repeatedly, Kubernetes removes the Pod’s IP from the associated Service’s endpoints, effectively stopping traffic from being routed to that Pod until it recovers.
Readiness probes are used to determine application-level health, not just container-level availability. A typical probe might hit an HTTP endpoint (e.g., /healthz), execute a command, or perform a TCP socket check. If the application takes longer than expected to initialize or fails to respond correctly to the probe, Kubernetes marks the Pod as “NotReady.”
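For reference, a minimal HTTP readiness probe in a container spec looks like the sketch below; the /healthz path and port 8080 are placeholders, and exec or tcpSocket checks follow the same structure:
readinessProbe:
  httpGet:
    path: /healthz      # replace with your app's real health endpoint
    port: 8080          # replace with the port the app actually listens on
  initialDelaySeconds: 5
  periodSeconds: 10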
This mechanism prevents clients from sending requests to pods that can’t serve them yet. However, frequent or prolonged readiness probe failures can cause:
- Rolling updates to hang indefinitely.
- Load balancers to drop healthy pods.
- Temporary outages across services that depend on the affected Pod.
- Alert fatigue if Pods repeatedly toggle between Ready and NotReady states.
In short, this error is not just a sign of delayed startup — it’s an early warning signal that your application or infrastructure isn’t meeting the readiness contract Kubernetes expects.
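To quickly find Pods that are currently failing the readiness contract across the cluster, a rough jsonpath query like the one below works (it also lists Pods that have simply completed, so read the output with that in mind):
# Print namespace, Pod name, and Ready condition, then keep only the NotReady ones
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' | grep -w False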
Why Kubernetes Readiness Probe Failed Error Happens
A Readiness Probe Failed error usually points to an issue in how the probe is configured or how the application responds during startup or runtime. Below are the most common causes — each directly tied to real-world Kubernetes behavior.
1. Application Startup Takes Longer Than Probe Timeout
If the containerized application needs more time to initialize (e.g., warming caches, migrations, dependency checks) than the probe’s configured initialDelaySeconds or timeoutSeconds, the readiness probe will fail prematurely.
Quick check:
kubectl describe pod <pod-name>
If the events show repeated Readiness probe failed within a few seconds of pod creation, increase the probe’s delay or timeout settings.
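You can also pull the Pod’s events directly and compare the probe-failure timestamps against your application’s startup logs:
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'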
2. Incorrect Readiness Endpoint or Path
Misconfigured HTTP paths are a frequent cause of readiness probe failures. For instance, the probe might target /health when the actual endpoint is /readyz, or the application may serve it on a different port.
Quick check:
Verify your container spec in the deployment file:
kubectl get deployment <deployment-name> -o yaml | grep readinessProbe -A5
Compare the httpGet path and port with your app’s actual configuration.
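To see what the kubelet sees, hit the endpoint from inside the container yourself. This sketch assumes curl is available in the image and that the app listens on port 8080 at /readyz:
# A 200 here means the path and port are correct; a 404 or connection refused points to a mismatch
kubectl exec <pod-name> -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/readyz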
3. Authentication or Network Policies Blocking Probes
If the readiness endpoint requires authentication, tokens, or resides behind network policies (e.g., Calico or Cilium rules), kubelet probes may not have permission to reach it. This causes 401, 403, or timeout responses.
Quick check:
Inspect probe responses in the container logs:
kubectl logs <pod-name> | grep readiness
Look for authentication or connection-denied errors.
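If you suspect a network policy, list and inspect the policies in the Pod’s namespace; how they affect kubelet-originated probe traffic depends on your CNI, so treat this as a starting point:
kubectl get networkpolicy -n <namespace>
kubectl describe networkpolicy <policy-name> -n <namespace>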
4. Application Crash or Internal Error During Readiness Check
Sometimes, the container runs but the readiness endpoint depends on a downstream service (like a database) that isn’t available yet. This causes the probe to fail while the main app logs “dependency not reachable.”
Quick check:
kubectl logs <pod-name> -c <container-name> | grep error
Check for failed dependency connections or unhandled exceptions during startup.
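You can also test the dependency from inside the Pod’s network namespace. This assumes a netcat binary exists in the image and uses a hypothetical database host and port:
# Exit code 0 means the dependency port is reachable from the Pod
kubectl exec <pod-name> -c <container-name> -- nc -zv <db-host> 5432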
5. Resource Pressure on Node or Pod
CPU throttling or memory pressure can delay readiness responses, causing the probe to time out. Kubernetes marks the Pod NotReady even though the app is technically fine.
Quick check:
kubectl top pod <pod-name>
kubectl top node <node-name>
If CPU or memory usage is near limits, adjust resource requests/limits to give the Pod enough headroom.
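For a deeper look, check the node’s pressure conditions and the Pod’s configured requests and limits (the grep context below may need adjusting to your output):
kubectl describe node <node-name> | grep -A7 'Conditions:'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'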
6. Misconfigured Readiness Probe Parameters
Improperly set probe values — such as failureThreshold, periodSeconds, or successThreshold — can cause transient network hiccups to be treated as persistent failures.
Quick check:
Examine the probe configuration:
kubectl get pod <pod-name> -o yaml | grep readinessProbe -A10
Ensure reasonable thresholds (e.g., 3–5 failures before marking NotReady).
How to Fix Kubernetes Readiness Probe Failed Error
Fixing a Readiness Probe Failed error involves validating both the application behavior and probe configuration. The goal is to ensure that the app’s startup, endpoint, and response timing align with the probe expectations. Below are the most effective fixes for this issue.
1. Extend Probe Delays for Slow-Starting Applications
If your app performs initialization tasks like cache warm-ups or dependency syncs, it might need more time before it can respond to readiness probes.
Increase the initialDelaySeconds and timeoutSeconds to give the app adequate time to become ready.
Fix:
kubectl edit deployment <deployment-name>
Then adjust the probe:
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 10
  periodSeconds: 5
  failureThreshold: 5
2. Verify the Correct Endpoint and Port
A mismatched endpoint or port often causes consistent readiness failures.
Check that the probe path and port exactly match your application’s configuration.
Fix:
kubectl get deployment <deployment-name> -o yaml | grep readinessProbe -A5
If you find a mismatch (e.g., /health instead of /readyz), update your deployment manifest to the correct endpoint and re-deploy:
kubectl apply -f deployment.yaml
3. Disable Authentication or Whitelist the Kubelet IP
If your readiness endpoint requires authentication or IP allow-listing, the kubelet probe will fail with 401 or 403 errors.
Ensure that readiness endpoints bypass authentication and can be reached from the node’s kubelet.
Fix:
Add a simple conditional in your app config to skip authentication for /readyz or /healthz endpoints.
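If stripping authentication entirely isn’t an option, the probe itself can send a static header via the standard httpGet.httpHeaders field; the header name and token below are placeholders for whatever your app expects:
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
    httpHeaders:
      - name: Authorization
        value: Bearer <probe-token>   # placeholder; avoid hard-coding real secrets in manifests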
Then restart your deployment:
kubectl rollout restart deployment <deployment-name>
4. Resolve Downstream Dependency Failures
If your readiness endpoint checks connections to external dependencies (like databases or message brokers), failures in those systems can cascade into readiness probe errors.
Make the endpoint resilient by returning a partial success (e.g., 200) for optional dependencies or implementing retries for critical ones.
Fix:
Inspect the logs for connection errors:
kubectl logs <pod-name> -c <container-name> | grep error
Then update your readiness logic to handle dependency timeouts gracefully.
5. Adjust Resource Requests and Limits
Resource exhaustion can delay responses to readiness checks, especially under CPU throttling or OOM pressure.
Ensure the Pod has enough resources to handle startup load.
Fix:
kubectl edit deployment <deployment-name>
Modify the Pod resources:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Then restart the deployment:
kubectl rollout restart deployment <deployment-name>
6. Tune Probe Thresholds to Prevent Flapping
If the probe oscillates between success and failure due to transient latency, increase the failureThreshold and periodSeconds so short-lived hiccups aren’t treated as persistent failures.
This prevents Pods from being needlessly removed from Service endpoints and cuts down on false alerts.
Fix:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  failureThreshold: 6
  successThreshold: 2
  periodSeconds: 10
These steps collectively help stabilize Pod readiness by aligning the probe configuration with the actual runtime characteristics of your application.
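After applying any of these changes, watch the rollout and confirm Pods actually reach Ready; the label selector is a placeholder for your own labels:
kubectl rollout status deployment/<deployment-name>
kubectl get pods -l app=<app-label> -w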
Monitoring Kubernetes Readiness Probe Failed Error with CubeAPM
Detecting and fixing a readiness probe issue is only half the battle — the real challenge is catching it before it disrupts live traffic. CubeAPM helps DevOps teams trace these failures in real time by correlating the four main Kubernetes signal streams — Events, Metrics, Logs, and Rollouts — to identify exactly why and when a Pod transitions to NotReady.
By combining probe response latency, container restart trends, and deployment history, CubeAPM provides end-to-end visibility into readiness degradation, so you can fix configuration or startup issues before users notice.
Step 1 — Install CubeAPM (Helm)
Install CubeAPM’s OpenTelemetry-based agent in your cluster.
helm repo add cubeapm https://charts.cubeapm.com && helm install cubeapm cubeapm/cubeapm-agent --namespace cubeapm --create-namespace
To upgrade later:
helm upgrade cubeapm cubeapm/cubeapm-agent --namespace cubeapm
If you’re customizing probe monitoring thresholds or resource collection, you can edit values in values.yaml before installation.
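For example, you can export the chart’s default values, edit them, and install with your overrides (assuming the chart name shown above):
helm show values cubeapm/cubeapm-agent > values.yaml
# edit values.yaml as needed, then:
helm install cubeapm cubeapm/cubeapm-agent --namespace cubeapm --create-namespace -f values.yaml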
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
CubeAPM recommends a dual-collector setup:
- DaemonSet for scraping node-level and kubelet events (e.g., readiness failures).
- Deployment for central pipeline management and exporting traces, logs, and metrics.
DaemonSet install:
helm install cubeapm-otel-ds cubeapm/otel-collector --namespace cubeapm --set mode=daemonset
Deployment install:
helm install cubeapm-otel cubeapm/otel-collector --namespace cubeapm --set mode=deployment
Step 3 — Collector Configs Focused on Readiness Probe Failures
DaemonSet configuration (readiness events focus):
receivers:
  kubeletstats:
    collection_interval: 30s
  k8s_events:
    namespaces: ["default"]
processors:
  batch:
exporters:
  otlp:
    endpoint: cubeapm:4317
service:
  pipelines:
    logs:
      receivers: [k8s_events]
      processors: [batch]
      exporters: [otlp]
- k8s_events captures probe failure events from kubelet.
- batch ensures efficient log export.
- otlp sends enriched data to CubeAPM’s backend.
Deployment configuration (metrics + rollouts correlation):
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kube-state'
          static_configs:
            - targets: ['kube-state-metrics:8080']
processors:
  resource:
    attributes:
      - key: service.name
        value: readiness_probe
        action: upsert
exporters:
  otlp:
    endpoint: cubeapm:4317
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource]
      exporters: [otlp]
- prometheus pulls probe metrics and container readiness states.
- resource tags each metric with service metadata for correlation.
- otlp exports structured data to CubeAPM.
Step 4 — Supporting Components
For cluster-wide health insights, install kube-state-metrics to capture pod readiness, restarts, and deployment health:
helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace cubeapm
Step 5 — Verification (What You Should See in CubeAPM)
After deploying CubeAPM, confirm observability coverage with this checklist:
- Events: You can see ReadinessProbeFailed events under the Kubernetes Events panel.
- Metrics: Dashboards show kube_pod_container_status_ready toggling from 1 → 0 for affected containers (see the spot-check query after this list).
- Logs: Container logs display HTTP probe failures or timeouts.
- Restarts: Readiness issues correlated with restart_count trends.
- Rollouts: The Deployment Timeline highlights which rollout or image triggered the failure.
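Outside CubeAPM, you can spot-check the same readiness signal straight from kube-state-metrics with a PromQL query such as (the namespace is a placeholder):
kube_pod_container_status_ready{namespace="<namespace>"} == 0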
CubeAPM’s correlated dashboards let you pivot directly from an event to its cause — whether it’s a bad endpoint, resource starvation, or dependency timeout — reducing mean time to detection (MTTD) and recovery (MTTR).
Example Alert Rules for Kubernetes Readiness Probe Failed Error
1. Pod Not Ready for Sustained Period (hard outage)
Trigger when any container stays NotReady beyond a safe window—useful to catch genuine outages rather than brief warm-ups.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-hard-outage
  namespace: cubeapm
spec:
  groups:
    - name: readiness.probes
      rules:
        - alert: K8sPodContainerNotReadySustained
          expr: max by (namespace,pod,container) (1 - max_over_time(kube_pod_container_status_ready{job="kube-state-metrics"}[1m])) == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} ({{ $labels.namespace }}) is NotReady"
            description: "Container {{ $labels.container }} has failed readiness for >5m. Investigate endpoint/port, timeouts, deps, or resources."
2. Readiness Flapping (frequent toggles)
Alert on toggle storms where readiness flips repeatedly—common with tight thresholds or intermittent deps.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-flapping
  namespace: cubeapm
spec:
  groups:
    - name: readiness.probes
      rules:
        - alert: K8sPodReadinessFlapping
          expr: changes((max by (namespace,pod,container) (kube_pod_container_status_ready{job="kube-state-metrics"}))[10m:1m]) > 6
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "Readiness flapping for {{ $labels.pod }} ({{ $labels.namespace }})"
            description: "Readiness toggled >6 times in 10m. Consider increasing failureThreshold/timeoutSeconds or fixing intermittent deps."
3. Deployment-Level Readiness Regression
Page when too many replicas in a deployment are NotReady—great during rollouts to catch bad images/config quickly.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-deployment-regression
  namespace: cubeapm
spec:
  groups:
    - name: readiness.probes
      rules:
        - alert: K8sDeploymentReadinessRegression
          expr: |
            (sum by (namespace, deployment) (kube_deployment_status_replicas{job="kube-state-metrics"}) -
             sum by (namespace, deployment) (kube_deployment_status_replicas_ready{job="kube-state-metrics"}))
            /
            clamp_min(sum by (namespace, deployment) (kube_deployment_status_replicas{job="kube-state-metrics"}), 1)
            > 0.3
          for: 7m
          labels:
            severity: critical
          annotations:
            summary: "Readiness regression in {{ $labels.deployment }} ({{ $labels.namespace }})"
            description: ">30% replicas NotReady for >7m. Suspect bad rollout, wrong readiness path/port, or resource pressure."
4. Surge in Readiness Probe Failures (event-driven)
Use Kubernetes Events to catch a spike in ReadinessProbeFailed even if replicas remain mostly available.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-event-spike
  namespace: cubeapm
spec:
  groups:
    - name: readiness.probes
      rules:
        - alert: K8sReadinessProbeFailedEventsSpike
          expr: increase(kubernetes_events_total{reason="ReadinessProbeFailed"}[10m]) > 10
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Spike in ReadinessProbeFailed events"
            description: "More than 10 readiness failures recorded in 10m. Check app endpoint, auth bypass for probes, and dependency health."
5. Readiness Timeout Suspected from CPU Throttling
Correlate NotReady with CPU saturation that slows handlers and causes probe timeouts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-cpu-throttle
  namespace: cubeapm
spec:
  groups:
    - name: readiness.probes
      rules:
        - alert: K8sReadinessLikelyCpuThrottling
          expr: |
            max by (namespace,pod,container) (1 - kube_pod_container_status_ready{job="kube-state-metrics"}) == 1
            and on (namespace,pod,container)
            rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m]) > 0.2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "NotReady with CPU throttling for {{ $labels.pod }} ({{ $labels.namespace }})"
            description: "Readiness failing alongside CPU throttling. Consider raising requests/limits or reducing init work."
Conclusion
A “Readiness Probe Failed” status means the pod is alive but not ready to serve—often due to timing, endpoint mismatch, dependency lag, or resource pressure. Left unchecked, it can stall rollouts, drop endpoints from Services, and cascade into user-visible errors.
With CubeAPM, you get correlated Events, Metrics, Logs, and Rollouts in one view, so you can pinpoint whether failures come from bad probe config, slow startups, or throttled CPUs—then fix them fast. OpenTelemetry-native pipelines and Kubernetes-aware dashboards shorten MTTD/MTTR and keep production traffic stable.
Adopt a proactive stance: tune probes, right-size resources, and monitor probe latency and failures continuously with CubeAPM to prevent flapping and protect SLOs.