Kubernetes Readiness Probe Failed Error: Pod Health Checks, Startup Delays & Real-Time Monitoring with CubeAPM

Author: | Published: November 1, 2025 | Kubernetes Errors

The Kubernetes “Readiness Probe Failed” error occurs when a container fails its readiness check, signaling that it’s not ready to serve traffic. This mechanism helps Kubernetes ensure only healthy pods receive requests. With over 93% of organizations now running Kubernetes in production, probe failures can quickly cause pods to drop out of load balancers, disrupt service availability, and trigger cascading timeouts across clusters.

CubeAPM enables teams to detect and resolve readiness probe issues faster by correlating container events, HTTP probe logs, and rollout data in real time. Powered by OpenTelemetry, it delivers full MELT observability—Metrics, Events, Logs, and Traces—to pinpoint failing endpoints, root causes, and service-level impact within seconds.

In this guide, we’ll explain what the Kubernetes Readiness Probe Failed error means, why it happens, how to fix it step by step, and how to monitor and prevent it effectively using CubeAPM.

What is Kubernetes Readiness Probe Failed Error

A Readiness Probe Failed error in Kubernetes indicates that a container inside a Pod has failed the configured readiness check. This means the application inside the container isn’t yet ready to handle requests — even though the container itself may be running. When the readiness probe fails repeatedly, Kubernetes removes the Pod’s IP from the associated Service’s endpoints, effectively stopping traffic from being routed to that Pod until it recovers.

Readiness probes are used to determine application-level health, not just container-level availability. A typical probe might hit an HTTP endpoint (e.g., /healthz), execute a command, or perform a TCP socket check. If the application takes longer than expected to initialize or fails to respond correctly to the probe, Kubernetes marks the Pod as “NotReady.”
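
For illustration, here is a minimal sketch of the three probe styles (paths, ports, and commands are placeholders; a container uses one of them):

YAML
# HTTP check: the kubelet GETs the endpoint and expects a 2xx/3xx response
readinessProbe:
  httpGet:
    path: /healthz      # illustrative path
    port: 8080

# Command check: the container is ready when the command exits 0
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]

# TCP check: the container is ready when the port accepts a connection
readinessProbe:
  tcpSocket:
    port: 5432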

This mechanism prevents clients from sending requests to pods that can’t serve them yet. However, frequent or prolonged readiness probe failures can cause:

  • Rolling updates to hang indefinitely.
  • Load balancers to drop healthy pods.
  • Temporary outages across services that depend on the affected Pod.
  • Alert fatigue if the cluster repeatedly toggles between Ready and NotReady states.

In short, this error is not just a sign of delayed startup — it’s an early warning signal that your application or infrastructure isn’t meeting the readiness contract Kubernetes expects.

Why Kubernetes Readiness Probe Failed Error Happens

A Readiness Probe Failed error usually points to an issue in how the probe is configured or how the application responds during startup or runtime. Below are the most common causes — each directly tied to real-world Kubernetes behavior.

1. Application Startup Takes Longer Than Probe Timeout

If the containerized application needs more time to initialize (e.g., warming caches, migrations, dependency checks) than the probe’s configured initialDelaySeconds or timeoutSeconds, the readiness probe will fail prematurely.

Quick check:

Bash
kubectl describe pod <pod-name>

If the events show repeated Readiness probe failed within a few seconds of pod creation, increase the probe’s delay or timeout settings.

2. Incorrect Readiness Endpoint or Path

Misconfigured HTTP paths are a frequent cause of readiness probe failures. For instance, the probe might target /health when the actual endpoint is /readyz, or the application may serve it on a different port.

Quick check:

Verify your container spec in the deployment file:

Bash
kubectl get deployment <deployment-name> -o yaml | grep readinessProbe -A5

Compare the httpGet path and port with your app’s actual configuration.

3. Authentication or Network Policies Blocking Probes

If the readiness endpoint requires authentication, tokens, or resides behind network policies (e.g., Calico or Cilium rules), kubelet probes may not have permission to reach it. This causes 401, 403, or timeout responses.

Quick check:

Inspect probe responses in the container logs:

Bash
kubectl logs <pod-name> | grep readiness

Look for authentication or connection denied errors.

4. Application Crash or Internal Error During Readiness Check

Sometimes, the container runs but the readiness endpoint depends on a downstream service (like a database) that isn’t available yet. This causes the probe to fail while the main app logs “dependency not reachable.”

Quick check:

Bash
kubectl logs <pod-name> -c <container-name> | grep error

Check for failed dependency connections or unhandled exceptions during startup.

5. Resource Pressure on Node or Pod

CPU throttling or memory pressure can delay readiness responses, causing the probe to time out. Kubernetes marks the Pod NotReady even though the app is technically fine.

Quick check:

Bash
kubectl top pod <pod-name>

Bash
kubectl top node <node-name>

If CPU or memory usage is near limits, adjust resource requests/limits to give the Pod enough headroom.

6. Misconfigured Readiness Probe Parameters

Improperly set probe values — such as failureThreshold, periodSeconds, or successThreshold — can cause transient network hiccups to be treated as persistent failures.

Quick check:

Examine the probe configuration:

Bash
kubectl get pod <pod-name> -o yaml | grep readinessProbe -A10

Ensure reasonable thresholds (e.g., 3–5 failures before marking NotReady).

How to Fix Kubernetes Readiness Probe Failed Error

Fixing a Readiness Probe Failed error involves validating both the application behavior and probe configuration. The goal is to ensure that the app’s startup, endpoint, and response timing align with the probe expectations. Below are the most effective fixes for this issue.

1. Extend Probe Delays for Slow-Starting Applications

If your app performs initialization tasks like cache warm-ups or dependency syncs, it might need more time before it can respond to readiness probes.

Increase the initialDelaySeconds and timeoutSeconds to give the app adequate time to become ready.

Fix:

Bash
kubectl edit deployment <deployment-name>

Then adjust the probe:

YAML
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 10
  periodSeconds: 5
  failureThreshold: 5
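
If initialization is genuinely slow or variable, another option is a startupProbe, which holds readiness and liveness checks off until the app has booted. A minimal sketch, assuming the same /readyz endpoint as above:

YAML
startupProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to ~300s of startup before the container is restarted
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5       # readiness checks only begin after the startup probe succeeds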

2. Verify the Correct Endpoint and Port

A mismatched endpoint or port often causes consistent readiness failures.

Check that the probe path and port exactly match your application’s configuration.

Fix:

Bash
kubectl get deployment <deployment-name> -o yaml | grep readinessProbe -A5

If you find a mismatch (e.g., /health instead of /readyz), update your deployment manifest to the correct endpoint and re-deploy:

Bash
kubectl apply -f deployment.yaml
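
To avoid this kind of drift in the future, the probe can reference a named containerPort instead of a hard-coded number, so it follows the port definition automatically. A minimal sketch (names and image are illustrative):

YAML
containers:
  - name: app
    image: example/app:latest   # illustrative image
    ports:
      - name: http              # named port the probe can reference
        containerPort: 8080
    readinessProbe:
      httpGet:
        path: /readyz           # must match the endpoint the app actually serves
        port: http              # resolves to the named containerPort above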

3. Disable Authentication or Whitelist the Kubelet IP

If your readiness endpoint requires authentication or IP allow-listing, the kubelet probe will fail with 401 or 403 errors.

Ensure that readiness endpoints bypass authentication and can be reached from the node’s kubelet.

Fix:

Add a simple conditional in your app config to skip authentication for /readyz or /healthz endpoints.
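
If the endpoint must stay authenticated, an alternative is to have the probe send a static header that your app accepts for health checks (probe headers are plain values in the manifest, not Secret references). A minimal sketch with an illustrative header name and value:

YAML
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
    httpHeaders:
      - name: X-Probe-Token      # illustrative header your app whitelists for probes
        value: "readiness-check"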

Then restart your deployment:

Bash
kubectl rollout restart deployment <deployment-name>

4. Resolve Downstream Dependency Failures

If your readiness endpoint checks connections to external dependencies (like databases or message brokers), failures in those systems can cascade into readiness probe errors.

Make the endpoint resilient by returning a partial success (e.g., 200) for optional dependencies or implementing retries for critical ones.

Fix:

Inspect the logs for connection errors:

Bash
kubectl logs <pod-name> -c <container-name> | grep error

Then update your readiness logic to handle dependency timeouts gracefully.

5. Adjust Resource Requests and Limits

Resource exhaustion can delay responses to readiness checks, especially under CPU throttling or memory pressure.

Ensure the Pod has enough resources to handle startup load.

Fix:

Bash
kubectl edit deployment <deployment-name>

Modify the Pod resources:

YAML
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Then restart the deployment:

Bash
kubectl rollout restart deployment <deployment-name>

6. Tune Probe Thresholds to Prevent Flapping

If the probe oscillates between success and failure due to transient latency, increase the failureThreshold and lengthen the periodSeconds (or timeoutSeconds) so that brief hiccups aren’t treated as real failures.

This keeps the Pod from being repeatedly pulled out of and re-added to Service endpoints, and reduces false alerts.

Fix:

YAML
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  failureThreshold: 6
  successThreshold: 2
  periodSeconds: 10

These steps collectively help stabilize Pod readiness by aligning the probe configuration with the actual runtime characteristics of your application.

Monitoring Kubernetes Readiness Probe Failed Error with CubeAPM

Detecting and fixing a readiness probe issue is only half the battle — the real challenge is catching it before it disrupts live traffic. CubeAPM helps DevOps teams trace these failures in real time by correlating the four main Kubernetes signal streams — Events, Metrics, Logs, and Rollouts — to identify exactly why and when a Pod transitions to NotReady.

By combining probe response latency, container restart trends, and deployment history, CubeAPM provides end-to-end visibility into readiness degradation, so you can fix configuration or startup issues before users notice.

Step 1 — Install CubeAPM (Helm)

Install CubeAPM’s OpenTelemetry-based agent in your cluster.

Bash
helm repo add cubeapm https://charts.cubeapm.com && helm install cubeapm cubeapm/cubeapm-agent --namespace cubeapm --create-namespace

To upgrade later:

Bash
helm upgrade cubeapm cubeapm/cubeapm-agent --namespace cubeapm

If you’re customizing probe monitoring thresholds or resource collection, you can edit values in values.yaml before installation.

Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)

CubeAPM recommends a dual-collector setup:

  • DaemonSet for scraping node-level and kubelet events (e.g., readiness failures).
  • Deployment for central pipeline management and exporting traces, logs, and metrics.

DaemonSet install:

Bash
helm install cubeapm-otel-ds cubeapm/otel-collector --namespace cubeapm --set mode=daemonset

Deployment install:

Bash
helm install cubeapm-otel cubeapm/otel-collector --namespace cubeapm --set mode=deployment

Step 3 — Collector Configs Focused on Readiness Probe Failures

DaemonSet configuration (readiness events focus):

YAML
receivers:
  kubeletstats:
    collection_interval: 30s
  k8s_events:
    namespaces: ["default"]

processors:
  batch:

exporters:
  otlp:
    endpoint: cubeapm:4317

service:
  pipelines:
    logs:
      receivers: [k8s_events]
      processors: [batch]
      exporters: [otlp]

  • k8s_events captures probe failure events from kubelet.
  • batch ensures efficient log export.
  • otlp sends enriched data to CubeAPM’s backend.

Deployment configuration (metrics + rollouts correlation):

YAML
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kube-state'
          static_configs:
            - targets: ['kube-state-metrics:8080']

processors:
  resource:
    attributes:
      - key: service.name
        value: readiness_probe
        action: upsert

exporters:
  otlp:
    endpoint: cubeapm:4317

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource]
      exporters: [otlp]

  • prometheus pulls probe metrics and container readiness states.
  • resource tags each metric with service metadata for correlation.
  • otlp exports structured data to CubeAPM.

Step 4 — Supporting Components

For cluster-wide health insights, install kube-state-metrics to capture pod readiness, restarts, and deployment health:

Bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace cubeapm

Step 5 — Verification (What You Should See in CubeAPM)

After deploying CubeAPM, confirm observability coverage with this checklist:

  • Events: You can see ReadinessProbeFailed events under the Kubernetes Events panel.
  • Metrics: Dashboards show spikes in kube_pod_container_status_ready toggling from 1 → 0.
  • Logs: Container logs display HTTP probe failures or timeouts.
  • Restarts: Readiness issues correlated with restart_count trends.
  • Rollouts: The Deployment Timeline highlights which rollout or image triggered the failure.

CubeAPM’s correlated dashboards let you pivot directly from an event to its cause — whether it’s a bad endpoint, resource starvation, or dependency timeout — reducing mean time to detection (MTTD) and recovery (MTTR).

Example Alert Rules for Kubernetes Readiness Probe Failed Error

1. Pod Not Ready for Sustained Period (hard outage)

Trigger when any container stays NotReady beyond a safe window—useful to catch genuine outages rather than brief warm-ups.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-hard-outage
  namespace: cubeapm
spec:
  groups:
  - name: readiness.probes
    rules:
    - alert: K8sPodContainerNotReadySustained
      expr: max by (namespace,pod,container) (1 - max_over_time(kube_pod_container_status_ready{job="kube-state-metrics"}[1m])) == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.pod }} ({{ $labels.namespace }}) is NotReady"
        description: "Container {{ $labels.container }} has failed readiness for >5m. Investigate endpoint/port, timeouts, deps, or resources."

2. Readiness Flapping (frequent toggles)

Alert on toggle storms where readiness flips repeatedly—common with tight thresholds or intermittent deps.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-flapping
  namespace: cubeapm
spec:
  groups:
  - name: readiness.probes
    rules:
    - alert: K8sPodReadinessFlapping
      expr: changes((max by (namespace,pod,container) (kube_pod_container_status_ready{job="kube-state-metrics"}))[10m:1m]) > 6
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Readiness flapping for {{ $labels.pod }} ({{ $labels.namespace }})"
        description: "Readiness toggled >6 times in 10m. Consider increasing failureThreshold/timeoutSeconds or fixing intermittent deps."

3. Deployment-Level Readiness Regression

Page when too many replicas in a deployment are NotReady—great during rollouts to catch bad images/config quickly.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-deployment-regression
  namespace: cubeapm
spec:
  groups:
  - name: readiness.probes
    rules:
    - alert: K8sDeploymentReadinessRegression
      expr: |
        (sum by (namespace, deployment) (kube_deployment_status_replicas{job="kube-state-metrics"}) -
         sum by (namespace, deployment) (kube_deployment_status_replicas_ready{job="kube-state-metrics"}))
        /
        clamp_min(sum by (namespace, deployment) (kube_deployment_status_replicas{job="kube-state-metrics"}), 1)
        > 0.3
      for: 7m
      labels:
        severity: critical
      annotations:
        summary: "Readiness regression in {{ $labels.deployment }} ({{ $labels.namespace }})"
        description: ">30% replicas NotReady for >7m. Suspect bad rollout, wrong readiness path/port, or resource pressure."

4. Surge in Readiness Probe Failures (event-driven)

Use Kubernetes Events to catch a spike in ReadinessProbeFailed even if replicas remain mostly available.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-event-spike
  namespace: cubeapm
spec:
  groups:
  - name: readiness.probes
    rules:
    - alert: K8sReadinessProbeFailedEventsSpike
      expr: increase(kubernetes_events_total{reason="ReadinessProbeFailed"}[10m]) > 10
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Spike in ReadinessProbeFailed events"
        description: "More than 10 readiness failures recorded in 10m. Check app endpoint, auth bypass for probes, and dependency health."

5. Readiness Timeout Suspected from CPU Throttling

Correlate NotReady with CPU saturation that slows handlers and causes probe timeouts.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: readiness-cpu-throttle
  namespace: cubeapm
spec:
  groups:
  - name: readiness.probes
    rules:
    - alert: K8sReadinessLikelyCpuThrottling
      expr: |
        max by (namespace,pod,container) (1 - kube_pod_container_status_ready{job="kube-state-metrics"}) == 1
        and on (namespace,pod,container)
        rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m]) > 0.2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "NotReady with CPU throttling for {{ $labels.pod }} ({{ $labels.namespace }})"
        description: "Readiness failing alongside CPU throttling. Consider raising requests/limits or reducing init work."

Conclusion

A “Readiness Probe Failed” status means the pod is alive but not ready to serve—often due to timing, endpoint mismatch, dependency lag, or resource pressure. Left unchecked, it can stall rollouts, drop endpoints from Services, and cascade into user-visible errors.

With CubeAPM, you get correlated Events, Metrics, Logs, and Rollouts in one view, so you can pinpoint whether failures come from bad probe config, slow startups, or throttled CPUs—then fix them fast. OpenTelemetry-native pipelines and Kubernetes-aware dashboards shorten MTTD/MTTR and keep production traffic stable.

Adopt a proactive stance: tune probes, right-size resources, and monitor probe latency and failures continuously with CubeAPM to prevent flapping and protect SLOs.
