
Kubernetes Startup Probe Failed Error Explained: Pod Initialization, Probe Timeouts & Restart Loops

Published: November 1, 2025 | Kubernetes Errors

The Kubernetes “Startup Probe Failed” error occurs when a container doesn’t signal successful startup within the configured threshold — often due to slow initialization, dependency delays, or misconfigured probe settings. When this happens, Kubernetes keeps restarting the container until it hits the failure limit, leading to CrashLoopBackOff states, delayed rollouts, and wasted cluster resources.

CubeAPM helps resolve these issues faster by correlating events, metrics, logs, and rollout data in one place. With its OpenTelemetry-native instrumentation and real-time dashboards, CubeAPM identifies the exact cause of startup probe failures—whether configuration errors or dependency bottlenecks—before they impact production environments.

In this guide, we’ll explain what the Startup Probe Failed error means, why it happens, how to fix it, and how to monitor it effectively using CubeAPM.

What is Kubernetes Startup Probe Failed Error


A startup probe in Kubernetes is designed to check whether an application within a container has started successfully. Unlike liveness or readiness probes, the startup probe is used only during container initialization — it gives slow-starting applications extra time before Kubernetes begins performing health checks.

When the Startup Probe Failed error appears, it means the probe did not receive a successful response (such as an HTTP 200 or TCP connection) within the defined number of attempts or timeout period. As a result, Kubernetes assumes the container failed to start and restarts it automatically, potentially creating a restart loop if the issue persists.

This error is common in applications that:

  • Require significant initialization time, such as large Java or Spring Boot applications.
  • Depend on external services like databases or message queues before fully starting.
  • Use incorrect probe configurations — such as wrong ports, endpoints, or timeouts.

The startup probe acts as a safeguard to prevent Kubernetes from prematurely marking a container as healthy. However, when misconfigured, it can lead to false negatives — causing healthy applications to restart unnecessarily and affecting deployment stability.
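For reference, here is a minimal sketch of a startup probe; the /healthz path and port 8080 are placeholders, so substitute your application's actual endpoint:

YAML
startupProbe:
  httpGet:
    path: /healthz     # replace with your app's real startup endpoint
    port: 8080
  periodSeconds: 10    # probe every 10 seconds
  timeoutSeconds: 3    # each attempt may take up to 3 seconds
  failureThreshold: 30 # allow up to 30 × 10s = 300s before Kubernetes restarts the container

While the startup probe is running, liveness and readiness checks are disabled; they take over only after the startup probe first succeeds.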

Why Kubernetes Startup Probe Failed Error Happens

1. Slow Application Initialization

Some applications require extra time to start due to heavy frameworks or dependency loading. When the startup probe’s timeout is too short, Kubernetes assumes the container failed and restarts it prematurely.

Example: A Spring Boot app initializing large caches or external connections may take 60 seconds to start, but the probe timeout is set to 30 seconds.

Quick check:

Bash
kubectl describe pod <pod-name>


Look for repeated “Startup probe failed” messages and restart attempts in the Events section.

2. Incorrect Probe Endpoint or Port

The probe might be pointing to a non-existent path or wrong port inside the container, leading to failed HTTP or TCP checks.

Example: The probe checks /startup on port 8080, but the app only serves /health on port 8081.

Quick check:

Bash
kubectl get pod <pod-name> -o yaml | grep startupProbe -A5



Confirm that the httpGet or tcpSocket values match your container’s actual configuration.
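To confirm the endpoint directly, you can call it from inside the container — a quick sketch, assuming the image ships wget (swap in curl, and adjust the port and path to your app):

Bash
kubectl exec <pod-name> -c <container-name> -- wget -qO- http://localhost:8081/health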

3. Dependencies Not Yet Available

If the application depends on services like a database, cache, or message broker that aren’t ready when it starts, the probe fails even though the app would succeed once those services are available.

Example: A containerized API failing startup because the MySQL service hasn’t finished initializing.

Quick check:

Bash
kubectl logs <pod-name> -c <container-name>

 

Look for “connection timeout” or “service unavailable” errors during startup.

4. Resource Starvation During Boot

Low CPU or memory limits can delay initialization, especially for resource-heavy apps. The container might start too slowly, causing the probe to time out repeatedly.

Example: A JVM process with 256 MiB memory limit struggling to load dependencies.

Quick check:

Bash
kubectl top pod <pod-name>


Check for CPU throttling or high memory usage during startup.

5. Failing Entrypoint or Command

If the container’s entrypoint or command points to a non-existent binary or script, the process exits before the probe can succeed.

Example: Entrypoint is set to /app/start.sh but the script isn’t included in the image.

Quick check:

Bash
kubectl logs <pod-name> -c <container-name> --previous


If you see “exec: not found” or similar errors, the startup command is invalid.

How to Fix Kubernetes Startup Probe Failed Error

1. Fix invalid entrypoint or command

If the container’s command/args point to a missing binary or typoed script, the process exits before the probe can pass. Keep commands minimal and verify they exist inside the image.

Quick check

Bash
kubectl logs <pod> -c <container> --previous | tail -n 50

Fix

Bash
# Remove the command override so the container falls back to the image's default ENTRYPOINT.
kubectl patch deploy <deploy> --type=json -p='[{"op":"remove","path":"/spec/template/spec/containers/0/command"}]'

2. Increase the startup probe’s time budget

Slow starters (JVMs, big frameworks, cold caches) may need longer than your current thresholds. Increase failureThreshold, timeoutSeconds, and if needed initialDelaySeconds; the total startup budget is roughly initialDelaySeconds + failureThreshold × periodSeconds.

Quick check

Bash
kubectl describe pod <pod> | grep -A3 "Startup probe" -n

Fix

Bash
# Strategic merge (kubectl's default) merges the containers list by name; a JSON merge patch (--type=merge) would replace the list wholesale and drop fields like image.
kubectl patch deploy <deploy> --type=strategic -p='{"spec":{"template":{"spec":{"containers":[{"name":"<container>","startupProbe":{"httpGet":{"path":"/startup","port":8080},"initialDelaySeconds":10,"periodSeconds":10,"timeoutSeconds":5,"failureThreshold":12}}]}}}}'
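Patches are good for quick remediation, but the durable fix is to set the same values in the Deployment manifest. A sketch of the equivalent declarative spec, using the same placeholder path and port as the patch above:

YAML
containers:
  - name: <container>
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 12  # 10s initial delay + 12 × 10s ≈ 130s total startup budget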

3. Correct probe path/port

If the probe hits the wrong path or port, Kubernetes never gets a valid response. Align the probe with the actual in-app endpoint.

Quick check

Bash
kubectl get pod <pod> -o yaml | grep -A6 startupProbe

Fix

Bash
kubectl patch deploy <deploy> --type=strategic -p='{"spec":{"template":{"spec":{"containers":[{"name":"<container>","startupProbe":{"httpGet":{"path":"/health/startup","port":8081},"periodSeconds":10,"timeoutSeconds":3,"failureThreshold":10}}]}}}}'

4. Gate on dependencies (DB, cache, broker)

If the app fails until Postgres/Redis/Kafka is reachable, the probe fails even though the app is fine once deps are up. Add an initContainer to wait for deps.

Quick check

Bash
kubectl logs <pod> -c <container> | grep -Ei "timeout|connection refused|unavailable"

Fix

Bash
# Strategic merge keeps any existing initContainers; busybox's nc polls until the dependency accepts connections.
kubectl patch deploy <deploy> --type=strategic -p='{"spec":{"template":{"spec":{"initContainers":[{"name":"wait-deps","image":"busybox:1.36","command":["/bin/sh","-c","until nc -zv postgres.default.svc.cluster.local 5432; do sleep 2; done"]}]}}}}'
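The same gate expressed declaratively in the pod template — a sketch, where postgres.default.svc.cluster.local:5432 is an assumed service address to be replaced with your actual dependency:

YAML
initContainers:
  - name: wait-deps
    image: busybox:1.36
    command:
      - /bin/sh
      - -c
      - until nc -z postgres.default.svc.cluster.local 5432; do echo "waiting for db"; sleep 2; done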

5. Relax resource pressure during boot

Under-provisioned CPU/memory slows cold starts; throttling or OOM kills end the process before the probe passes. Raise requests/limits so the container can finish booting in time.

Quick check

Bash
kubectl top pod <pod> --containers | awk 'NR==1 || /<container>/'

Fix

Bash
kubectl patch deploy <deploy> --type=strategic -p='{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"requests":{"cpu":"500m","memory":"512Mi"},"limits":{"cpu":"1","memory":"1Gi"}}}]}}}}'

6. Add a dedicated startup endpoint/handler

Generic /health may be “unhealthy” while caches warm or migrations run. Expose a /startup route that returns OK only after critical init completes, then switch the probe to it.

Quick check

Bash
kubectl port-forward deploy/<deploy> 18080:8080 >/dev/null 2>&1 & sleep 2; curl -sS http://127.0.0.1:18080/startup; kill %1

Fix

Bash
kubectl patch deploy <deploy> --type=strategic -p='{"spec":{"template":{"spec":{"containers":[{"name":"<container>","startupProbe":{"httpGet":{"path":"/startup","port":8080},"periodSeconds":10,"timeoutSeconds":3,"failureThreshold":10}}]}}}}'

7. Temporarily disable the startup probe to debug

If aggressive restarts block investigation, disable the probe briefly to collect logs/attach a shell, then restore with proper thresholds.

Quick check

Bash
kubectl describe pod <pod> | grep -i "restart count"

Fix

Bash
kubectl patch deploy <deploy> --type=json -p='[{"op":"remove","path":"/spec/template/spec/containers/0/startupProbe"}]'
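With the probe removed, the container stays up long enough to inspect it interactively, for example:

Bash
kubectl exec -it <pod> -c <container> -- sh   # or bash, if the image provides it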

Monitoring Kubernetes Startup Probe Failed Error with CubeAPM

The fastest way to identify why a startup probe failed is to correlate Kubernetes Events, Metrics, Logs, and Rollouts together. CubeAPM collects and links these signals automatically using OpenTelemetry-based collectors. This enables teams to see what failed, why it failed, and which deployment caused it — all in one unified view.

Step 1 — Install CubeAPM (Helm)

Install CubeAPM using Helm and load default configurations into a values.yaml file for easy customization.

Bash
helm repo add cubeapm https://charts.cubeapm.com && helm repo update cubeapm && helm show values cubeapm/cubeapm > values.yaml && helm upgrade --install cubeapm cubeapm/cubeapm -f values.yaml --namespace cubeapm --create-namespace

Tip: Keep API keys and exporter endpoints inside values.yaml for consistent redeployments.
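To confirm the install, check that the CubeAPM pods reach the Running state:

Bash
kubectl get pods -n cubeapm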

Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)

The DaemonSet collects pod-level telemetry (logs, kubelet stats, and events), while the Deployment acts as a central pipeline for processing and exporting to CubeAPM.

Bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update open-telemetry && helm upgrade --install otel-agent open-telemetry/opentelemetry-collector --set mode=daemonset --namespace monitoring --create-namespace && helm upgrade --install otel-gateway open-telemetry/opentelemetry-collector --set mode=deployment --namespace monitoring

Why both: The DaemonSet captures metrics and logs close to the source, while the Deployment enriches and exports them for correlation inside CubeAPM.
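Verify both collectors are running before wiring configs (the label selector assumes the chart's default labels):

Bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=opentelemetry-collector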

Step 3 — Collector Configurations Focused on Startup Probe Failures

3A. DaemonSet (node/pod collectors) — capture Events, Kubelet metrics, and container Logs

YAML
receivers:
  k8s_events: {}
  kubeletstats:
    collection_interval: 15s
    auth_type: serviceAccount
    endpoint: https://${KUBE_NODE_NAME}:10250
    k8s_api_config:
      auth_type: serviceAccount
  filelog:
    include: [/var/log/containers/*.log]
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<ts>[^ ]+) (?P<stream>stdout|stderr) (?P<flag>[^ ]+) (?P<log>.*)$'
        timestamp:
          parse_from: attributes.ts
          layout: '%Y-%m-%dT%H:%M:%S.%fZ'
processors:
  k8sattributes: {}
  batch: {}
exporters:
  otlphttp:
    endpoint: https://<CUBEAPM_ENDPOINT>/otlp
    headers: { "x-api-key": "<CUBEAPM_API_KEY>" }
service:
  pipelines:
    logs:
      receivers: [filelog, k8s_events]
      processors: [k8sattributes, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [kubeletstats]
      processors: [k8sattributes, batch]
      exporters: [otlphttp]

  • k8s_events: Streams probe failures and restart events.
  • kubeletstats: Captures CPU throttling and memory pressure during startup.
  • filelog: Reads container logs for entrypoint errors or dependency timeouts.
  • k8sattributes: Adds pod and namespace labels for correlation.
  • otlphttp: Forwards enriched telemetry securely to CubeAPM.
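To confirm the agent is exporting without errors, tail its logs — the resource name below assumes the Helm chart's default <release>-opentelemetry-collector naming:

Bash
kubectl logs daemonset/otel-agent-opentelemetry-collector -n monitoring --tail=50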

3B. Deployment (gateway) — correlate and export, track Rollout context via k8sobjects

YAML
receivers:
  otlp: { protocols: { http: {}, grpc: {} } }
  k8sobjects:
    objects:
      - name: pods
      - name: deployments
      - name: replicasets
processors:
  resource:
    attributes:
      - key: rollout.resource
        action: upsert
        from_attribute: k8s.deployment.name
  batch: {}
exporters:
  otlphttp:
    endpoint: https://<CUBEAPM_ENDPOINT>/otlp
    headers: { "x-api-key": "<CUBEAPM_API_KEY>" }
service:
  pipelines:
    logs:
      receivers: [otlp, k8sobjects]
      processors: [resource, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]

  • k8sobjects: Links probe failures to Deployments or ReplicaSets for rollout-level insights.
  • resource: Adds rollout metadata to every trace and log.
  • otlphttp: Sends unified telemetry to CubeAPM’s ingestion endpoint.

Step 4 — Supporting Components

Add kube-state-metrics to expose workload health and probe states for richer dashboards.

Bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update prometheus-community && helm upgrade --install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring

This exposes workload metrics such as kube_pod_status_ready and kube_pod_container_status_restarts_total, which correlate directly with startup probe failures.

Step 5 — Verification (What You Should See in CubeAPM)

After setup, verify data flow inside CubeAPM. You should see:

  • Events: Startup probe failure logs under Pod events.
  • Metrics: Spike in container restart counts and delayed readiness.
  • Logs: “exec not found” or connection timeout messages.
  • Restarts: Containers restarting repeatedly during rollout.
  • Rollout Context: Affected Deployments or ReplicaSets linked to failure events.
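You can cross-check the same signals at the cluster level with kubectl; startup probe failures surface as Unhealthy events:

Bash
kubectl get events -A --field-selector reason=Unhealthy | grep -i "startup probe"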

Example Alert Rules for Startup Probe Failed Error

1. Suspected Startup Probe Failures (restarts + CrashLoopBackOff)

Use restarts combined with the container waiting reason to flag pods likely failing their startup probe shortly after rollout.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: startup-probe-suspected
  namespace: monitoring
spec:
  groups:
  - name: startup-probe.rules
    rules:
    - alert: StartupProbeFailedSuspected
      expr: |
        (increase(kube_pod_container_status_restarts_total[10m]) > 3)
        and on (pod,namespace,container)
        (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1)
      for: 5m
      labels:
        severity: warning
        team: platform
      annotations:
        summary: "Startup probe likely failing for {{ $labels.namespace }}/{{ $labels.pod }} (container={{ $labels.container }})"
        description: "Container is restarting frequently and is in CrashLoopBackOff, which commonly indicates failing startup probes or invalid entrypoints."

2. Slow Initialization (pod not Ready for too long)

Alert when a pod stays unready well beyond expected warm-up time since creation — a classic symptom of too-strict startup probe thresholds.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: startup-probe-slow-init
  namespace: monitoring
spec:
  groups:
  - name: startup-probe.rules
    rules:
    - alert: StartupProbeSlowInitialization
      expr: |
        (max by (pod,namespace) (time() - kube_pod_start_time) > 300)
        and on (pod,namespace)
        (kube_pod_status_ready{condition="true"} == 0)
      for: 5m
      labels:
        severity: warning
        team: platform
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} unready for >5m after start"
        description: "Pod remains unready long after creation; consider increasing startupProbe thresholds or verifying dependency availability."

3. Resource Pressure During Startup (CPU/memory)

Detect pods likely failing startup due to throttling or memory pressure while still unready.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: startup-probe-resource-pressure
  namespace: monitoring
spec:
  groups:
  - name: startup-probe.rules
    rules:
    - alert: StartupProbeResourcePressure
      expr: |
        (
          rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m]) > 0.2
          or
          (
            max by (namespace,pod,container) (container_memory_working_set_bytes{container!=""})
            /
            max by (namespace,pod,container) (kube_pod_container_resource_limits{resource="memory", unit="byte"})
          ) > 0.9
        )
        and on (pod,namespace)
        (kube_pod_status_ready{condition="true"} == 0)
      for: 10m
      labels:
        severity: warning
        team: platform
      annotations:
        summary: "Resource pressure during startup for {{ $labels.namespace }}/{{ $labels.pod }}"
        description: "High CPU throttling or memory near limit while pod is still unready; relax resources or tune JVM/runtime flags and startupProbe timing."
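Before relying on these alerts, validate that the rule objects apply cleanly. This assumes the Prometheus Operator CRDs are installed and that each rule above is saved to a local file with the names shown (hypothetical filenames):

Bash
kubectl apply --dry-run=server -f startup-probe-suspected.yaml -f startup-probe-slow-init.yaml -f startup-probe-resource-pressure.yaml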

Conclusion

Startup probe failures usually boil down to three patterns: slow application initialization, misaligned probe settings, or unavailable dependencies. Left unchecked, they trigger restart loops, delay rollouts, and consume cluster resources.

You can stabilize rollouts by widening the startup time budget, correcting endpoints/ports, gating on dependencies with init containers, and easing resource pressure during cold starts. Treat probe tuning as part of your deployment contract.

CubeAPM closes the loop by correlating Events, Metrics, Logs, and Rollouts so you see exactly which change caused the failure and why. With real-time dashboards and targeted alerts, teams resolve startup issues before they become user-visible incidents.
