
Kubernetes 502 Bad Gateway Error Explained: Upstream Failures, Pod Crashes, and Network Timeouts

Author: | Published: October 6, 2025 | Kubernetes Errors

The Kubernetes 502 Bad Gateway error occurs when a request passes through an ingress or gateway (like NGINX or Envoy) but the upstream Pod returns an invalid or empty response. Even short-lived 502s can ripple through microservices—breaking APIs, frontends, and downstream calls—and with major outages costing companies a median of $2M per hour, minimizing these errors is critical.

CubeAPM enables teams to trace Kubernetes 502 Bad Gateway errors end-to-end by correlating ingress spikes with Pod restarts, container logs, and rollout events. This makes it clear whether the issue stems from failing upstream Pods, readiness probe misfires, or misrouted Services—cutting resolution time from hours to minutes.

In this guide, we’ll explain what the Kubernetes 502 Bad Gateway error is, why it happens, how to fix it, and—critically—how CubeAPM can help you detect, correlate, and resolve it faster. 

What is Kubernetes ‘502 Bad Gateway’ Error


Kubernetes 502 Bad Gateway error means that an ingress or gateway component (such as NGINX Ingress Controller, Envoy, or HAProxy) successfully forwarded a client request, but the upstream Pod or Service failed to respond with a valid payload. Instead, the gateway receives an invalid, empty, or incomplete response, which is then surfaced to the client as a 502 error.

This issue typically arises in distributed microservice setups where traffic must pass through multiple layers—Ingress, Services, and Pods—before reaching an application container. Any failure along this chain (such as a Pod crash, readiness probe failure, or misrouted Service) can cause the gateway to return a 502.
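
To make that chain concrete, here is a minimal sketch of the layers a request crosses; the names and ports (demo-app, 8080) are placeholders rather than values from a real cluster. A 502 can surface whenever the next hop has no healthy target, for example when the Service selector matches no Pods or targetPort points at the wrong container port.

YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-app          # must match the Service below
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app                       # must match the Pod labels
  ports:
    - port: 80
      targetPort: 8080                  # must match the container port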

Key characteristics of Kubernetes 502 Bad Gateway error:

  • Intermittent or sudden: May appear only under load spikes or during Pod restarts
  • Ingress-level visibility: Error is surfaced by NGINX, Envoy, or HAProxy, not the Pod itself
  • Tied to upstream health: Often linked to failing readiness probes, crashes, or timeouts
  • Cascading effect: Can propagate across dependent services if not resolved quickly
  • Difficult to isolate: Root cause is often buried in Pod logs or rollout history

Why Kubernetes ‘502 Bad Gateway’ Error Happens

1. Readiness probe failures

A readiness probe failure often leads to the Kubernetes 502 Bad Gateway error. When readiness probes are misconfigured, Kubernetes may mark a Pod as “Ready” before the application is actually prepared to handle traffic; when they are too strict, Pods never become Ready at all and the Service is left with no healthy endpoints. Requests routed in either state fail because the container process is still warming up, dependencies aren’t available, or there is simply nothing to route to. Over time, this leads to repeated 502 responses that appear randomly, especially after rollouts or scaling events.

Quick check:

Bash
kubectl describe pod <pod-name>

 

What to look for: Events showing repeated Readiness probe failed messages, indicating the Pod was marked as ready too soon.

2. Pod crashes or restarts

Containers that crash mid-request leave the ingress controller waiting on a response that never arrives. During this downtime, Kubernetes attempts to restart the Pod, but until it’s healthy again, all traffic routed there results in 502 errors. This pattern is common with memory leaks, OOMKilled events, or application-level exceptions.

Quick check:

Bash
kubectl get pods

 

What to look for: Pods with a high RESTARTS count, which signals instability and repeated container crashes.

3. Service misconfiguration

If a Service selector doesn’t align with Pod labels, the Service ends up with no endpoints. This misconfiguration is surprisingly easy to miss during deployments, especially when labels or selectors change in manifests. The ingress continues forwarding traffic, but with no upstream Pod, every request returns a 502 error until the mismatch is fixed.

Quick check:

Bash
kubectl describe svc <service-name>

 

What to look for: Endpoints: <none> or empty endpoint lists, showing the Service isn’t routing to any Pods.

4. Network policies blocking traffic

Strict or misapplied NetworkPolicy rules can unintentionally block ingress-to-Pod communication. From the cluster’s perspective, Pods may look perfectly healthy, but the gateway cannot reach them. This disconnect causes clients to see 502 responses even when application logs show no errors, making it a tricky issue to diagnose without looking at policies.

Quick check:

Bash
kubectl describe networkpolicy

 

What to look for: Rules that omit or block ingress traffic from the ingress namespace to backend Pods.

5. Backend timeouts under load

When Pods are overloaded—due to insufficient resources, exhausted connection pools, or slow query handling—they may fail to respond within expected timeouts. The ingress controller, waiting for an upstream reply, eventually terminates the request and surfaces it as a 502. This problem usually spikes during high traffic periods when scaling can’t keep up.

Quick check:

Bash
kubectl logs -n ingress-nginx <controller-pod>

 

What to look for: upstream timed out messages that appear during traffic peaks, confirming backend slowness.

6. TLS/SSL handshake issues

If the ingress and backend Pods use mismatched TLS versions, unsupported ciphers, or expired certificates, handshakes between them fail. These failed attempts never establish a proper connection, so the ingress surfaces them as 502 errors. This issue often happens in environments where custom certificates are rotated manually or services enforce stricter TLS settings than the ingress.

Quick check:

Bash
openssl s_client -connect <pod-ip>:<port>

 

What to look for: Handshake failure logs in the ingress or certificate expiry/mismatch errors from the openssl command.

7. Rolling updates without surge capacity

During rolling updates, Kubernetes may shut down too many old Pods before the new ones are ready if the deployment strategy isn’t tuned. For a short period, there are no valid endpoints available, and the ingress returns 502 responses. While this usually resolves once new Pods come online, it creates noticeable outages if surge and availability settings are misconfigured.

Quick check:

Bash
kubectl rollout status deployment <name>

 

What to look for: Gaps in availability where no Pods are running during rollout, especially with low maxSurge or high maxUnavailable values.

How to Fix Kubernetes ‘502 Bad Gateway’ Error

1. Fix readiness probe misconfigurations

To resolve a Kubernetes 502 Bad Gateway error, the readiness probe must reflect when the app can actually serve traffic; premature “Ready” states route requests too early and trigger 502s.

Quick check:

Bash
kubectl describe pod <pod-name>

 

Fix:

Bash
kubectl patch deployment <deploy-name> -n <ns> --type='json' -p='[{"op":"add","path":"/spec/template/spec/containers/0/readinessProbe","value":{"httpGet":{"path":"/healthz","port":8080},"initialDelaySeconds":15,"periodSeconds":5,"timeoutSeconds":2,"failureThreshold":6,"successThreshold":1}}]'
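
For reference, the same probe expressed declaratively inside the Deployment’s Pod template (spec.template.spec.containers). This is a sketch: the /healthz path and port 8080 mirror the patch above and should be replaced with your application’s real health endpoint and startup timing.

YAML
containers:
  - name: app
    image: <your-image>        # placeholder
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz         # a path the app serves only when it can take traffic
        port: 8080
      initialDelaySeconds: 15  # allow warm-up before the first check
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 6
      successThreshold: 1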

 

2. Stabilize crashing Pods

Frequent Pod crashes are one of the main causes of Kubernetes ingress 502 errors. When a container restarts mid-request, the ingress layer receives no valid upstream response, which surfaces as a Kubernetes 502 Bad Gateway error.

Quick check:

Bash
kubectl get pods -n <ns>

 

Fix:

Bash
kubectl set resources deployment <deploy-name> -n <ns> --limits=cpu=1000m,memory=1Gi --requests=cpu=300m,memory=512Mi
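
The equivalent resources block inside the Pod template, using the same values as the command above. These numbers are starting points, not recommendations; size requests and limits from your own usage metrics so the container is neither throttled nor OOMKilled.

YAML
containers:
  - name: app
    image: <your-image>   # placeholder
    resources:
      requests:
        cpu: 300m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi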

 

3. Correct Service selectors

Service misconfigurations are another common root cause of the Kubernetes 502 Bad Gateway error: when Service selectors don’t match the labels on running Pods, the Service has no valid endpoints, and the ingress forwards traffic into a void.

Quick check:

Bash
kubectl describe svc <service-name> -n <ns>

 

Fix:

Bash
kubectl patch svc <service-name> -n <ns> --type='merge' -p='{"spec":{"selector":{"app":"<pod-label-app>","tier":"<pod-label-tier>"}}}'
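
For clarity, here is what the corrected Service looks like in manifest form. The selector keys and values are placeholders; the only rule that matters is that they match the Pod template’s metadata.labels exactly, and that targetPort matches the container’s listening port.

YAML
apiVersion: v1
kind: Service
metadata:
  name: <service-name>
  namespace: <ns>
spec:
  selector:
    app: <pod-label-app>     # must equal the Pod label "app"
    tier: <pod-label-tier>   # must equal the Pod label "tier"
  ports:
    - port: 80
      targetPort: 8080       # must match the container's listening port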

 

4. Adjust NetworkPolicy rules

Over-restrictive policies can block ingress-to-Pod traffic, making healthy Pods unreachable.

Quick check:

Bash
kubectl describe networkpolicy -n <ns>

 

Fix:

Bash
kubectl patch networkpolicy <np-name> -n <ns> --type='merge' -p='{"spec":{"ingress":[{"from":[{"namespaceSelector":{"matchLabels":{"kubernetes.io/metadata.name":"ingress-nginx"}}}],"ports":[{"protocol":"TCP","port":80},{"protocol":"TCP","port":443}]}]}}'
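
If you manage policies declaratively, this sketch expresses the same rule as a full NetworkPolicy. The podSelector label is an assumption about how your backend Pods are labeled; the namespace label kubernetes.io/metadata.name: ingress-nginx matches a default ingress-nginx installation.

YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-backend
  namespace: <ns>
spec:
  podSelector:
    matchLabels:
      app: <backend-app>     # placeholder: the Pods behind the Service
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443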

 

5. Resolve backend timeouts

When backends are slow or overloaded, upstream timeouts at the ingress manifest as 502s.

Quick check:

Bash
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep -i "upstream timed out"

 

Fix:

Bash
kubectl annotate ingress <ingress-name> -n <ns> nginx.ingress.kubernetes.io/proxy-read-timeout="60" nginx.ingress.kubernetes.io/proxy-send-timeout="60" --overwrite
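
The same timeouts as they appear in the Ingress manifest itself, for teams that keep annotations in version control. Sixty seconds is an example value; raising timeouts buys headroom but does not fix slow backends, so pair it with scaling or query tuning.

YAML
metadata:
  name: <ingress-name>
  namespace: <ns>
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"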

 

6. Fix TLS/SSL handshake problems

Protocol or certificate mismatches break the ingress↔backend handshake and bubble up as 502s.

Quick check:

Bash
openssl s_client -connect <pod-ip>:<port> -servername <svc-hostname>

 

Fix:

Bash
kubectl annotate ingress <ingress-name> -n <ns> nginx.ingress.kubernetes.io/ssl-protocols="TLSv1.2 TLSv1.3" nginx.ingress.kubernetes.io/ssl-prefer-server-ciphers="true" --overwrite
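
The matching annotations in manifest form. If the backend terminates TLS itself, you may also need the backend-protocol annotation shown commented out; that depends on your setup and is an assumption, not a requirement for every cluster.

YAML
metadata:
  name: <ingress-name>
  namespace: <ns>
  annotations:
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"
    nginx.ingress.kubernetes.io/ssl-prefer-server-ciphers: "true"
    # nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"   # only if the backend serves HTTPS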

 

7. Tune rolling update strategy

If too many old Pods terminate before new Pods are ready, the ingress has no upstreams and clients see 502s mid-rollout.

Quick check:

Bash
kubectl rollout status deployment <deploy-name> -n <ns>

 

Fix:

Bash
kubectl patch deployment <deploy-name> -n <ns> --type='merge' -p='{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'
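
In the Deployment manifest, the same strategy looks like this. With maxUnavailable: 0 and maxSurge: 1, Kubernetes brings one new Pod up and waits for it to become Ready before removing an old one, so the ingress always has at least one healthy endpoint during the rollout.

YAML
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # add at most one extra Pod during the update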

 

Monitoring Kubernetes ‘502 Bad Gateway’ Error with CubeAPM

Fastest path to root cause: CubeAPM enables you to correlate ingress-layer 502 spikes with Pod logs, container restarts, Kubernetes events, and rollout history in one view. For 502s, the four signals you’ll lean on are: Events (readiness flaps, rollout gaps), Metrics (ingress and pod health), Logs (ingress controller + app errors), and Rollouts (deployment strategy & surge gaps).

Step 1 — Install CubeAPM (Helm)

Bash
helm repo add cubeapm https://charts.cubeapm.com && helm repo update cubeapm && helm show values cubeapm/cubeapm > values.yaml && helm install cubeapm cubeapm/cubeapm -f values.yaml

 

If already installed:

Bash
helm upgrade cubeapm cubeapm/cubeapm -f values.yaml

 

Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)

Run both modes:

  • DaemonSet → per-node, captures pod logs and kubelet/host metrics
  • Deployment → cluster-level events and metrics
Bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update open-telemetry

 

Bash
helm install otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml && helm install otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml

 

Step 3 — Collector Configs Focused on 502 Bad Gateway

  1. DaemonSet (logs + kubelet + host metrics):
YAML
# otel-collector-daemonset.yaml
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesAttributes: { enabled: true }
  hostMetrics: { enabled: true }
  kubeletMetrics: { enabled: true }
  logsCollection: { enabled: true, storeCheckpoints: true }
config:
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
      retry_on_failure: { enabled: false }
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers: { Cube-Stream-Fields: "k8s.namespace.name,k8s.deployment.name" }
    otlp/traces:
      endpoint: <cubeapm_endpoint>:4317
      tls: { insecure: true }
  processors:
    batch: {}
    resourcedetection: { detectors: ["system"] }
  receivers:
    otlp: { protocols: { grpc: {}, http: {} } }
    kubeletstats:
      collection_interval: 60s
      metric_groups: [container, node, pod]
    hostmetrics:
      collection_interval: 60s
      scrapers: { cpu: {}, memory: {}, network: {} }
service:
  pipelines:
    metrics: { receivers: [hostmetrics, kubeletstats], processors: [batch], exporters: [otlphttp/metrics] }
    logs:    { receivers: [otlp], processors: [batch], exporters: [otlphttp/logs] }
    traces:  { receivers: [otlp], processors: [batch], exporters: [otlp/traces] }

 

  • logsCollection → captures ingress/app logs with 502 lines
  • kubeletstats → surfaces Pod restarts tied to 502s
  • hostmetrics → CPU/mem/network saturation that correlates to upstream timeouts
  2. Deployment (cluster events + metrics):
YAML
# otel-collector-deployment.yaml
mode: deployment
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesEvents: { enabled: true }
  clusterMetrics: { enabled: true }
config:
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
    otlphttp/k8s-events:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers: { Cube-Stream-Fields: event.domain }
  processors:
    batch: {}
  receivers:
    k8s_cluster:
      collection_interval: 60s
      metrics:
        k8s.node.condition: { enabled: true }
service:
  pipelines:
    metrics: { receivers: [k8s_cluster], processors: [batch], exporters: [otlphttp/metrics] }
    logs:    { receivers: [k8sobjects], processors: [batch], exporters: [otlphttp/k8s-events] }

 

  • kubernetesEvents → readiness flaps, rollout gaps
  • clusterMetrics → node health, scheduling context
  • event logs → timeline of probe failures alongside 502 spikes

Step 4 — Supporting Components

If you want ingress metrics directly, add Prometheus scraping:

YAML
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "ingress"
          static_configs:
            - targets: ["ingress-nginx-controller.ingress-nginx.svc.cluster.local:10254"]
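
The receiver only takes effect once it is wired into a metrics pipeline. A sketch of the matching service section is below, reusing the exporter and receiver names from the DaemonSet config above; it also assumes the ingress-nginx controller has its metrics endpoint (port 10254) enabled.

YAML
service:
  pipelines:
    metrics:
      receivers: [hostmetrics, kubeletstats, prometheus]
      processors: [batch]
      exporters: [otlphttp/metrics]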

 

Step 5 — Verification in CubeAPM

  • Events: Readiness probe failed, rollout start/stop, Pod crash events
  • Metrics: CPU/memory pressure near 502 spikes, ingress request error rates
  • Logs: Ingress logs with 502 or upstream timeout entries, app logs for backend errors
  • Restarts: Containers in crash loop or restarting during load
  • Rollouts: Deployment gaps where Pods dropped before new ones came online

Example Alert Rules for Kubernetes ‘502 Bad Gateway’ Error

1. Spike in ingress 502 responses

A sudden increase in 502 responses from the ingress controller usually means backend Pods are failing, unreachable, or timing out. This is often the first external signal customers see when services are unhealthy. Catching this spike early allows teams to investigate Pod readiness, network policies, or resource bottlenecks before downtime escalates.

YAML
- alert: High502ErrorRate
  expr: sum(rate(nginx_ingress_controller_requests{status="502"}[5m])) by (namespace) > 5
  for: 2m
  labels: { severity: critical }
  annotations:
    summary: "High 502 error rate detected in namespace"

 

2. No healthy endpoints in a Service

If a Service has no available endpoints, traffic routed through it will always fail, resulting in 502s at the ingress layer. This usually happens due to label mismatches or Pods failing readiness checks. Monitoring for Services without endpoints helps catch silent configuration issues before they cause widespread client errors.

YAML
- alert: ServiceNoEndpoints
  expr: kube_endpoint_address_available == 0
  for: 1m
  labels: { severity: warning }
  annotations:
    summary: "Service has no endpoints available"

 

3. Backend Pod restarts during traffic

When Pods restart frequently, they often drop connections mid-request, producing incomplete or failed responses. The ingress controller surfaces these as 502 errors because the upstream suddenly goes offline. Tracking restart frequency ensures unstable Pods are flagged before they impact user-facing traffic.

YAML
- alert: PodCrashLooping
  expr: increase(kube_pod_container_status_restarts_total[5m]) > 3
  for: 5m
  labels: { severity: warning }
  annotations:
    summary: "Pod restarting frequently, may cause 502s"

 

4. Pods not becoming Ready (readiness probe trouble)

If Pods stay NotReady after rollout or scale-up, Services will route to backends that can’t serve traffic, surfacing 502s at the ingress layer. This often points to incorrect probe paths/thresholds, slow startups, or missing dependencies. Alerting on stuck NotReady Pods gives you time to fix probes or add warm-up delays before customers feel it.

YAML
- alert: PodsStuckNotReady
  expr: count(kube_pod_status_ready{condition="true"} == 0) by (namespace) > 0
  for: 5m
  labels: { severity: warning }
  annotations:
    summary: "One or more Pods remain NotReady, downstream 502s likely"

5. Rollout gaps (no surge capacity)

During a deployment, if too many old Pods terminate before new ones are Ready, the ingress temporarily has no upstream endpoints, causing waves of 502 responses. This alert catches rollout windows where availability drops below the intended capacity, usually due to an aggressive maxUnavailable or too-low maxSurge. Tune the strategy before pushing large releases.

YAML
- alert: DeploymentReplicasUnavailable
  expr: sum(kube_deployment_status_replicas_unavailable) by (namespace, deployment) > 0
  for: 3m
  labels: { severity: critical }
  annotations:
    summary: "Deployment has unavailable replicas during rollout; risk of 502s"

 

Conclusion

The Kubernetes 502 Bad Gateway error is more than just a failed HTTP response—it’s a sign that something deeper in your cluster isn’t aligned. Whether it’s misconfigured readiness probes, Pod crashes, missing Service endpoints, or rollout gaps, the result is the same: traffic drops, APIs fail, and users experience broken workflows.

The fixes require a systematic approach: validating probes, stabilizing Pods, checking Services, and tightening NetworkPolicies. With alert rules in place, you can catch 502 patterns before they turn into widespread outages.

CubeAPM makes this process faster by tying together events, metrics, logs, and rollout history into one unified view. Instead of chasing 502s through multiple tools, teams get clear visibility into root cause and resolution paths—reducing downtime and keeping Kubernetes workloads resilient at scale.
