The Kubernetes ImagePullBackOff error tells you that Kubernetes could not pull a container image for a Pod, so the container stays in the Waiting state while the kubelet backs off before trying again. It is noisy, disrupts rollouts, and usually traces back to credentials, names, tags, policies, or registry limits. Industry surveys show that 90% of organizations put downtime costs above $300,000 per hour, meaning even a few Pods stuck in this state can cause real business impact.
CubeAPM helps you detect these failures the moment they happen. By ingesting Kubernetes Events, Prometheus metrics, and container runtime logs, it surfaces ErrImagePull and ImagePullBackOff signals across clusters in real time. Teams can correlate failed pulls with deployments, registry errors, and rollout history without guesswork.
With CubeAPM dashboards and alert rules, you can spot spikes in image pull failures, drill down into the exact namespace or pod, and confirm whether the cause is a bad tag, missing secret, or registry rate limit. That reduces mean time to recovery and ensures smoother rollouts.
In this article, we’ll break down what ImagePullBackOff means, why it happens, how to fix it, and how CubeAPM can help you monitor and prevent these errors at scale.
What is ImagePullBackOff in Kubernetes?

ImagePullBackOff is a status message that Kubernetes assigns to a Pod when it repeatedly fails to pull the required container image from a registry. It usually follows an ErrImagePull, which is the initial error state that happens when a pull attempt fails.
When Kubernetes hits this error, the kubelet (the node agent) does not keep retrying endlessly. Instead, it switches to an exponential backoff strategy — trying again after short delays, then longer ones, until either the image becomes available or the Pod is deleted. This prevents the cluster from overloading the registry or spamming network calls.
You will typically see ImagePullBackOff listed under the STATUS column when running kubectl get pods. To get more context, the command kubectl describe pod <pod-name> reveals Events such as:
- “Failed to pull image… repository does not exist”
- “Error response from registry: authentication required”
- “Too Many Requests” (indicating rate limits)
In short, ImagePullBackOff isn’t the root cause itself — it’s Kubernetes signaling that the image pull failed and it’s backing off from retrying too aggressively.
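For reference, a Pod in this state typically looks like this in kubectl get pods output (the pod name and timings here are placeholders):
kubectl get pods
NAME                     READY   STATUS             RESTARTS   AGE
myapp-7d4b9c8f6d-x2kqp   0/1     ImagePullBackOff   0          5m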
Why ImagePullBackOff in Kubernetes Happens
Kubernetes can’t pull an image for several reasons. Some are simple typos, others come from deeper registry or infrastructure issues. Here are the main causes in detail:
1. Wrong image name or tag
A typo in the image string (registry, repository, or tag) is the most common cause.
- Example: nginx:latestt instead of nginx:latest.
- Registries reject unknown tags, and Kubernetes marks the Pod with ErrImagePull.
- This quickly escalates into ImagePullBackOff when retries fail.
Quick check:
kubectl describe pod <pod-name>
If the Event says “manifest for <image> not found”, it’s likely a bad tag.
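If the tag is indeed the problem, you can correct the image reference in place without editing manifests. For example, assuming a Deployment named myapp with a container named app:
kubectl set image deployment/myapp app=nginx:latest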
2. Missing or invalid credentials for private registries
Private registries like Amazon ECR, Google Artifact Registry, or Harbor require authentication.
- If imagePullSecrets are not configured, Kubernetes cannot fetch the image.
- Even expired tokens can cause this.
- The error usually shows as “unauthorized: authentication required”.
Pro tip: Ensure the Secret lives in the same namespace as the Pod and is actually referenced, either via imagePullSecrets in the Pod spec or by attaching it to the Pod’s ServiceAccount, not just created.
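For example, one way to attach an existing Secret named regcred (created as shown later in this article) to the default ServiceAccount, so every Pod using that ServiceAccount can pull with it:
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'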
3. Registry rate limits or throttling
Public registries (like Docker Hub) throttle excessive pulls from unauthenticated IPs.
- This results in HTTP 429 Too Many Requests.
- Large clusters with multiple nodes can hit these limits quickly during rollouts.
- The Pod keeps retrying, but exponential backoff increases delay.
Best practice: always authenticate pulls, or mirror base images into a private registry.
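One documented way to check your remaining Docker Hub pull quota (requires curl and jq) is to request a pull token for Docker's ratelimitpreview/test repository and read the rate-limit headers:
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -s --head -H "Authorization: Bearer $TOKEN" \
  https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | grep -i ratelimit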
4. ImagePullPolicy misconfiguration
Kubernetes decides when to pull images based on this policy:
- Always → forces a registry check every Pod start.
- IfNotPresent → uses cached image if available.
- Never → skips pulls completely.
Misuse can lead to surprise failures. For example, using Always with :latest means every restart depends on the registry being available, which increases chances of ImagePullBackOff during outages.
5. Networking or DNS issues
If worker nodes cannot reach the registry, pulls fail.
- Firewalls, corporate proxies, or misconfigured network policies often block traffic.
- DNS issues can prevent resolving registry domains like index.docker.io.
- The Pod Events may show “dial tcp: lookup registry on 10.x.x.x:53: no such host”.
Quick test from a node:
curl -I https://index.docker.io/v1/
If this times out, the problem is network or DNS, not Kubernetes.
6. Architecture or OS mismatch
Sometimes the image is built only for amd64 but the nodes are arm64 (or vice versa).
- This mismatch results in errors like “no matching manifest for linux/arm64 in the manifest list entries”.
- Multi-arch images (via Docker Buildx) solve this by bundling multiple architectures.
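To confirm which architectures your nodes actually run before rebuilding anything, you can query the node info directly:
kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture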
7. Policy controllers or admission hooks
Cluster policies may block pulls under certain conditions:
- Security policies requiring only signed images.
- Admission controllers rejecting Pods that don’t specify digests.
- Namespace restrictions preventing access to Secrets.
In these cases, the pull error is not about the registry itself, but about compliance checks applied at deploy time.
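If you suspect a policy controller, a quick first check is to list the admission webhooks installed in the cluster and see which of them could be intercepting your workloads:
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations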
How to Fix Kubernetes ImagePullBackOff Error
Fixing this issue requires validating each possible failure point. Here’s a step-by-step approach with code snippets:
1. Check the image name and tag
Confirm the image exists and is spelled correctly:
docker pull nginx:1.27
If this works locally but fails in the cluster, the problem is likely authentication, policy, or networking.
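If the tag itself is in doubt, you can list the tags the registry actually serves. For example, with skopeo installed:
skopeo list-tags docker://docker.io/library/nginx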
2. Verify access to a private registry
Create a Secret for registry credentials:
kubectl create secret docker-registry regcred \
--docker-server=myregistry.example.com \
--docker-username=myuser \
--docker-password=mypassword \
--docker-email=myemail@example.com
Reference it in your Pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: private-pod
spec:
  containers:
  - name: app
    image: myregistry.example.com/myapp:1.0
  imagePullSecrets:
  - name: regcred
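To verify the Secret actually contains the credentials you expect (and lives in the right namespace), decode its .dockerconfigjson payload:
kubectl get secret regcred --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode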
3. Inspect Pod Events
Use kubectl describe to see why the pull is failing:
kubectl describe pod <pod-name>
Look under Events for clues like authentication required, repository not found, or Too Many Requests.
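You can also filter cluster Events down to the failing Pod, sorted by time (replace <pod-name>):
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp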
4. Fix imagePullPolicy issues
Example of caching images when tags are immutable:
apiVersion: v1
kind: Pod
metadata:
  name: cache-friendly
spec:
  containers:
  - name: app
    image: myapp:1.0
    imagePullPolicy: IfNotPresent
For repeatability, pin images by digest:
image: myapp@sha256:3e1f46b54bb...
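One way to look up the digest for an image that has been pulled from (or pushed to) a registry, using a placeholder image name:
docker inspect --format '{{index .RepoDigests 0}}' myapp:1.0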
5. Confirm networking and DNS
From a cluster node, test connectivity to the registry:
curl -v https://index.docker.io/v1/
If this fails, fix firewall, proxy, or DNS settings before retrying the Pod.
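To rule out an in-cluster DNS problem (as opposed to a node-level one), you can run a throwaway Pod and resolve the registry hostname from inside the cluster:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup index.docker.io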
6. Address registry rate limits
Authenticate pulls to avoid limits:
docker login
kubectl create secret docker-registry dockersecret \
--docker-server=https://index.docker.io/v1/ \
--docker-username=<username> \
--docker-password=<password>
Then attach the secret as shown earlier.
7. Ensure architecture compatibility
Check the image’s supported platforms:
docker manifest inspect nginx:1.27 | grep architecture
If your nodes run arm64 but the image only has amd64, switch to a multi-arch build.
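A minimal multi-arch build and push with Docker Buildx might look like this (the registry and image names are placeholders, and you need a builder configured for both platforms):
docker buildx build --platform linux/amd64,linux/arm64 \
  -t myregistry.example.com/myapp:1.0 --push .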
8. Retry the Pod after fixes
Delete the broken Pod so the controller retries with your updates:
kubectl delete pod <pod-name>
kubectl get pods -w
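If the Pod is managed by a Deployment, a rollout restart achieves the same thing and lets you watch progress:
kubectl rollout restart deployment/<deployment-name>
kubectl rollout status deployment/<deployment-name>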
Monitoring ImagePullBackOff in Kubernetes with CubeAPM
CubeAPM ingests Kubernetes Events, Prometheus/KSM metrics, and node/pod runtime logs via the OpenTelemetry Collector. The recommended setup is to run two Collector instances: a DaemonSet (node/pod metrics + logs) and a Deployment (cluster metrics + Kubernetes Events).
1. Install the OpenTelemetry Collector (Helm)
Add the repo and apply two values files (one for each mode):
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update open-telemetry
# after creating the two YAMLs below:
helm install otel-collector-daemonset open-telemetry/opentelemetry-collector -f otel-collector-daemonset.yaml
helm install otel-collector-deployment open-telemetry/opentelemetry-collector -f otel-collector-deployment.yaml
2. DaemonSet config (host + kubelet metrics, logs → CubeAPM)
This streams host metrics, kubelet/pod metrics, and logs to CubeAPM:
# otel-collector-daemonset.yaml
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesAttributes: { enabled: true }
  hostMetrics: { enabled: true }
  kubeletMetrics: { enabled: true }
  logsCollection:
    enabled: true
    storeCheckpoints: true
config:
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
      retry_on_failure: { enabled: false }
    otlphttp/logs:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers:
        Cube-Stream-Fields: k8s.namespace.name,k8s.deployment.name,k8s.statefulset.name
    otlp/traces:
      endpoint: <cubeapm_endpoint>:4317
      tls: { insecure: true }
  processors:
    batch: {}
  receivers:
    otlp:
      protocols: { grpc: {}, http: {} }
    kubeletstats:
      collection_interval: 60s
      insecure_skip_verify: true
      metric_groups: [container, node, pod, volume]
    hostmetrics:
      collection_interval: 60s
      scrapers: { cpu: {}, disk: {}, filesystem: {}, memory: {}, network: {} }
  service:
    pipelines:
      metrics: { receivers: [hostmetrics, kubeletstats], processors: [batch], exporters: [otlphttp/metrics] }
      logs: { receivers: [otlp], processors: [batch], exporters: [otlphttp/logs] }
      traces: { receivers: [otlp], processors: [batch], exporters: [otlp/traces] }
3. Deployment config (cluster metrics + Kubernetes Events → CubeAPM)
Enables kubernetesEvents and streams Events like ErrImagePull and ImagePullBackOff as logs to CubeAPM:
# otel-collector-deployment.yaml
mode: deployment
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  kubernetesEvents: { enabled: true }
  clusterMetrics: { enabled: true }
config:
  exporters:
    otlphttp/metrics:
      metrics_endpoint: http://<cubeapm_endpoint>:3130/api/metrics/v1/save/otlp
      retry_on_failure: { enabled: false }
    otlphttp/k8s-events:
      logs_endpoint: http://<cubeapm_endpoint>:3130/api/logs/insert/opentelemetry/v1/logs
      headers:
        Cube-Stream-Fields: event.domain
  receivers:
    k8s_cluster:
      collection_interval: 60s
  service:
    pipelines:
      metrics: { receivers: [k8s_cluster], exporters: [otlphttp/metrics] }
      logs: { receivers: [k8sobjects], exporters: [otlphttp/k8s-events] }
4. Add kube-state-metrics scrape (for alert rules)
The kube_pod_container_status_waiting_reason metric that powers your alert rules comes from kube-state-metrics (KSM). Use the Collector’s Prometheus receiver to scrape KSM and forward to CubeAPM.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kube-state-metrics
          scrape_interval: 30s
          static_configs:
            - targets:
                - kube-state-metrics.kube-system.svc.cluster.local:8080
service:
  pipelines:
    metrics:
      receivers:
        - prometheus
      processors:
        - batch
      exporters:
        - otlphttp/metrics
How this helps with ImagePullBackOff
- Events: The ErrImagePull → ImagePullBackOff flow is captured as logs, searchable in CubeAPM with namespace, pod, and container context.
- Metrics: KSM exposes the Waiting reason metrics for alerting and dashboards (e.g., spikes by namespace).
- Logs: Node and container-runtime logs (401/403/429, DNS errors) are centralized to confirm the root cause quickly.
Example Alert Rules
Proactive alerting is the best way to avoid discovering ImagePullBackOff errors only after users are affected. Since Kubernetes surfaces these issues through both Events and kube-state-metrics, you can create Prometheus alerting rules that fire when Pods enter ErrImagePull or ImagePullBackOff states.
1. Pod is stuck with ImagePullBackOff
This rule triggers when any container is stuck in the ImagePullBackOff state for more than 3 minutes, signaling that Kubernetes cannot pull the image and has started backing off retries.
- alert: PodImagePullBackOff
  expr: |
    max by (namespace, pod, container) (
      kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"} > 0
    )
  for: 3m
  labels:
    severity: critical
  annotations:
    summary: "ImagePullBackOff for {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }}"
    description: "Kubernetes cannot pull the image for {{ $labels.container }}. Check image name, tag, imagePullSecrets, and rate limits."
2. Pod hit ErrImagePull
This alert catches the initial ErrImagePull condition before Kubernetes enters backoff, helping teams act quickly on misconfigurations or registry failures.
- alert: PodErrImagePull
  expr: |
    max by (namespace, pod, container) (
      kube_pod_container_status_waiting_reason{reason="ErrImagePull"} > 0
    )
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "ErrImagePull for {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }}"
    description: "Image pull failed. Inspect Pod Events for registry errors and credentials."
3. Many pods failing pulls in the same namespace
This rule monitors bursts of failures. If more than five Pods in the same namespace hit pull errors, it likely points to a registry outage, DNS issue, or hitting rate limits.
- alert: NamespaceImagePullFailuresBurst
  expr: |
    sum by (namespace) (
      kube_pod_container_status_waiting_reason{reason=~"ErrImagePull|ImagePullBackOff"}
    ) > 5
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Burst of image pull failures in {{ $labels.namespace }}"
    description: "Multiple pods cannot pull images. Possible registry outage or rate limit."
These rules rely on kube-state-metrics, which exports container Waiting reasons as metrics.
Conclusion
ImagePullBackOff is frustrating, but it is usually fixable once you check the Pod Events and validate image names, credentials, pull policy, and registry limits.
Harden your pipeline by pinning digests, authenticating pulls, and mirroring public images to avoid rate limits.
Use CubeAPM to monitor Events, metrics, and logs in one place so you can alert faster, pinpoint the cause, and restore service quickly.