ErrImagePull tells you Kubernetes could not download a container image for a Pod. The container stays in Waiting and the rollout stalls. It is noisy, disrupts deploys, and usually traces back to credentials, names, tags, policies, or the registry path. With outages costing an average of $14,056 per minute, even small errors like this add up fast. Once you read the Pod events and validate the image, pull policy, and secret setup, the fix is quick.
CubeAPM helps you catch these failures as they happen. By ingesting Kubernetes Events, Prometheus metrics, and container runtime logs, it surfaces ErrImagePull signals across clusters in real time. Teams can correlate failed pulls with deployments, registry errors, and rollout history without guesswork.
In this article, we explain what ErrImagePull means, why it happens, how to fix it, and how CubeAPM helps you monitor and prevent repeat errors at scale.
What is ErrImagePull in Kubernetes
ErrImagePull appears when a Pod’s image pull attempt fails. The kubelet tries to fetch layers from the registry and receives an error, so the container never starts and the Pod remains Pending. After several failures, Kubernetes backs off its retries, which often shows up as ImagePullBackOff.
You will see ErrImagePull or ImagePullBackOff in kubectl get pods, and the exact reason in kubectl describe pod <name> under Events. Typical messages include “manifest not found,” “authentication required,” or “Too Many Requests.” ErrImagePull is not the root cause by itself; it is the signal that the image download failed and the node cannot pull what you specified in the image: field.
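For orientation, the STATUS column in kubectl get pods is where these two states show up. The pod names and ages below are illustrative, not real output:
NAME                   READY   STATUS             RESTARTS   AGE
web-5d9c7b6f4d-abcde   0/1     ErrImagePull       0          45s
web-5d9c7b6f4d-fghij   0/1     ImagePullBackOff   0          3m12s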
Why ErrImagePull in Kubernetes Happens
Kubernetes can’t pull an image for several reasons. Some are simple mistakes, others come from registry limits, authentication problems, or infrastructure issues. Here are the main causes in detail:
1. Wrong image name or tag
Example: nginx:latestt instead of nginx:latest.
Registries reject unknown tags, and Kubernetes marks the Pod with ErrImagePull. This quickly escalates into ImagePullBackOff when retries keep failing.
Quick check:
kubectl describe pod <pod-name>
2. Missing or invalid credentials
Private registries (ECR, GCR, Harbor, etc.) require valid authentication. If imagePullSecrets are missing, placed in the wrong namespace, or expired, the registry denies the pull.
Events usually show “unauthorized” or “authentication required.”
Quick check:
kubectl get sa <serviceaccount> -o yaml | grep imagePullSecrets
3. Registry rate limits
Public registries like Docker Hub throttle anonymous pulls. Large clusters or CI pipelines often hit HTTP 429 “Too Many Requests.”
Quick check:
kubectl describe pod <pod-name> | grep "Too Many Requests"
4. Network or DNS issues
If nodes can’t resolve or connect to the registry, pulls fail. Misconfigured CoreDNS, blocked egress, or strict proxies are common causes. Events may show “no such host” or “connection refused.”
Quick check:
kubectl run -it --rm --image=busybox:1.36 netcheck -- nslookup index.docker.io
5. Policy or admission controls
Security policies may block unsigned images or enforce digest usage. Admission webhooks can reject images from repositories not approved for use. Events usually say “denied by webhook” or “image not allowed.”
Quick check:
kubectl describe pod <pod-name> | grep "denied"
6. Architecture mismatch
Sometimes the image is built only for amd64, but nodes are running arm64. Kubernetes can’t match the manifest to the node’s architecture. The error shows “no matching manifest for platform.”
Quick check:
docker manifest inspect <image>:<tag> | grep architecture
How to Fix ErrImagePull in Kubernetes
1. Check the image name and tag
Confirm the tag exists and is spelled correctly. Prefer immutable version tags over latest.
containers:
  - name: web
    image: nginx:1.27.0
2. Use a fully qualified path for non-Docker Hub registries
Include registry hostname and org.
image: registry.example.com/team/app:2.3.1
3. Verify access to a private registry
Create a docker-registry Secret and reference it in the Pod or ServiceAccount.
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=$USER \
  --docker-password=$PASSWORD \
  --docker-email=devops@example.com \
  -n prod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: prod
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: api
          image: registry.example.com/prod/api:2.3.1
4. Inspect Pod Events for the exact failure
kubectl describe pod <pod> -n <ns> | sed -n '/Events/,$p'
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -30
5. Fix imagePullPolicy to match how you publish
imagePullPolicy: IfNotPresent
# or pin by digest for repeatability
# image: myapp@sha256:3e1f46b54bb...
6. Confirm networking and DNS
Run checks from a temporary Pod so results reflect the cluster’s network path rather than your local workstation.
kubectl run netcheck -it --rm --image=busybox:1.36 -- sh
Inside the pod:
nslookup registry.example.com
wget -S --spider https://registry.example.com/v2/
7. Avoid rate limits
Authenticate pulls, mirror base images into a private registry, and stagger rollouts so nodes do not burst.
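As a sketch, one common mitigation is to pull from Docker Hub with an authenticated account instead of anonymously. The secret name dockerhub-cred and the prod namespace below are illustrative placeholders:
kubectl create secret docker-registry dockerhub-cred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=$DOCKERHUB_USER \
  --docker-password=$DOCKERHUB_TOKEN \
  -n prod
# Attach the secret to the namespace's default ServiceAccount so Pods use it automatically
kubectl patch serviceaccount default -n prod \
  -p '{"imagePullSecrets": [{"name": "dockerhub-cred"}]}'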
8. Reconcile policy and admission hooks
If a webhook denies the Pod, update allow-lists, required signatures, or switch to digests to satisfy policy.
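If the policy requires digests, resolve the digest for the tag you already run and pin it in the manifest. A minimal sketch, reusing the registry.example.com/team/app image from step 2; the digest value is a placeholder:
# Resolve the digest for the tag (output format varies by Docker version)
docker manifest inspect -v registry.example.com/team/app:2.3.1 | grep digest
# Then reference the image by digest instead of by tag
# image: registry.example.com/team/app@sha256:<digest-from-above>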
9. Handle platform mismatches
Inspect the manifest and build multi-arch images if needed.
docker manifest inspect myimage:1.0 | grep architecture
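If the manifest only lists amd64 and you also run arm64 nodes, rebuilding as a multi-arch image is the usual fix. A minimal sketch with docker buildx, assuming you have push access to the repository:
# Build and push a manifest list covering both architectures
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/team/app:2.3.1 \
  --push .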
10. Retry with a known-good tag
kubectl delete pod <pod> -n <ns>
kubectl rollout status deploy/<name> -n <ns>
Monitoring ErrImagePull in Kubernetes with CubeAPM
CubeAPM ingests Kubernetes Events like ErrImagePull alongside pod logs and kube-state metrics, so you see the failure string next to container runtime errors and deployment changes. This removes guesswork when triaging broken rollouts.
Dashboards let you slice by namespace, workload, node, image, and registry host. You can spot spikes in ErrImagePull, drill to the specific Pod, and confirm whether the cause is a bad tag, a missing secret, or a DNS or egress problem. Traces and deploy metadata give you the “what changed just before it broke” context.
Here is a breakdown of how CubeAPM achieves this:
1) Captures the right signals the moment they happen
CubeAPM ingests Kubernetes Events (including ErrImagePull and ImagePullBackOff), pod/container logs, and cluster/node metrics through an OpenTelemetry-native pipeline. That gives you the exact failure message from the kubelet alongside the surrounding log and metric context—no guessing.
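As a rough sketch, an OpenTelemetry Collector (contrib) pipeline can watch Kubernetes Events and forward them over OTLP. The endpoint below is a placeholder for your CubeAPM ingestion URL, and the exact receivers you run may differ by collector version:
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch            # stream Events (including ErrImagePull reasons) as they happen
processors:
  k8sattributes: {}            # enrich with namespace, pod, workload, and node metadata
exporters:
  otlphttp:
    endpoint: https://<your-cubeapm-endpoint>   # placeholder; set to your CubeAPM ingestion URL
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [k8sattributes]
      exporters: [otlphttp]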
2) Auto-enriches everything with Kubernetes metadata
Every event/log/metric is tagged with cluster, namespace, workload (Deployment/StatefulSet/DaemonSet), pod, container, image name/tag, node, labels, and annotations. This enrichment makes it trivial to pivot by “image=foo:1.27” or “namespace=payments” and see all related failures.
3) Correlates symptoms into a single timeline
ErrImagePull rarely lives alone. CubeAPM stitches events with signals like DNS error rates, node egress health, and rollout activity so you can tell if the root cause is a typo, missing secret, throttled registry, policy block, or network/DNS trouble.
4) Purpose-built views for fast triage
Dashboards surface: counts of ErrImagePull/ImagePullBackOff by namespace/workload, trending spikes over time, top failing images, and “new since last deploy” views. You can click from the spike to the exact pod and read the last failure line instantly.
5) Alerts that carry real context (not just noise)
Rules trigger on the event reason (ErrImagePull), the backoff state, and surge patterns within a namespace. Alerts include namespace, pod, container, image, and the last error string so on-call knows what to check first. Route to Slack, Email, PagerDuty, Opsgenie, Google Chat, Jira, or any system via Webhook. Deduplication and inhibition keep pages calm during bigger incidents.
6) A clean investigation workflow
From an alert: open the event → jump to pod logs → check the image name/tag and ServiceAccount → confirm secrets are present → review cluster DNS/egress signals → see what deployment or commit introduced the change. It’s a two-minute loop instead of bouncing between tools.
Example Alert Rules
1. PodErrImagePull — catch the first real failure
Use this as your tripwire. It fires when any container is stuck waiting with ErrImagePull long enough to rule out tiny flakes. First actions: read Pod Events, confirm the image path and tag, and verify the registry secret.
- alert: PodErrImagePull
  expr: kube_pod_container_status_waiting_reason{reason="ErrImagePull"} > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "ErrImagePull in {{ $labels.namespace }}/{{ $labels.pod }}"
    description: "Failed to pull image for {{ $labels.container }}. Validate image name, tag, and registry credentials."
2. PodImagePullBackOff — tell persistent from transient
This signals kubelet has moved to spaced retries, so the problem isn’t a blip. Keep it at warning to avoid extra paging while you fix tags, attach the right imagePullSecrets, or switch to a registry mirror.
- alert: PodImagePullBackOff
  expr: kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"} > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "ImagePullBackOff in {{ $labels.namespace }}/{{ $labels.pod }}"
    description: "Kubernetes is backing off image pulls. Likely tag missing, auth failure, or throttling."
3. ManyErrImagePullInNamespace — stop bad rollouts fast
When several pods fail together in one namespace, assume a bad deploy, expired credentials, or a registry incident. Page quickly so you can pause or roll back before the blast radius grows.
- alert: ManyErrImagePullInNamespace
  expr: sum by (namespace) (kube_pod_container_status_waiting_reason{reason="ErrImagePull"}) >= 5
  for: 3m
  labels:
    severity: critical
  annotations:
    summary: "Multiple ErrImagePull in {{ $labels.namespace }}"
    description: "Five or more containers cannot pull images. Check registry status, credentials, and the latest deployment."
4. CoreDNSHighServfailRate — early warning before pulls fail
DNS trouble often shows up minutes before pods hit ErrImagePull. Watch SERVFAIL ratios and fix CoreDNS, upstream DNS, or egress so you avoid a cascade of image pull errors.
- alert: CoreDNSHighServfailRate
  expr: sum(rate(coredns_dns_response_rcode_count_total{rcode="SERVFAIL"}[5m])) / sum(rate(coredns_dns_requests_total[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High DNS SERVFAIL rate in cluster"
    description: "DNS errors can cause image pull failures. Investigate CoreDNS, upstream resolvers, and egress."
Conclusion
ErrImagePull is common, but it is rarely mysterious. Most incidents come from bad tags, missing secrets, path changes, policy blocks, or simple network trouble. The fastest fix is to read the Pod Events and validate image, policy, and credentials.
CubeAPM shortens the path to root cause by putting Events, logs, metrics, and deploy context in one view. You see exactly what failed and what changed just before it.
Adopt the alerts above, ship events and kube-state metrics to CubeAPM, and make image pull failures fast to detect and boring to resolve.