The Kubernetes Memory Limit Exceeded error occurs when a container uses more memory than the limit set in its Pod spec. The container is terminated with an Out-Of-Memory kill (OOMKill) and restarted according to the Pod's restart policy. If ignored, this can stall Deployments, trigger CrashLoopBackOff cycles, and disrupt service availability. For production workloads, the impact is immediate: customer-facing apps can fail, jobs may stop mid-run, and dependent microservices quickly feel the strain.
CubeAPM makes OOMKill errors easier to diagnose by correlating Pod Events, container memory metrics, OOM logs, and rollout history in one place. Instead of chasing scattered alerts, engineers can instantly see whether the issue comes from a bad memory request/limit, a leaking process, or a sudden workload surge.
In this guide, we’ll cover what the Memory Limit Exceeded error is, why it happens, how to fix it step-by-step, and how to monitor it effectively with CubeAPM.
What is Kubernetes Memory Limit Exceeded Error
A Memory Limit Exceeded error in Kubernetes occurs when a container uses more memory than the hard cap defined in its Pod spec. The resources.limits.memory field sets this upper bound. Once the limit is breached, the container is OOMKilled and terminated immediately.
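For reference, the limit is set per container under resources in the Pod spec; a minimal example (name, image, and values are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # illustrative image
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves for the container
        limits:
          memory: "512Mi"   # hard cap; exceeding it triggers an OOMKill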
Key characteristics of this error include:
- Non-compressible resource: Unlike CPU, memory cannot be throttled; once the limit is exceeded, the process must be killed.
- Restart loops: Containers often re-enter a CrashLoopBackOff cycle after being OOMKilled.
- Service disruption: Applications may stall, background jobs fail, and dependent services can be affected.
- Event visibility: Pods record reason: OOMKilled in the container's last termination state, making the error identifiable through events, logs, and Pod status (see the quick check below).
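For a direct confirmation, the termination reason can be read straight from the Pod status (a minimal check; containerStatuses[0] assumes a single-container Pod):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
If it prints OOMKilled, the container was killed for exceeding its memory limit.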
Why Kubernetes Memory Limit Exceeded Error Happens
1. Misconfigured Resource Limits
A Kubernetes Memory Limit Exceeded Error often occurs when memory requests and limits don’t reflect real workload behavior, typically because limits were copied from templates without profiling the app’s actual usage. Once the limit is exceeded, the container is killed, and the Pod may restart repeatedly.
2. Memory Leaks in Applications
Applications with inefficient garbage collection or poor memory management gradually consume more RAM over time. In Java, lingering objects in the heap can lead to memory bloat, while in Node.js, unhandled buffers may stay allocated. Eventually, the Pod hits its memory ceiling and gets terminated, often after hours or days of running.
3. Sudden Traffic Spikes
Traffic bursts such as flash sales, promotions, or API storms can cause workloads to allocate extra memory for request handling, caching, or parallel operations. If Pod limits are not tuned for such peaks, the container may crash mid-request, disrupting live transactions and affecting end-user experience.
4. Large In-Memory Data Processing
Workloads that rely heavily on in-memory storage—like analytics pipelines, log aggregators, or caching services—can blow past their memory budgets when processing large datasets. Without sufficient buffer, these Pods repeatedly hit the limit and are OOMKilled, causing instability in critical pipelines.
5. Node-Level Memory Pressure
Even if an individual Pod has reasonable limits, a node that runs out of available memory forces the kubelet to evict or terminate Pods. Low-priority workloads are often the first victims, which makes capacity planning and node autoscaling essential to avoid unplanned failures.
6. Inefficient Container Images
Containers sometimes run additional background processes—init scripts, monitoring daemons, or sidecars—that aren’t accounted for in memory planning. These hidden consumers add to the workload’s footprint and can unexpectedly push usage beyond the Pod’s set memory limit, triggering OOMKills.
Quick check:
kubectl top pod <pod-name>
This shows real-time memory usage compared against the Pod’s defined limits. If usage consistently hovers near the maximum, the Pod is at high risk of being OOMKilled.
How to Fix Kubernetes Memory Limit Exceeded Error
1. Adjust Resource Requests and Limits
When Pod memory limits are set unrealistically low, even normal workloads can exceed the cap and get killed. This is one of the most common causes of OOMKills in Kubernetes.
Quick check:
kubectl top pod <pod-name> -n <namespace>
Compare memory usage with the Pod’s configured limits. If usage is always higher, the cap is too strict.
Fix:
Update the Pod or Deployment spec with realistic values and allow buffer for peaks:
kubectl set resources deploy <deployment> -n <namespace> --containers=<container> --requests=memory=512Mi --limits=memory=1Gi
Then verify the rollout:
kubectl rollout status deploy/<deployment> -n <namespace>
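If you manage manifests declaratively rather than with kubectl set resources, the equivalent change in the Deployment spec looks like this (values are illustrative; base them on your profiling data):
spec:
  template:
    spec:
      containers:
        - name: <container>
          resources:
            requests:
              memory: "512Mi"   # typical steady-state usage
            limits:
              memory: "1Gi"     # headroom for peaks
Requests guide scheduling while limits enforce the hard cap, so keep the gap between them wide enough to absorb normal peaks.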
2. Fix Memory Leaks in Applications
If memory consumption grows steadily until a Pod crashes, the application likely has a memory leak or unbounded allocation. Over time, this forces the kubelet to OOMKill the container.
Quick check:
kubectl describe pod <pod-name> -n <namespace> | grep -i OOMKilled
Repeated OOMKilled events after consistent uptime without traffic spikes point to a leak.
Fix:
Patch the application to release memory correctly (cap caches, free buffers, tune GC). Then redeploy:
kubectl rollout restart deploy/<deployment> -n <namespace>
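For JVM workloads, a common stop-gap while the leak is investigated is to cap the heap below the container limit so the process fails with a diagnosable OutOfMemoryError (and optionally a heap dump) instead of a silent OOMKill; a sketch, assuming a Java-based container:
containers:
  - name: <container>
    image: <java-image>           # illustrative; any JVM-based image
    resources:
      limits:
        memory: "1Gi"
    env:
      - name: JAVA_TOOL_OPTIONS   # picked up automatically by the JVM
        value: "-Xmx768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
This doesn't fix the leak, but it turns an opaque kill into an application-level error you can profile.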
3. Handle Sudden Traffic Spikes
Spikes in requests can increase memory allocations for caching, queues, and active sessions. Without enough headroom, Pods crash mid-transaction and restart.
Quick check:
kubectl top pod <pod-name> -n <namespace> --containers
If memory jumps sharply during peak traffic, limits are undersized.
Fix:
Raise memory limits or scale horizontally with HPA:
kubectl autoscale deploy <deployment> -n <namespace> --min=2 --max=10 --cpu-percent=70
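Note that kubectl autoscale only targets CPU; to scale on memory utilization, define an autoscaling/v2 HorizontalPodAutoscaler instead (values are illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <deployment>-hpa
  namespace: <namespace>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <deployment>
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # measured against requests, not limits
Memory utilization in the HPA is computed against requests, so requests still need to reflect realistic baseline usage.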
4. Optimize In-Memory Data Processing
Services that buffer large datasets, logs, or cache layers often breach memory caps. Analytics or data pipelines are frequent offenders.
Quick check:
Inspect container logs for errors before OOMKill:
kubectl logs <pod-name> -n <namespace> --previous
Look for messages showing “killed process” or incomplete data operations.
Fix:
Tune workloads to batch data in smaller chunks or raise Pod memory limits. Consider offloading to external stores like Redis instead of keeping everything in memory.
5. Mitigate Node-Level Memory Pressure
Sometimes Pods crash not because of their own limit, but because the node itself is low on memory. The kubelet evicts Pods under pressure, often starting with the lowest-priority workloads.
Quick check:
kubectl describe node <node-name> | grep -i MemoryPressure
If true, the node is running out of capacity.
Fix:
Add more nodes or enable the cluster autoscaler. Ensure critical Pods run in a higher QoS class (Guaranteed, or at least Burstable) so they are not the first to be evicted.
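Kubernetes assigns the Guaranteed QoS class when every container in the Pod sets requests equal to limits for both CPU and memory, which makes the Pod one of the last eviction candidates under node pressure; a minimal sketch:
containers:
  - name: <container>
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "500m"     # requests equal to limits for every resource -> Guaranteed QoS
        memory: "1Gi"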
6. Eliminate Inefficient Container Images
Some images ship with extra daemons or scripts that consume memory unexpectedly. These hidden processes push the Pod beyond its defined limits.
Quick check:
kubectl exec -it <pod-name> -- ps aux
Check for unnecessary processes running inside the container.
Fix:
Rebuild container images to remove unused services. Keep images lean and validate memory footprint before deploying.
Monitoring Kubernetes Memory Limit Exceeded Error with CubeAPM
Fastest path to root cause: correlate Pod Events (OOMKilled), container memory Metrics (usage vs limits), Logs (pre-kill errors), and Rollouts (when a new version began crashing). CubeAPM brings these four streams together so you can see what died, why it died, and what changed right before it died.
Step 1 — Install CubeAPM (Helm)
Install or upgrade the CubeAPM agent with Helm (single line, copy-paste safe). If you maintain custom settings, pass a values.yaml.
helm repo add cubeapm https://charts.cubeapm.com && helm repo update && helm upgrade --install cubeapm-agent cubeapm/agent --namespace cubeapm --create-namespace
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
Use DaemonSet for node/pod scraping (kubelet, cAdvisor, file logs) and Deployment for central pipelines (processors, routing, exports).
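If the OpenTelemetry Helm chart repository isn't already configured, add it first (standard upstream repo):
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update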
helm upgrade --install otel-ds open-telemetry/opentelemetry-collector --namespace observability --set mode=daemonset
helm upgrade --install otel-deploy open-telemetry/opentelemetry-collector --namespace observability --set mode=deployment --set replicaCount=2
Step 3 — Collector Configs Focused on Memory-Exceeded / OOMKills
DaemonSet (node/pod collection) — minimal, OOM-focused
receivers:
  k8s_events: {}
  kubeletstats:
    collection_interval: 20s
    endpoint: ${KUBELET_ADDRESS}
    auth_type: serviceAccount
    metrics:
      k8s.container.memory_limit: {}
      k8s.container.memory_working_set: {}
  filelog:
    include: [ /var/log/containers/*.log ]
    start_at: end
    operators:
      - type: filter
        expr: 'contains(body, "OOMKilled") or contains(body, "Memory cgroup out of memory")'
processors:
  batch: {}
  memory_limiter:
    check_interval: 2s
    limit_percentage: 90
    spike_limit_percentage: 25
  k8sattributes:
    extract:
      metadata: [ k8s.container.name, k8s.pod.name, k8s.namespace.name, k8s.node.name, k8s.pod.uid ]
exporters:
  otlp:
    endpoint: cubeapm-agent.cubeapm.svc.cluster.local:4317
service:
  pipelines:
    logs:
      receivers: [filelog, k8s_events]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [kubeletstats]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]
- k8s_events: streams Kubernetes Events (e.g., reason=OOMKilled) for timeline correlation.
- kubeletstats: collects container memory usage/limits from kubelet for precise “usage vs limit” graphs.
- filelog: tails container logs and highlights OOM signatures to pinpoint pre-kill errors.
- memory_limiter: protects the Collector itself under pressure so it won’t flap during node stress.
- k8sattributes: attaches pod/namespace/node labels for drill-down and dashboards.
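The k8s_events and kubeletstats receivers (and the k8sattributes processor) read from the Kubernetes API and the kubelet, so the Collector's ServiceAccount needs cluster-level read permissions. The official Helm chart can generate these; a hand-written sketch of roughly equivalent rules (name is illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-oom
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes", "events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/stats"]   # required for kubeletstats with serviceAccount auth
    verbs: ["get"]
Bind it to the Collector's ServiceAccount with a matching ClusterRoleBinding.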
Deployment (central routing & transforms) — minimal, OOM-focused
receivers:
  otlp:
    protocols: { grpc: {}, http: {} }
processors:
  batch: {}
  transform:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["oom_risk"], "high") where metric.name == "k8s.container.memory_working_set" and value_double > 0.9 * attributes["k8s.container.memory_limit"]
  resource:
    attributes:
      - key: service.namespace
        action: upsert
        from_attribute: k8s.namespace.name
exporters:
  otlp:
    endpoint: cubeapm-agent.cubeapm.svc.cluster.local:4317
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [transform, resource, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp]
- transform: tags samples where working-set > 90% of limit as oom_risk=high for easy alerting and dashboards.
- resource: normalizes k8s namespace into service.namespace for consistent filtering.
- otlp→otlp: central pipeline enhances/labels then forwards to CubeAPM.
Step 4 — Supporting Components (optional but recommended)
helm upgrade --install kube-state-metrics prometheus-community/kube-state-metrics --namespace kube-system
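kube-state-metrics exposes the kube_* series (container termination reasons, resource limits, restart counts) used by the alert rules below. If the prometheus-community Helm repo isn't added yet:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update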
Step 5 — Verification (What You Should See in CubeAPM)
- Events stream: You should see reason: OOMKilled events aligned with the exact Pod/Container and timestamp.
- Metrics panels: You should see container memory working set vs memory limit with spikes flagged as oom_risk=high.
- Logs timeline: You should see log lines immediately preceding the kill (e.g., GC pauses, “killed process” messages).
- Restarts counter: You should see increasing restart counts on the affected container correlated with OOM events.
- Rollout context: You should see whether a new image or config rolled out shortly before OOMs began.
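If any of these are missing, first confirm the Collector and agent Pods are actually running (namespaces as used in the install steps above):
kubectl get pods -n observability
kubectl get pods -n cubeapm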
Example Alert Rules for Kubernetes Memory Limit Exceeded Error
1. OOMKill Event Detected (Immediate Signal)
Fire as soon as Kubernetes reports a container was killed due to memory limits. This is your ground truth that the process was terminated by the kubelet.
groups:
  - name: kubernetes-oomkill.rules
    rules:
      - alert: KubernetesOOMKillEvent
        expr: sum by (namespace,pod,container) (increase(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[5m])) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "OOMKill in {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }})"
          description: "Container exceeded memory limit and was killed (OOMKill). Investigate resource limits, leaks, or traffic spikes."
2. Working Set Near/Over Limit (Early Warning)
Warn before the kill by comparing current working set to the container’s configured memory limit.
groups:
  - name: kubernetes-oom-risk.rules
    rules:
      - alert: KubernetesMemoryWorkingSetNearLimit
        expr: |
          (
            container_memory_working_set_bytes{container!=""}
            /
            on (namespace,pod,container) group_left
            kube_pod_container_resource_limits{resource="memory"}
          ) > 0.90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High OOM risk in {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }})"
          description: "Working set >90% of memory limit for 5m. Increase limits, optimize allocations, or scale out."
Note: If your kube_pod_container_resource_limits is in bytes, ensure both sides of the ratio are bytes (many distros already expose bytes).
3. Restart Storm After OOM (Impact Monitor)
Catch containers that keep bouncing due to repeated OOMKills.
groups:
  - name: kubernetes-restarts.rules
    rules:
      - alert: KubernetesRestartSpikeAfterOOM
        expr: rate(kube_pod_container_status_restarts_total[10m]) > 0.1
        for: 10m
        labels:
          severity: high
        annotations:
          summary: "Restart spike in {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }})"
          description: "Container restarts accelerating (likely after OOM). Check events/logs and raise limits or patch leaks."
4. Node Under MemoryPressure (Root-Cause Context)
If the node itself is constrained, Pods can be evicted or killed even with reasonable limits.
groups:
  - name: kubernetes-node.rules
    rules:
      - alert: KubernetesNodeMemoryPressure
        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Node under MemoryPressure: {{ $labels.node }}"
          description: "Node reports MemoryPressure. Scale nodes, rebalance workloads, or reduce per-Pod memory."
5. Post-Rollout OOM Regression (Change-Aware)
Detects OOMs shortly after a rollout so you can quickly revert or fix the new version/config.
groups:
  - name: kubernetes-rollout-oom.rules
    rules:
      - alert: KubernetesOOMSpikeAfterRollout
        expr: |
          (
            sum by (namespace,deployment) (increase(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[15m]))
          ) > 0
          and on(namespace,deployment)
          (
            sum by (namespace,deployment) (increase(kube_deployment_status_observed_generation[15m])) > 0
            or
            sum by (namespace,deployment) (increase(kube_deployment_status_replicas_updated[15m])) > 0
          )
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "OOM spike after rollout in {{ $labels.namespace }}/{{ $labels.deployment }}"
          description: "OOMKills detected within 15m of a rollout. Suspect mis-sized limits, memory leak, or config change."
6. Sustained Memory Saturation (SLO-Oriented)
Longer window saturation to catch chronic under-provisioning without noisy flaps.
groups:
  - name: kubernetes-saturation.rules
    rules:
      - alert: KubernetesSustainedMemorySaturation
        expr: |
          avg_over_time(
            (
              container_memory_working_set_bytes{container!=""}
              /
              on (namespace,pod,container) group_left
              kube_pod_container_resource_limits{resource="memory"}
            )[30m:]
          ) > 0.80
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Sustained memory saturation in {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }})"
          description: "Avg working set >80% of limit over 30m. Consider right-sizing requests/limits or scaling out."
Conclusion
The Kubernetes Memory Limit Exceeded error often surfaces when clusters scale under unpredictable workloads. Left unmanaged, it leads to unstable services, failed jobs, and recurring Pod restarts.
Preventing these failures requires more than just fixing limits—it demands visibility into how memory behaves under real traffic. CubeAPM helps by surfacing trends over time, highlighting Pods that are consistently near their limits, and tying memory issues back to rollouts or workload changes. Instead of reacting to OOMKills, teams can anticipate them and tune their clusters before failures happen.