
Kubernetes Disk Pressure Error Explained: Node Evictions, Root Causes, and Monitoring with CubeAPM

Published: October 11, 2025 | Kubernetes Errors

Kubernetes disk pressure error occurs when a node runs critically low on disk space or inodes, prompting the kubelet to evict pods or halt new scheduling. In fact, 93% of organizations now run Kubernetes in production, making disk and resource management a critical reliability concern. When nodes hit DiskPressure, workloads are rescheduled, deployments fail, and performance across the cluster can quickly degrade.

CubeAPM helps teams detect and prevent Kubernetes disk pressure error before it impacts workloads by continuously tracking filesystem metrics. It correlates these with eviction events, highlights pods or images consuming excessive space, and triggers anomaly alerts — ensuring storage health, predictable performance, and full visibility across Kubernetes clusters.

In this guide, we’ll explain what the Kubernetes DiskPressure error means, why it happens, how to fix it, and how to monitor and alert on it effectively using CubeAPM.

What is Kubernetes Disk Pressure Error


The DiskPressure condition in Kubernetes signals that a node is running out of available disk resources. It’s one of the core node conditions (MemoryPressure, PIDPressure, DiskPressure, NetworkUnavailable, Ready) that the kubelet reports to the control plane to describe node health. When disk usage crosses predefined eviction thresholds — typically set under evictionHard or evictionSoft in the kubelet configuration — the kubelet marks the node as DiskPressure=True.

This alert means the kubelet can no longer guarantee storage for running containers or system processes. Kubernetes reacts by evicting low-priority pods, pausing new scheduling on the node, and freeing disk space to recover stability. While this helps protect the node, it often disrupts workloads unexpectedly — especially when the underlying cause is image bloat, uncollected logs, or persistent volume growth that goes unnoticed.

In short, the DiskPressure error is Kubernetes’ self-protection mechanism: it prevents total node failure by evicting pods when disk space becomes dangerously low, but it can also create cascading issues if teams lack visibility into what’s consuming the storage.

Key Characteristics of the Kubernetes Disk Pressure Error:

  • Triggered when nodefs or imagefs crosses eviction thresholds
  • Node status changes to DiskPressure=True under kubectl describe node (see the check after this list)
  • The kubelet evicts low-priority pods and pauses new scheduling
  • Often caused by log bloat, orphaned images, or temporary volume overflow
  • Can affect StatefulSets, CI/CD jobs, and nodes running high I/O workloads
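
To confirm the condition on a specific node, inspect its reported status directly (replace <node-name> with the affected node):

Bash
# Show the DiskPressure row from the node's Conditions table
kubectl describe node <node-name> | grep -i DiskPressure
# Or list the DiskPressure status for every node at once
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'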

Why Kubernetes Disk Pressure Error Happens

Kubernetes disk pressure error typically appears when the kubelet detects insufficient free space on either the node filesystem (nodefs) or the container image filesystem (imagefs). Below are the most common causes behind it.

1. Excessive Container Logs

Large application logs stored under /var/lib/docker/containers can quickly consume node storage, especially when log rotation isn’t configured. This is one of the leading triggers for DiskPressure alerts in long-running workloads.

Quick check:

Bash
 du -sh /var/lib/docker/containers/* | sort -h

If you see one or more log directories exceeding several gigabytes, enable log rotation or redirect logs to a centralized backend.
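
On containerd-based nodes the kubelet rotates container logs itself; a minimal sketch of tuning that rotation through the kubelet configuration (the values are examples, and it assumes these keys are not already set):

Bash
# Append rotation settings to the kubelet config and restart the kubelet
printf 'containerLogMaxSize: 100Mi\ncontainerLogMaxFiles: 3\n' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet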

2. Unused or Orphaned Images

Old images left on the node after deployments, failed pulls, or rollbacks accumulate over time. The kubelet might not automatically clean them if disk thresholds aren’t reached yet, leading to gradual storage exhaustion.

Quick check:

Bash
crictl images | grep <repository>


If you see a large list of outdated or untagged images, run crictl rmi --prune or docker system prune -a to reclaim space safely.

3. Temporary Volume Growth (emptyDir and Cache Directories)

Temporary directories used by emptyDir volumes or app-level caching (e.g., npm, Maven, or build artifacts) can silently expand until they consume all available disk space.

Quick check:

Bash
kubectl describe pod <pod-name> | grep emptyDir -A5



If you see large emptyDir volumes or temporary mounts without size limits, adjust manifests or move cache data to external storage.

4. Containerd or Docker Overlay Data Expansion

OverlayFS directories under /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs or /var/lib/docker/overlay2 may grow excessively due to incomplete cleanup of layers or build cache.

Quick check:

Bash
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/* | sort -h


If you see subdirectories consuming tens of gigabytes, clear unused layers and restart the container runtime to trigger cleanup.

5. Persistent Volume Mismanagement

Large or unbounded PersistentVolumeClaims can cause unexpected disk consumption on the node hosting the volume, especially when dynamic provisioning is used without limits.

Quick check:

Bash
kubectl get pvc -A -o wide


If you see PVCs bound to the affected node with high capacity or unbounded requests, set storage quotas or use dedicated storage classes.
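
One way to enforce such limits is a namespace-level ResourceQuota on storage requests; a minimal sketch (the namespace, quota name, and sizes are placeholders):

Bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: <namespace>
spec:
  hard:
    requests.storage: 100Gi
    persistentvolumeclaims: "10"
EOF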

6. Image Cache Bloat and Registry Mirroring

Nodes that pull images frequently from private registries or mirror large layers without garbage collection often hit DiskPressure thresholds faster.

Quick check:

Bash
crictl images


If you see a long list of cached images, enable automatic pruning or use smaller base images to reduce layer footprint.

7. Eviction Thresholds Set Too Low

In some clusters, kubelet eviction settings under /var/lib/kubelet/config.yaml are configured too aggressively (e.g., evictionHard: {"nodefs.available": "10%"}). This triggers premature DiskPressure even with sufficient usable space.

Quick check:

Bash
sudo grep eviction /var/lib/kubelet/config.yaml

If you see strict thresholds (10% or higher), adjust them to more lenient values (e.g., 5% for nodefs.available and 3% for imagefs.available).

How to Fix Kubernetes Disk Pressure Error

Fixing DiskPressure requires cleaning up unused data, reconfiguring eviction thresholds, and optimizing container storage. Follow these steps to resolve it efficiently.

1. Clear Excessive Container Logs

Container logs in /var/lib/docker/containers often grow unchecked when rotation isn’t configured.

Quick check:

Bash
 du -sh /var/lib/docker/containers/* | sort -h


Fix: Enable log rotation and truncate oversized logs to free up disk space immediately.

Bash
find /var/lib/docker/containers/ -name "*.log" -type f -size +500M -exec truncate -s 0 {} \;
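
If the node runs Docker, log rotation can be configured once at the daemon level so the logs don’t grow back; a minimal sketch of /etc/docker/daemon.json (this overwrites the file, so merge with any existing settings first):

Bash
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "3" }
}
EOF
sudo systemctl restart docker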

2. Remove Unused or Orphaned Images

Old or failed images can consume large portions of the node’s filesystem.

Quick check:

Bash
crictl images | grep <repository>


Fix: Prune unused images safely using the container runtime.

Bash
crictl rmi --prune

 

3. Clean Temporary Volumes and EmptyDir Data

Temporary emptyDir volumes or caches can silently fill node storage.

Quick check:

Bash
 kubectl describe pod <pod-name> | grep emptyDir -A5


Fix: Pod volumes are immutable, so apply the size limit on the owning controller’s pod template (or delete stale pods with unbounded caches). For example, patching a Deployment whose emptyDir volume is named cache:

Bash
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"volumes":[{"name":"cache","emptyDir":{"sizeLimit":"500Mi"}}]}}}}'

4. Reclaim OverlayFS and Containerd Cache

Overlay and snapshot directories in containerd often retain unused layers.

Quick check:

Bash
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/* | sort -h


Fix: Drain the node first, then stop containerd, remove the snapshot data, and restart the service to reclaim space; pods will re-pull their images when rescheduled (see the drain commands after the cleanup).

Bash
sudo systemctl stop containerd && sudo rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/* && sudo systemctl start containerd
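
Because this wipes every image layer on the node, a safer sequence is to drain the node before the cleanup and re-admit it afterwards (assuming <node-name> is the affected node):

Bash
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# run the cleanup command above on the node, then allow scheduling again
kubectl uncordon <node-name>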

5. Adjust Eviction Thresholds

Aggressive eviction thresholds may trigger DiskPressure early.

Quick check:

Bash
sudo grep eviction /var/lib/kubelet/config.yaml


Fix: Relax the kubelet eviction settings for the node and image filesystems. The one-liner below assumes evictionHard is defined on a single line; if your config uses a block-style map, edit the file manually instead.

Bash
sudo sed -i '/evictionHard/d' /var/lib/kubelet/config.yaml && echo 'evictionHard: {"nodefs.available": "5%", "imagefs.available": "3%"}' | sudo tee -a /var/lib/kubelet/config.yaml && sudo systemctl restart kubelet
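
After the kubelet restarts, a quick verification confirms the new thresholds are in place and that the node condition clears:

Bash
sudo grep -A2 evictionHard /var/lib/kubelet/config.yaml
kubectl describe node <node-name> | grep -i DiskPressure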

 

6. Enable Automatic Image Garbage Collection

Without properly tuned image garbage collection, nodes accumulate unused image layers over time.

Quick check:

Bash
sudo grep -i imageGC /var/lib/kubelet/config.yaml


Fix: Configure image garbage collection thresholds in the kubelet configuration.

Bash
sudo sed -i '/imageGC.*ThresholdPercent/d' /var/lib/kubelet/config.yaml && printf 'imageGCHighThresholdPercent: 85\nimageGCLowThresholdPercent: 80\n' | sudo tee -a /var/lib/kubelet/config.yaml && sudo systemctl restart kubelet

7. Move Container Storage to a Dedicated Disk

Nodes with small root partitions run out of space faster.

Quick check:

Bash
df -h /var/lib/docker /var/lib/containerd 2>/dev/null


Fix: Mount a larger or separate disk for container storage (stop the container runtime and migrate existing data first).

Bash
sudo mount /dev/sdb1 /var/lib/docker
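
To keep the new mount across reboots, also add an /etc/fstab entry (the device and filesystem type here are assumptions; match your environment):

Bash
echo '/dev/sdb1 /var/lib/docker ext4 defaults 0 2' | sudo tee -a /etc/fstab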

Monitoring Kubernetes Disk Pressure Error with CubeAPM

The fastest way to troubleshoot DiskPressure is by correlating node metrics, kubelet events, and filesystem logs. CubeAPM brings these together through its four unified signal streams — Metrics, Events, Logs, and Rollouts — to pinpoint which nodes or workloads are consuming excessive disk space. By continuously tracking filesystem usage (nodefs, imagefs) and kubelet eviction signals, CubeAPM helps teams detect early disk saturation before it triggers pod evictions.

Step 1 — Install CubeAPM (Helm)

Install CubeAPM in your cluster using Helm. This deploys dashboards, pipelines, and alert templates for Kubernetes node and storage metrics.

Bash
helm install cubeapm https://charts.cubeapm.com/cubeapm-latest.tgz --namespace cubeapm --create-namespace

Upgrade later with:

Bash
helm upgrade cubeapm https://charts.cubeapm.com/cubeapm-latest.tgz -n cubeapm

Configure custom tokens or endpoints in values.yaml if using BYOC or on-premise mode.
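
For example, overrides can be supplied through a custom values file at install or upgrade time (the file name is a placeholder; see the chart’s values.yaml for the available keys):

Bash
helm upgrade --install cubeapm https://charts.cubeapm.com/cubeapm-latest.tgz -n cubeapm --create-namespace -f custom-values.yaml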

Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)

Deploy the collector in two modes for full coverage:

DaemonSet: Gathers per-node filesystem metrics and kubelet events.

Bash
helm install cube-otel-ds https://charts.cubeapm.com/otel-collector-ds.tgz -n cubeapm

Deployment: Acts as the central telemetry pipeline to CubeAPM.

Bash
helm install cube-otel-deploy https://charts.cubeapm.com/otel-collector-deploy.tgz -n cubeapm

Step 3 — Collector Configs Focused on DiskPressure

DaemonSet Config (Node Metrics + Disk Stats):

YAML
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubelet-node'
          static_configs:
            - targets: ['localhost:10255']  # kubelet read-only port; if disabled, scrape the secure port 10250 with auth
processors:
  batch:
exporters:
  otlp:
    endpoint: cubeapm:4317
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]

 

  • Prometheus receiver: Scrapes kubelet metrics for nodefs and imagefs usage.
  • Batch processor: Optimizes transmission of large metric sets.
  • OTLP exporter: Sends data directly to CubeAPM’s ingestion endpoint.

Deployment Config (Events + Logs):

YAML
receivers:
  kubeletstats:
    collection_interval: 60s
  filelog:
    include: [/var/log/kubelet.log, /var/log/syslog]
processors:
  memory_limiter:
    check_interval: 1s  # required by the memory_limiter processor
    limit_mib: 500
  batch:
exporters:
  otlp:
    endpoint: cubeapm:4317
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [kubeletstats]
      processors: [batch]
      exporters: [otlp]

 

  • kubeletstats receiver: Captures node conditions like DiskPressure=True.
  • filelog receiver: Streams kubelet logs containing eviction and disk usage events.
  • memory_limiter: Prevents overload during event spikes.

Step 4 — Supporting Components (Optional)

Deploy kube-state-metrics for richer visibility into pod and PVC states.

Bash
helm install kube-state-metrics https://charts.cubeapm.com/kube-state-metrics.tgz -n cubeapm

Step 5 — Verification Checklist

Before going live, validate that CubeAPM is ingesting all signals correctly (a quick CLI cross-check follows the list):

  • Events: Eviction warnings such as “Pod evicted due to DiskPressure” appear in the Events view.
  • Metrics: node_filesystem_avail_bytes and imagefs_available_bytes show node-level trends.
  • Logs: Kubelet log entries confirm eviction or cleanup events.
  • Restarts: Pods redeploy automatically when disk pressure resolves.
  • Rollouts: Deployment view highlights which workload triggered the condition.
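
As a CLI cross-check while validating, confirm the collectors are running and review the same raw eviction events the Events view should surface:

Bash
kubectl get pods -n cubeapm
kubectl get events -A --field-selector reason=Evicted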

Example Alert Rules for Kubernetes DiskPressure Error

You can use these alert rules to proactively detect nodes nearing disk exhaustion and automatically trigger alerts before the kubelet evicts pods. These PromQL rules integrate directly into CubeAPM’s alert manager or Prometheus-compatible pipelines.

1. Node Disk Usage Above Threshold

This alert triggers when a node’s filesystem usage exceeds 85%, indicating imminent DiskPressure risk.

YAML
- alert: NodeDiskUsageHigh
  expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 85
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High Disk Usage on Node"
    description: "Node {{ $labels.instance }} is using more than 85% of its disk space."

 

2. Node Condition: DiskPressure True

This alert fires when the kubelet explicitly reports a node in DiskPressure=True state.

YAML
- alert: NodeDiskPressureDetected
  expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Kubernetes Node Under Disk Pressure"
    description: "Node {{ $labels.node }} is reporting DiskPressure=True. Evictions or scheduling failures may occur."

3. Low ImageFS Space (Container Cache Saturation)

This alert identifies when container image storage (imagefs) is nearly full, which often precedes DiskPressure.

YAML
- alert: NodeImageFSLow
  expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100 > 90
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Low ImageFS Space Detected"
    description: "Image filesystem on node {{ $labels.instance }} is above 90% capacity."

These rules help CubeAPM’s alert engine trigger early warnings through Slack, Teams, or WhatsApp integrations — giving operators time to prune images, rotate logs, or reschedule workloads before node DiskPressure disrupts deployments.

Conclusion

DiskPressure is one of the most disruptive node-level conditions in Kubernetes, often leading to cascading pod evictions and unpredictable outages. Without proactive monitoring, it can silently degrade performance and stall deployments across the cluster.

By correlating node metrics, kubelet events, and container logs, CubeAPM helps teams detect DiskPressure before it causes service impact. Its OpenTelemetry-native pipelines continuously track storage saturation, log growth, and eviction trends across all nodes and workloads.

Start monitoring your Kubernetes clusters with CubeAPM today — gain real-time visibility into disk health, prevent evictions, and maintain peak reliability at predictable cost.
