The 2024 State of Production Kubernetes report by Spectro Cloud found that over 71% of organizations hit performance bottlenecks tied to CPU limits and throttling. This issue occurs when Kubernetes enforces strict CPU quotas, causing workloads to stall even when nodes have idle capacity. For latency-sensitive services—APIs, trading systems, or SaaS apps—CPU throttling means higher tail latency, missed SLAs, and hidden costs from over-provisioning.
CubeAPM makes CPU throttling visible where other tools don’t. By correlating throttle metrics with rollout history, pod events, and container logs, it pinpoints which workloads are being throttled, why, and when. Teams can trace slow requests directly back to throttled pods—turning a silent performance killer into an actionable fix.
In this guide, we’ll cover what CPU throttling is, why it happens, how to fix it, and how to monitor it in real time with CubeAPM.
What is Kubernetes CPU Throttling?
Kubernetes CPU throttling happens when a container exhausts the CPU quota derived from the CPU limit defined in its Pod spec. Once the container consumes its quota for the current period, the Linux kernel’s Completely Fair Scheduler (CFS) pauses execution until the next scheduling interval. This behavior ensures fair scheduling across workloads but can silently choke performance if limits are misconfigured.
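As a concrete illustration, here is a minimal Pod spec with a CPU request and limit; the names and values are assumptions for the example. With the default 100ms CFS period, a 500m limit translates to roughly 50ms of CPU time per period before the container is paused.
apiVersion: v1
kind: Pod
metadata:
  name: demo-api                  # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:1.0  # placeholder image
      resources:
        requests:
          cpu: 250m               # used for scheduling and CPU shares
        limits:
          cpu: 500m               # enforced by CFS as ~50ms of CPU time per 100ms period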
Throttling also interacts poorly with bursty workloads. For example, a service might only spike CPU usage for a few milliseconds during request parsing or encryption. But if throttling kicks in during that spike, even short delays can snowball into tail latency issues across distributed systems. In multi-tenant clusters, one team’s aggressive CPU limits can cascade into degraded performance for shared workloads.
The impact is often seen in:
- Higher response latency: Requests take longer to complete as containers wait for CPU cycles.
- Reduced throughput: Services handle fewer requests per second under sustained throttling.
- Unstable performance: Latency spikes occur unpredictably, especially during traffic surges.
Why Kubernetes CPU Throttling Happens
1. Overly Aggressive CPU Limits
When developers set CPU limits too low relative to workload demand, Kubernetes strictly enforces them. The kernel scheduler cuts off execution cycles once the quota is consumed, causing the pod to stall even though node resources may still be available.
Quick check:
kubectl describe pod <pod-name> | grep -A5 "Limits"
If the limit value is smaller than observed usage, throttling is likely.
2. Bursty or Latency-Sensitive Workloads
Applications with short CPU spikes—such as encryption, JSON parsing, or request batching—often hit throttling during bursts. Even if average CPU usage is low, sudden peaks are capped, leading to delayed responses and tail-latency outliers.
Quick check:
kubectl top pod <pod-name>
Look for usage patterns where spikes exceed limits for short intervals.
3. Noisy Neighbor Effects in Multi-Tenant Clusters
In shared environments, one service consuming excess CPU can indirectly cause throttling for others, especially if limits are configured conservatively. This cross-workload contention leads to unpredictable slowdowns that don’t show up in traditional pod health checks.
Quick check:
kubectl top node
If node utilization is high while multiple pods report throttling, noisy neighbors are a probable cause.
4. Misconfigured Requests vs. Limits
A mismatch between requests and limits often triggers throttling. If requests are set well below real usage, the scheduler packs more pods onto the node and the container gets fewer CPU shares under contention; if the limit sits only slightly above those low requests, the quota is exhausted quickly and the container pauses repeatedly under real load.
Quick check:
kubectl describe pod <pod-name> | grep -A5 "Requests"
Compare requests with actual CPU usage to spot under-provisioned workloads.
5. CFS Quota Period Defaults
Kubernetes relies on the Linux CFS (Completely Fair Scheduler) to enforce CPU quotas. By default, quotas are applied over a 100ms period, meaning workloads can run freely until the quota is hit, after which they’re throttled until the next window. For CPU-bound apps, this introduces jitter and unpredictable stalls.
Quick check:
Inspect cgroup settings in /sys/fs/cgroup/cpu/ for cpu.cfs_quota_us and cpu.cfs_period_us values. If quotas are very restrictive, throttling will be frequent.
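You can also read the throttle counters straight from the container’s cgroup; a quick sketch, assuming exec access to the pod (the path depends on whether the node runs cgroup v1 or v2):
# cgroup v1: reports nr_periods, nr_throttled, throttled_time (nanoseconds)
kubectl exec <pod-name> -c <container-name> -- cat /sys/fs/cgroup/cpu/cpu.stat
# cgroup v2: reports nr_periods, nr_throttled, throttled_usec
kubectl exec <pod-name> -c <container-name> -- cat /sys/fs/cgroup/cpu.stat
A rising nr_throttled count relative to nr_periods confirms the container is being throttled frequently.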
6. Cluster Autoscaler Interactions
When autoscaling is tuned only on CPU requests (not actual throttled usage), workloads may look “healthy” to the scheduler while still being throttled. This causes the cluster to under-scale, leaving pods throttled despite available nodes.
Quick check:
Review HorizontalPodAutoscaler (HPA) metrics:
kubectl get hpa
If scaling thresholds are based only on average CPU usage, throttling can persist unnoticed.
How to Fix Kubernetes CPU Throttling
Fixing throttling means validating resource configs and smoothing CPU bursts so workloads get the cycles they actually need—without blowing up cluster costs. Use the targeted checks and one-line fixes below.
1) Raise or Remove Over-Tight CPU Limits
If limits are too low, the kernel enforces quotas and pauses execution. Keep reasonable requests for scheduling, but avoid tiny limits that choke bursts.
Quick check:
kubectl describe pod <pod> | grep -A5 -E "Requests|Limits"
Fix (raise limits):
kubectl set resources deploy <deploy> --limits=cpu=1000m --requests=cpu=500m
Fix (remove the CPU limit, keep requests; with kubectl set resources, setting a resource to 0 removes it):
kubectl set resources deploy <deploy> --limits=cpu=0 --requests=cpu=500m
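If the Deployment is managed declaratively, the equivalent change in the manifest is a sketch like the following (values are assumptions; keep the request for scheduling and either raise the limit or drop the limits block entirely):
# deployment.yaml, container spec excerpt
resources:
  requests:
    cpu: 500m
  limits:
    cpu: "1"      # raised limit; omit this block to remove the CPU cap altogether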
2) Match Requests to Real Load (Right-Size)
Low requests can starve scheduling shares and make throttling more likely under load spikes. Align requests with observed p95 CPU usage.
Quick check (top):
kubectl top pod <pod>
Fix (bump requests to observed steady state):
kubectl set resources deploy <deploy> --requests=cpu=700m
3) Use Guaranteed QoS for Critical Latency Paths
For ultra-sensitive services, setting requests equal to limits for both CPU and memory on every container places the pod in the Guaranteed QoS class, giving it stronger CPU guarantees and reducing throttle jitter.
Quick check:
kubectl describe pod <pod> | grep -A5 -E "Requests|Limits"
Fix (make Guaranteed; memory must match too, and the values shown are illustrative):
kubectl set resources deploy <deploy> --requests=cpu=1000m,memory=512Mi --limits=cpu=1000m,memory=512Mi
4) Smooth Bursts with HPA/VPA (Avoid Spiky Hot Pods)
Bursty apps hit limits briefly and get throttled; scale them out/in to spread spikes across replicas.
Quick check (HPA present?):
kubectl get hpa
Fix (create HPA on CPU):
kubectl autoscale deploy <deploy> --cpu-percent=60 --min=3 --max=12
Fix (enable VPA in recommend/auto mode; apply your VPA manifest after installing VPA):
kubectl apply -f vpa-recommendation.yaml
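A minimal sketch of what vpa-recommendation.yaml could look like, assuming the VPA CRDs and controllers are already installed; updateMode "Off" only produces recommendations, while "Auto" lets VPA apply them:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-api-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-api              # the workload to right-size
  updatePolicy:
    updateMode: "Off"           # recommendation-only; switch to "Auto" to apply changes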
5) Reduce Noisy-Neighbor Pressure (Bin-Packing Guards)
If nodes run hot, even modest limits throttle more often. Spread load or reserve CPU for critical pods.
Quick check (node pressure):
kubectl top node
Fix (after adding anti-affinity or spread constraints to the pod spec, roll the deployment):
kubectl rollout restart deploy <latency-critical-deploy>
Fix (priority & preemption—after adding a PriorityClass in the cluster):
kubectl patch deploy <deploy> -p '{"spec":{"template":{"spec":{"priorityClassName":"latency-critical"}}}}'
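The patch above assumes a PriorityClass named latency-critical already exists in the cluster; a minimal sketch of one (the value is an assumption, pick a number that fits your priority scheme):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: latency-critical
value: 100000                   # higher value = higher scheduling priority
globalDefault: false
description: "For latency-sensitive services that must not be starved of CPU."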
6) Tune Runtime Flags to Lower CPU Spikes
Inefficient GC or excessive worker threads cause micro-bursts that hit limits. Cap concurrency or adjust GC to flatten peaks.
Quick check (app logs around spikes):
kubectl logs <pod> -c <container> --since=10m
Fix (example env for Go to cap threads):
kubectl set env deploy <deploy> GOMAXPROCS=2 GODEBUG=gctrace=1
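For JVM services, a comparable illustrative tweak is to cap the processor count the runtime detects; the flag is standard, but the value of 2 is an assumption sized to a roughly 2-CPU limit:
kubectl set env deploy <deploy> JAVA_TOOL_OPTIONS="-XX:ActiveProcessorCount=2"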
7) Consider Disabling CFS Quota Carefully (Cluster-Wide)
As a last resort for specific clusters, kubelet’s CPU CFS quota can be disabled, but this risks runaway CPU. Prefer app/limit tuning first.
Quick check (cluster policy; ask your platform team). On kubeadm-based clusters, the kubelet configuration lives in a ConfigMap:
kubectl -n kube-system get cm kubelet-config -o yaml | grep -i cfs
Fix (platform change—do not do casually):
Ask platform owners to set the kubelet flag --cpu-cfs-quota=false (or cpuCFSQuota: false in the KubeletConfiguration file) via the cluster’s node config mechanism and roll nodes safely.
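For reference, a sketch of the relevant KubeletConfiguration fragment, assuming your platform manages kubelet configuration as a file or ConfigMap:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuCFSQuota: false              # stops enforcing CFS quotas for containers with CPU limits
# cpuCFSQuotaPeriod: 100ms      # default period; changing it may require the CustomCPUCFSQuotaPeriod feature gate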
8) Rebalance Hot Pods Across Nodes (Topology)
If a few nodes host most hot pods, they’ll throttle first. Spread replicas to even out CPU headroom.
Quick check (which nodes host throttled pods):
kubectl get pod -o wide | grep <deploy>
Fix (add a preferred spread by hostname to the pod template, as sketched below, then roll the deployment):
kubectl rollout restart deploy <deploy>
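A minimal sketch of the spread constraint referenced above, added under the pod template spec; ScheduleAnyway keeps it a soft preference rather than a hard scheduling requirement:
# pod template spec excerpt
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway   # soft preference; DoNotSchedule makes it mandatory
    labelSelector:
      matchLabels:
        app: demo-api                   # hypothetical pod label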
Monitoring Kubernetes CPU Throttling with CubeAPM
Fastest path to root cause: CPU throttling is best diagnosed by correlating four streams in one place—Events, Metrics, Logs, and Rollouts. CubeAPM ties throttle metrics (container CFS quotas) to rollout history and pod events, so you can see which pods are throttled, when it started (post-deploy or traffic surge), and why (limits, noisy neighbors, burstiness). See the product docs for installs, config, and instrumentation: Install CubeAPM, Kubernetes setup, Configure, Instrumentation.
Step 1 — Install CubeAPM (Helm)
Use SaaS or BYOC. For BYOC, deploy via Helm with your values.yaml (endpoint, auth, storage).
Install (BYOC example; replace placeholders with values from the docs):
helm repo add cubeapm <CUBEAPM_HELM_REPO_URL> && helm repo update && helm install cubeapm <CUBEAPM_CHART_NAME> --namespace cubeapm-system --create-namespace --values values.yaml
Upgrade:
helm upgrade cubeapm <CUBEAPM_CHART_NAME> --namespace cubeapm-system --values values.yaml
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
- DaemonSet → runs on every node, scrapes node/pod metrics (incl. cAdvisor) and tails logs.
- Deployment → central pipeline that ingests events, enriches attributes, batches, and exports to CubeAPM.
DaemonSet (helm):
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update && helm install otel-ds open-telemetry/opentelemetry-collector --namespace observability --create-namespace --set mode=daemonset
Deployment (helm):
helm install otel-core open-telemetry/opentelemetry-collector --namespace observability --set mode=deployment
Step 3 — Collector Configs Focused on CPU Throttling
Below are minimal YAML snippets tailored to surface throttling. Apply through your Helm values or ConfigMap.
3.1 DaemonSet (node-level) — metrics & logs focused on throttling
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubelet-cadvisor'
          scheme: https
          metrics_path: /metrics/cadvisor
          kubernetes_sd_configs:
            - role: node
          tls_config:
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
  filelog:
    include:
      - /var/log/containers/*.log
    start_at: end
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.container.name
        - k8s.namespace.name
        - k8s.node.name
  resource:
    attributes:
      - key: telemetry.sdk.language
        value: kubernetes
        action: upsert
  batch: {}
exporters:
  otlp:
    endpoint: ${CUBEAPM_OTLP_ENDPOINT}
    tls:
      insecure: false
    headers:
      authorization: ${CUBEAPM_OTLP_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [k8sattributes, resource, batch]
      exporters: [otlp]
    logs:
      receivers: [filelog]
      processors: [k8sattributes, resource, batch]
      exporters: [otlp]
- prometheus receiver (cadvisor): scrapes container metrics including CFS throttle counters (e.g., throttled seconds/periods) from kubelet’s cAdvisor endpoint.
- filelog receiver: tails container logs for error bursts that correlate with throttle spikes.
- k8sattributes/resource processors: add k8s labels (pod, container, node) for pinpointing which workloads are throttled.
- batch: efficient, back-pressure friendly export.
- otlp exporter: ships metrics/logs securely to CubeAPM.
3.2 Deployment (cluster-level) — events & rollout context
receivers:
  otlp:
    protocols:
      grpc:
      http:
  k8s_events:
    auth_type: serviceAccount
  k8sobjects:
    objects:
      - name: deployments
        mode: watch
      - name: replicasets
        mode: watch
      - name: pods
        mode: watch
processors:
  k8sattributes: {}
  attributes:
    actions:
      - key: workload.type
        value: cpu-throttling
        action: upsert
  batch: {}
exporters:
  otlp:
    endpoint: ${CUBEAPM_OTLP_ENDPOINT}
    headers:
      authorization: ${CUBEAPM_OTLP_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    logs/events:
      receivers: [k8s_events, k8sobjects]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
- k8s_events receiver: streams Kubernetes Events (e.g., scaling, scheduling) to line up “when throttling began” with cluster activity.
- k8sobjects receiver: watches Deployments/ReplicaSets/Pods so you can overlay rollout history on throttling timelines.
- otlp receiver: accepts telemetry from app/sidecar/daemonset pipelines.
- attributes processor: tags this pipeline for easy querying of cpu-throttling dashboards and alerts.
- otlp exporter: sends everything to CubeAPM for correlation across signals.
Configuration structure should follow your Helm chart’s values format; the above shows component intent. Map them under the chart’s config key per Configure.
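For example, with the upstream opentelemetry-collector chart the collector configuration goes under the values file's config key; a sketch of how the DaemonSet snippet above would be wrapped, assuming that chart's values layout:
# values-daemonset.yaml excerpt
mode: daemonset
config:
  receivers:
    filelog:
      include:
        - /var/log/containers/*.log
      start_at: end
  # ...remaining receivers, processors, exporters, and service pipelines as shown above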
Step 4 — Supporting Components (optional but recommended)
kube-state-metrics (for richer workload metadata):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update && helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring --create-namespace
Step 5 — Verification (What You Should See in CubeAPM)
- Events: You should see Deployment/ReplicaSet rollout events aligned to the start of throttle spikes (e.g., post-deploy traffic surge).
- Metrics: You should see container-level throttle counters (throttled seconds/periods) rising for specific pods/containers, co-plotted with CPU usage vs. limits.
- Logs: You should see application log slowdowns or timeouts around the same timestamps as throttle spikes (e.g., request exceeded deadline).
- Restarts: You should see no abnormal restarts (throttling doesn’t crash pods)—confirming the issue is performance, not stability.
- Rollout context: You should see a linked view where throttled pods belong to the same new ReplicaSet or to nodes with high utilization (noisy neighbor).
- Trace correlation (if tracing enabled): You should see spans with elevated durations for throttled services, linked to the exact pod/container.
Example Alert Rules for Kubernetes CPU Throttling
1. High Throttle Rate on a Container
Why: Continuous throttling indicates tight limits or noisy neighbors. This alert fires when, over 5 minutes, a container’s throttled time exceeds 20% of the CPU time it actually used.
sum by (namespace,pod,container) (rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) > 0.2
2. Throttling Coincides with Latency Regression
Why: Ties user-facing pain to throttling so you can prioritize. Replace http_server_request_duration_seconds_bucket with your service histogram, and ensure both sides of the query carry a common service label.
histogram_quantile(0.95, sum by (le,service) (rate(http_server_request_duration_seconds_bucket[5m]))) > 0.3 and sum by (service) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 5
3. Sudden Increase in Throttled Pods Across a Namespace
Why: Detects systemic misconfiguration (e.g., rollout with tight limits) instead of isolated cases.
count by (namespace) (rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0) > 5
4. Node-Level Throttle Pressure
Why: Identifies noisy neighbors or overcommit on a specific node. Helps SREs see infra-level hotspots.
sum by (node) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 30
5. Throttling + HPA Stuck
Why: Warns when throttling persists but the autoscaler hasn’t added replicas (a mis-scaling config). Refine the label matching if your pipeline attaches workload labels to container metrics.
(sum by (namespace) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 10) and on (namespace) (kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_status_current_replicas)
Conclusion
CPU throttling is a silent performance killer: pods keep “Running” while the Linux CFS enforces tight quotas, stretching response times and squeezing throughput. Most incidents trace back to aggressive limits, bursty workloads, or noisy neighbors.
The fastest wins come from right-sizing requests/limits, smoothing bursts with HPA/VPA, and validating node pressure. Observability closes the loop: correlate throttle metrics with rollout events, logs, and traces to pinpoint root cause.
CubeAPM makes this correlation first-class. By unifying Events, Metrics, Logs, and Rollouts, it shows exactly which pods are throttled, when it began, and why—so teams can fix misconfigurations before users feel the impact.