The 2024 State of Production Kubernetes report by Spectro Cloud found that over 71% of organizations hit performance bottlenecks tied to CPU limits and throttling. This issue occurs when Kubernetes enforces strict CPU quotas, causing workloads to stall even when nodes have idle capacity. For latency-sensitive services—APIs, trading systems, or SaaS apps—CPU throttling means higher tail latency, missed SLAs, and hidden costs from over-provisioning.
CubeAPM makes CPU throttling visible where other tools don’t. By correlating throttle metrics with rollout history, pod events, and container logs, it pinpoints which workloads are being throttled, why, and when. Teams can trace slow requests directly back to throttled pods—turning a silent performance killer into an actionable fix.
In this guide, we’ll cover what CPU throttling is, why it happens, how to fix it, and how to monitor it in real time with CubeAPM.
What is Kubernetes CPU Throttling?
Kubernetes CPU throttling happens when a container exhausts the CPU quota derived from the CPU limit defined in its Pod spec. Once the container consumes its quota for the current period, the Linux kernel’s Completely Fair Scheduler (CFS) pauses execution until the next scheduling interval. This behavior ensures fair scheduling across workloads but can silently choke performance if limits are misconfigured.
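As a concrete illustration, here is a minimal Pod spec with a CPU request and limit; the names and values are assumptions for the example. With the default 100ms CFS period, a 500m limit translates to roughly 50ms of CPU time per period before the container is paused.
apiVersion: v1
kind: Pod
metadata:
  name: demo-api                  # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:1.0  # placeholder image
      resources:
        requests:
          cpu: 250m               # used for scheduling and CPU shares
        limits:
          cpu: 500m               # enforced by CFS as ~50ms of CPU time per 100ms period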
Throttling also interacts poorly with bursty workloads. For example, a service might only spike CPU usage for a few milliseconds during request parsing or encryption. But if throttling kicks in during that spike, even short delays can snowball into tail latency issues across distributed systems. In multi-tenant clusters, one team’s aggressive CPU limits can cascade into degraded performance for shared workloads.
The impact is often seen in:
- Higher response latency: Requests take longer to complete as containers wait for CPU cycles.
- Reduced throughput: Services handle fewer requests per second under sustained throttling.
- Unstable performance: Latency spikes occur unpredictably, especially during traffic surges.
Why Kubernetes CPU Throttling Happens
1. Overly Aggressive CPU Limits
When developers set CPU limits too low relative to workload demand, Kubernetes strictly enforces them. The kernel scheduler cuts off execution cycles once the quota is consumed, causing the pod to stall even though node resources may still be available.
Quick check:
kubectl describe pod <pod-name> | grep -A5 "Limits"
If the limit value is smaller than observed usage, throttling is likely.
2. Bursty or Latency-Sensitive Workloads
Applications with short CPU spikes—such as encryption, JSON parsing, or request batching—often hit throttling during bursts. Even if average CPU usage is low, sudden peaks are capped, leading to delayed responses and tail-latency outliers.
Quick check:
kubectl top pod <pod-name>
Look for usage patterns where spikes exceed limits for short intervals.
3. Noisy Neighbor Effects in Multi-Tenant Clusters
In shared environments, one service consuming excess CPU can indirectly cause throttling for others, especially if limits are configured conservatively. This cross-workload contention leads to unpredictable slowdowns that don’t show up in traditional pod health checks.
Quick check:
kubectl top node
If node utilization is high while multiple pods report throttling, noisy neighbors are a probable cause.
4. Misconfigured Requests vs. Limits
A mismatch between requests and limits often triggers throttling. If requests are set well below real usage, the scheduler packs more pods onto the node and the container gets fewer CPU shares under contention; if the limit sits only slightly above those low requests, the quota is exhausted quickly and the container pauses repeatedly under real load.
Quick check:
kubectl describe pod <pod-name> | grep -A5 "Requests"
Compare requests with actual CPU usage to spot under-provisioned workloads.
5. CFS Quota Period Defaults
Kubernetes relies on the Linux CFS (Completely Fair Scheduler) to enforce CPU quotas. By default, quotas are applied over a 100ms period, meaning workloads can run freely until the quota is hit, after which they’re throttled until the next window. For CPU-bound apps, this introduces jitter and unpredictable stalls.
Quick check:
Inspect cgroup settings in /sys/fs/cgroup/cpu/ for cpu.cfs_quota_us and cpu.cfs_period_us values. If quotas are very restrictive, throttling will be frequent.
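You can also read the throttle counters straight from the container’s cgroup; a quick sketch, assuming exec access to the pod (the path depends on whether the node runs cgroup v1 or v2):
# cgroup v1: reports nr_periods, nr_throttled, throttled_time (nanoseconds)
kubectl exec <pod-name> -c <container-name> -- cat /sys/fs/cgroup/cpu/cpu.stat
# cgroup v2: reports nr_periods, nr_throttled, throttled_usec
kubectl exec <pod-name> -c <container-name> -- cat /sys/fs/cgroup/cpu.stat
A rising nr_throttled count relative to nr_periods confirms the container is being throttled frequently.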
6. Cluster Autoscaler Interactions
When autoscaling is tuned only on CPU requests (not actual throttled usage), workloads may look “healthy” to the scheduler while still being throttled. This causes the cluster to under-scale, leaving pods throttled despite available nodes.
Quick check:
Review HorizontalPodAutoscaler (HPA) metrics:
kubectl get hpa
If scaling thresholds are based only on average CPU usage, throttling can persist unnoticed.
How to Fix Kubernetes CPU Throttling
Fixing throttling means validating resource configs and smoothing CPU bursts so workloads get the cycles they actually need—without blowing up cluster costs. Use the targeted checks and one-line fixes below.
1) Raise or Remove Over-Tight CPU Limits
If limits are too low, the kernel enforces quotas and pauses execution. Keep reasonable requests for scheduling, but avoid tiny limits that choke bursts.
Quick check:
kubectl describe pod <pod> | grep -A5 -E "Requests|Limits"
Fix (raise limits):
kubectl set resources deploy <deploy> --limits=cpu=1000m --requests=cpu=500m
Fix (remove the CPU limit, keep requests; with kubectl set resources, setting a resource to 0 removes it):
kubectl set resources deploy <deploy> --limits=cpu=0 --requests=cpu=500m
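If the Deployment is managed declaratively, the equivalent change in the manifest is a sketch like the following (values are assumptions; keep the request for scheduling and either raise the limit or drop the limits block entirely):
# deployment.yaml, container spec excerpt
resources:
  requests:
    cpu: 500m
  limits:
    cpu: "1"      # raised limit; omit this block to remove the CPU cap altogether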
2) Match Requests to Real Load (Right-Size)
Low requests can starve scheduling shares and make throttling more likely under load spikes. Align requests with observed p95 CPU usage.
Quick check (top):
kubectl top pod <pod>
Fix (bump requests to observed steady state):
kubectl set resources deploy <deploy> --requests=cpu=700m
3) Use Guaranteed QoS for Critical Latency Paths
For ultra-sensitive services, setting requests equal to limits for both CPU and memory on every container places the pod in the Guaranteed QoS class, giving it stronger CPU guarantees and reducing throttle jitter.
Quick check:
kubectl describe pod <pod> | grep -A5 -E "Requests|Limits"
Fix (make Guaranteed; memory must match too, and the values shown are illustrative):
kubectl set resources deploy <deploy> --requests=cpu=1000m,memory=512Mi --limits=cpu=1000m,memory=512Mi
4) Smooth Bursts with HPA/VPA (Avoid Spiky Hot Pods)
Bursty apps hit limits briefly and get throttled; scale them out/in to spread spikes across replicas.
Quick check (HPA present?):
kubectl get hpa
Fix (create HPA on CPU):
kubectl autoscale deploy <deploy> --cpu-percent=60 --min=3 --max=12
Fix (enable VPA in recommend/auto mode; apply your VPA manifest after installing VPA):
kubectl apply -f vpa-recommendation.yaml
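A minimal sketch of what vpa-recommendation.yaml could look like, assuming the VPA CRDs and controllers are already installed; updateMode "Off" only produces recommendations, while "Auto" lets VPA apply them:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-api-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-api              # the workload to right-size
  updatePolicy:
    updateMode: "Off"           # recommendation-only; switch to "Auto" to apply changes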
5) Reduce Noisy-Neighbor Pressure (Bin-Packing Guards)
If nodes run hot, even modest limits throttle more often. Spread load or reserve CPU for critical pods.
Quick check (node pressure):
kubectl top node
Fix (after adding anti-affinity or spread constraints to the pod spec, roll the deployment):
kubectl rollout restart deploy <latency-critical-deploy>
Fix (priority & preemption—after adding a PriorityClass in the cluster):
kubectl patch deploy <deploy> -p '{"spec":{"template":{"spec":{"priorityClassName":"latency-critical"}}}}'
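The patch above assumes a PriorityClass named latency-critical already exists in the cluster; a minimal sketch of one (the value is an assumption, pick a number that fits your priority scheme):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: latency-critical
value: 100000                   # higher value = higher scheduling priority
globalDefault: false
description: "For latency-sensitive services that must not be starved of CPU."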
6) Tune Runtime Flags to Lower CPU Spikes
Inefficient GC or excessive worker threads cause micro-bursts that hit limits. Cap concurrency or adjust GC to flatten peaks.
Quick check (app logs around spikes):
kubectl logs <pod> -c <container> --since=10m
Fix (example env for Go to cap threads):
kubectl set env deploy <deploy> GOMAXPROCS=2 GODEBUG=gctrace=1
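For JVM services, a comparable illustrative tweak is to cap the processor count the runtime detects; the flag is standard, but the value of 2 is an assumption sized to a roughly 2-CPU limit:
kubectl set env deploy <deploy> JAVA_TOOL_OPTIONS="-XX:ActiveProcessorCount=2"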
7) Consider Disabling CFS Quota Carefully (Cluster-Wide)
As a last resort for specific clusters, kubelet’s CPU CFS quota can be disabled, but this risks runaway CPU. Prefer app/limit tuning first.
Quick check (cluster policy; ask your platform team). On kubeadm-based clusters, the kubelet configuration lives in a ConfigMap:
kubectl -n kube-system get cm kubelet-config -o yaml | grep -i cfs
Fix (platform change—do not do casually):
Ask platform owners to set the kubelet flag --cpu-cfs-quota=false (or cpuCFSQuota: false in the KubeletConfiguration file) via the cluster’s node config mechanism and roll nodes safely.
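For reference, a sketch of the relevant KubeletConfiguration fragment, assuming your platform manages kubelet configuration as a file or ConfigMap:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuCFSQuota: false              # stops enforcing CFS quotas for containers with CPU limits
# cpuCFSQuotaPeriod: 100ms      # default period; changing it may require the CustomCPUCFSQuotaPeriod feature gate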
8) Rebalance Hot Pods Across Nodes (Topology)
If a few nodes host most hot pods, they’ll throttle first. Spread replicas to even out CPU headroom.
Quick check (which nodes host throttled pods):
kubectl get pod -o wide | grep <deploy>
Fix (add a preferred spread by hostname to the pod template, as sketched below, then roll the deployment):
kubectl rollout restart deploy <deploy>
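A minimal sketch of the spread constraint referenced above, added under the pod template spec; ScheduleAnyway keeps it a soft preference rather than a hard scheduling requirement:
# pod template spec excerpt
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway   # soft preference; DoNotSchedule makes it mandatory
    labelSelector:
      matchLabels:
        app: demo-api                   # hypothetical pod label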
Monitoring Kubernetes CPU Throttling with CubeAPM
Fastest path to root cause: CPU throttling is best diagnosed by correlating four streams in one place—Events, Metrics, Logs, and Rollouts. CubeAPM ties throttle metrics (container CFS quotas) to rollout history and pod events, so you can see which pods are throttled, when it started (post-deploy or traffic surge), and why (limits, noisy neighbors, burstiness). See the product docs for installs, config, and instrumentation: Install CubeAPM, Kubernetes setup, Configure, Instrumentation.
Step 1 — Install CubeAPM (Helm)
Use SaaS or BYOC. For BYOC, deploy via Helm with your values.yaml (endpoint, auth, storage).
Install (BYOC example; replace placeholders with values from the docs):
helm repo add cubeapm <CUBEAPM_HELM_REPO_URL> && helm repo update && helm install cubeapm <CUBEAPM_CHART_NAME> --namespace cubeapm-system --create-namespace --values values.yaml
Upgrade:
helm upgrade cubeapm <CUBEAPM_CHART_NAME> --namespace cubeapm-system --values values.yaml
Step 2 — Deploy the OpenTelemetry Collector (DaemonSet + Deployment)
- DaemonSet → runs on every node, scrapes node/pod metrics (incl. cAdvisor) and tails logs.
- Deployment → central pipeline that ingests events, enriches attributes, batches, and exports to CubeAPM.
DaemonSet (helm):
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts && helm repo update && helm install otel-ds open-telemetry/opentelemetry-collector --namespace observability --create-namespace --set mode=daemonset
Deployment (helm):
helm install otel-core open-telemetry/opentelemetry-collector --namespace observability --set mode=deployment
Step 3 — Collector Configs Focused on CPU Throttling
Below are minimal YAML snippets tailored to surface throttling. Apply through your Helm values or ConfigMap.
3.1 DaemonSet (node-level) — metrics & logs focused on throttling
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubelet-cadvisor'
          scheme: https
          metrics_path: /metrics/cadvisor
          kubernetes_sd_configs:
            - role: node
          tls_config:
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
  filelog:
    include:
      - /var/log/containers/*.log
    start_at: end
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.container.name
        - k8s.namespace.name
        - k8s.node.name
  resource:
    attributes:
      - key: telemetry.sdk.language
        value: kubernetes
        action: upsert
  batch: {}
exporters:
  otlp:
    endpoint: ${CUBEAPM_OTLP_ENDPOINT}
    tls:
      insecure: false
    headers:
      authorization: ${CUBEAPM_OTLP_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [k8sattributes, resource, batch]
      exporters: [otlp]
    logs:
      receivers: [filelog]
      processors: [k8sattributes, resource, batch]
      exporters: [otlp]
- prometheus receiver (cadvisor): scrapes container metrics including CFS throttle counters (e.g., throttled seconds/periods) from kubelet’s cAdvisor endpoint.
- filelog receiver: tails container logs for error bursts that correlate with throttle spikes.
- k8sattributes/resource processors: add k8s labels (pod, container, node) for pinpointing which workloads are throttled.
- batch: efficient, back-pressure friendly export.
- otlp exporter: ships metrics/logs securely to CubeAPM.
3.2 Deployment (cluster-level) — events & rollout context
receivers:
  otlp:
    protocols:
      grpc:
      http:
  k8s_events:
    auth_type: serviceAccount
  k8sobjects:
    objects:
      - name: deployments
        mode: watch
      - name: replicasets
        mode: watch
      - name: pods
        mode: watch
processors:
  k8sattributes: {}
  attributes:
    actions:
      - key: workload.type
        value: cpu-throttling
        action: upsert
  batch: {}
exporters:
  otlp:
    endpoint: ${CUBEAPM_OTLP_ENDPOINT}
    headers:
      authorization: ${CUBEAPM_OTLP_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
    logs/events:
      receivers: [k8s_events, k8sobjects]
      processors: [k8sattributes, attributes, batch]
      exporters: [otlp]
- k8s_events receiver: streams Kubernetes Events (e.g., scaling, scheduling) to line up “when throttling began” with cluster activity.
- k8sobjects receiver: watches Deployments/ReplicaSets/Pods so you can overlay rollout history on throttling timelines.
- otlp receiver: accepts telemetry from app/sidecar/daemonset pipelines.
- attributes processor: tags this pipeline for easy querying of cpu-throttling dashboards and alerts.
- otlp exporter: sends everything to CubeAPM for correlation across signals.
Configuration structure should follow your Helm chart’s values format; the above shows component intent. Map them under the chart’s config key per Configure.
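For example, with the upstream opentelemetry-collector chart the collector configuration goes under the values file's config key; a sketch of how the DaemonSet snippet above would be wrapped, assuming that chart's values layout:
# values-daemonset.yaml excerpt
mode: daemonset
config:
  receivers:
    filelog:
      include:
        - /var/log/containers/*.log
      start_at: end
  # ...remaining receivers, processors, exporters, and service pipelines as shown above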
Step 4 — Supporting Components (optional but recommended)
kube-state-metrics (for richer workload metadata):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update && helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring --create-namespace
Step 5 — Verification (What You Should See in CubeAPM)
- Events: You should see Deployment/ReplicaSet rollout events aligned to the start of throttle spikes (e.g., post-deploy traffic surge).
- Metrics: You should see container-level throttle counters (throttled seconds/periods) rising for specific pods/containers, co-plotted with CPU usage vs. limits.
- Logs: You should see application log slowdowns or timeouts around the same timestamps as throttle spikes (e.g., request exceeded deadline).
- Restarts: You should see no abnormal restarts (throttling doesn’t crash pods)—confirming the issue is performance, not stability.
- Rollout context: You should see a linked view where throttled pods belong to the same new ReplicaSet or to nodes with high utilization (noisy neighbor).
- Trace correlation (if tracing enabled): You should see spans with elevated durations for throttled services, linked to the exact pod/container.
Example Alert Rules for Kubernetes CPU Throttling
1. High Throttle Rate on a Container
Why: Continuous throttling indicates tight limits or noisy neighbors. This alert fires when, over 5 minutes, a container’s throttled time exceeds 20% of the CPU time it actually used.
sum by (namespace,pod,container) (rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) > 0.2
2. Throttling Coincides with Latency Regression
Why: Ties user-facing pain to throttling so you can prioritize. Replace http_server_request_duration_seconds_bucket with your service histogram, and ensure both sides of the query carry a common service label.
histogram_quantile(0.95, sum by (le,service) (rate(http_server_request_duration_seconds_bucket[5m]))) > 0.3 and sum by (service) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 5
3. Sudden Increase in Throttled Pods Across a Namespace
Why: Detects systemic misconfiguration (e.g., rollout with tight limits) instead of isolated cases.
count by (namespace) (rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0) > 5
4. Node-Level Throttle Pressure
Why: Identifies noisy neighbors or overcommit on a specific node. Helps SREs see infra-level hotspots.
sum by (node) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 30
5. Throttling + HPA Stuck
Why: Warns when throttling persists but the autoscaler hasn’t added replicas (a mis-scaling config). Refine the label matching if your pipeline attaches workload labels to container metrics.
(sum by (namespace) (rate(container_cpu_cfs_throttled_seconds_total[5m])) > 10) and on (namespace) (kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_status_current_replicas)
Conclusion
CPU throttling is a silent performance killer: pods keep “Running” while the Linux CFS enforces tight quotas, stretching response times and squeezing throughput. Most incidents trace back to aggressive limits, bursty workloads, or noisy neighbors.
The fastest wins come from right-sizing requests/limits, smoothing bursts with HPA/VPA, and validating node pressure. Observability closes the loop: correlate throttle metrics with rollout events, logs, and traces to pinpoint root cause.
CubeAPM makes this correlation first-class. By unifying Events, Metrics, Logs, and Rollouts, it shows exactly which pods are throttled, when it began, and why—so teams can fix misconfigurations before users feel the impact.