A Kubernetes cluster auto-scaling from 20 to 80 nodes during a traffic spike can quadruple your hourly compute spend within hours. Without the right CPU alerts, though, the first signal of trouble is often a customer complaint about slow page loads. CPU throttling, node pressure, and pod evictions happen silently until they cascade into user-facing outages. The gap between a pod hitting its CPU limit and an engineer noticing the problem can be minutes or hours, depending on how your alerts are configured.
This guide covers how to configure Kubernetes CPU alerts that catch problems early, which metrics to track, what thresholds actually work in production, and how to implement alerting rules in Prometheus, Grafana, and managed APM platforms.
What Is a Kubernetes CPU Alert?
A Kubernetes CPU alert is a notification triggered when CPU usage, throttling, or resource pressure crosses a defined threshold at the pod, node, or cluster level. These alerts help teams detect performance degradation, capacity bottlenecks, and resource contention before they impact application uptime or user experience.
Kubernetes exposes CPU data through several layers: container runtime metrics (via cAdvisor, served by the kubelet), node metrics (via the node exporter), and cluster-level aggregates. A well-configured alerting setup watches all three layers and fires based on context. That context might be a single pod throttled in 50% of its CFS periods, or an entire node running at 95% CPU for 10 minutes.
The core problem Kubernetes CPU alerts solve is visibility into resource behavior in dynamic environments where pods scale, restart, and move across nodes constantly. Without alerts, you rely on users reporting slowness or SREs manually checking dashboards.
Why CPU Alerts Matter in Kubernetes
Kubernetes workloads behave differently from traditional VMs. Pods can be throttled by the kernel even when the underlying node has idle CPU capacity, because throttling enforces limits at the container level based on CFS (Completely Fair Scheduler) periods. A pod configured with a 200m CPU limit can be throttled even if the host has 8 idle cores.
Three specific problems make CPU alerting critical in Kubernetes:
CPU throttling degrades performance silently. A pod hitting its CPU limit gets throttled by the Linux kernel, adding latency to every request it processes. Users experience slow page loads, but the pod stays “healthy” from Kubernetes’ perspective, because it is not crashing or failing readiness checks. Without an alert tracking throttling metrics, the issue goes unnoticed until customers complain.
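Node CPU saturation slows every pod on the node. CPU is a compressible resource, so the kubelet does not evict pods for CPU pressure the way it does for memory or disk pressure. Instead, containers compete for CPU time in proportion to their requests, and anything bursting above its request slows down. If the kubelet or system daemons are also starved, the node can become NotReady. Its pods are then eventually evicted and rescheduled, which breaks active connections and shifts sudden load onto other replicas. A CPU alert at the node level catches saturation before it reaches that point.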
Cluster-wide CPU saturation compounds during traffic spikes. Suppose your cluster runs at 80% CPU during normal load and traffic doubles. Autoscaling takes 2 to 5 minutes to provision new nodes. During that window, existing pods throttle and slow down, and their request queues build up. That can raise memory usage and occasionally end in OOMKills. Alerting on cluster-level CPU usage gives you early warning before capacity becomes a bottleneck.
According to the CNCF Annual Survey 2024, 58% of organizations cite observability and monitoring as a top challenge in Kubernetes environments, with resource-based alerting being one of the most requested capabilities.
Key Kubernetes CPU Metrics to Monitor
Kubernetes exposes CPU data through multiple metrics, each serving a different alerting use case. Understanding which metric to alert on depends on whether you are tracking actual usage, throttling, limits, or node-level pressure.
container_cpu_usage_seconds_total
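This counter tracks the total CPU time consumed by a container, in seconds. Because it is cumulative, you apply rate() in Prometheus to get current usage in cores (CPU-seconds consumed per second). Multiply by 1000 to get millicores, or divide by the container's limit to express usage as a percentage of that limit.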
Use case: Alert when a pod’s actual CPU usage approaches or exceeds its resource request or limit.
Why it matters: High sustained usage without throttling may indicate the pod needs a higher limit, or that the workload is CPU-bound and should be optimized.
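The rate() step above can be captured as a Prometheus Operator recording rule, which stores per-container usage in millicores so dashboards and alerts can reuse it. The sketch below is minimal. The resource name and record name are hypothetical, and depending on how your Operator's ruleSelector is configured, you may need to add matching labels under metadata.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-usage-recording        # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-usage.rules
      rules:
        # rate() converts the cumulative CPU-seconds counter into cores in use
        # (CPU-seconds per second); multiplying by 1000 yields millicores.
        - record: namespace_pod_container:container_cpu_usage:millicores_rate5m
          expr: |
            1000 * sum by (namespace, pod, container) (
              rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[5m])
            )
```

A recorded value of 150 means the container averaged 0.15 cores over the last five minutes, which you can compare directly against a 200m limit.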
container_cpu_cfs_throttled_seconds_total
This counter measures the total time a container was throttled by the CFS scheduler, meaning it wanted to run but was blocked because it hit its CPU limit.
Use case: Alert when throttling exceeds a percentage threshold over time, indicating performance degradation.
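Why it matters: Throttling directly impacts application latency. A container throttled in 30% of its CFS periods runs out of quota in nearly a third of its scheduling windows. Any request in flight at that moment waits for the next period, and those waits show up as added tail latency.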
container_cpu_cfs_throttled_periods_total
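This counter tracks the number of CFS enforcement periods in which the container used up its quota and was throttled. Each period is 100ms by default. Its companion counter, container_cpu_cfs_periods_total, counts all enforcement periods that have elapsed for the container.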
Use case: Calculate the percentage of periods where throttling occurred to assess how frequently the container is being limited.
Why it matters: Frequent throttling, even in short bursts, indicates the limit is too low for the workload’s behavior pattern.
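Both throttling counters are cumulative, so they are read through rate(). The sketch below records throttled seconds per second for each container. The record name is a hypothetical choice that follows the common level:metric:operations naming convention.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttled-time-recording   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-throttling.rules
      rules:
        # Throttled seconds accumulated per second of wall-clock time.
        # Higher values mean more time paused. The kernel accumulates throttled
        # time per CPU, so multi-threaded containers may report values above 1;
        # compare each container with its own history rather than reading this
        # as a share of wall-clock time.
        - record: namespace_pod_container:container_cpu_cfs_throttled:seconds_rate5m
          expr: |
            sum by (namespace, pod, container) (
              rate(container_cpu_cfs_throttled_seconds_total{container!=""}[5m])
            )
```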
node_cpu_seconds_total
This metric reports CPU time at the node level, per core and per mode (user, system, idle, iowait, and others). It is exported by the Prometheus node exporter, usually deployed as a DaemonSet, rather than by the kubelet.
Use case: Alert when node CPU usage exceeds 85–90% for a sustained period, signaling capacity risk.
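Why it matters: Sustained high node CPU degrades performance for every workload on that node and can starve the kubelet and system daemons. The scheduler places pods based on requests, not observed usage, so a saturated node can keep accepting new pods that will only add to the contention.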
kube_pod_container_resource_requests and kube_pod_container_resource_limits
These metrics from kube-state-metrics report the requested and limit values set in pod specs.
Use case: Alert when actual usage approaches the limit, or when the request is set far below actual usage, causing scheduling inefficiency.
Why it matters: Mismatched requests and limits lead to either throttling or wasted capacity.
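One way to catch requests that are set far below real usage is to divide the usage rate by the kube-state-metrics request value. The sketch assumes kube-state-metrics v2 metric and label names. The alert name, the 2x factor, and the 30-minute window are illustrative assumptions, not thresholds taken from this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-request-drift           # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-requests
      rules:
        # Fires when a container's actual usage stays above twice its CPU request,
        # meaning the scheduler is reserving far less CPU than the container uses.
        - alert: CPURequestFarBelowUsage
          expr: |
            (
              sum by (namespace, pod, container) (
                rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[5m])
              )
              /
              sum by (namespace, pod, container) (
                kube_pod_container_resource_requests{resource="cpu"}
              )
            ) > 2
          for: 30m
          labels:
            severity: info
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} uses {{ $value | humanize }}x its CPU request"
```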
How CPU Throttling Works in Kubernetes
CPU throttling in Kubernetes is enforced by the Linux kernel’s CFS scheduler, not by Kubernetes itself. When you set a CPU limit in a pod spec, Kubernetes translates that into CFS quota and period values in the container’s cgroup.
By default, the CFS period is 100ms (100,000 microseconds). If you set a CPU limit of 200m (0.2 CPU cores), the quota is 20,000 microseconds per period. This means the container can run for up to 20ms out of every 100ms period.
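The 200m example corresponds to a container spec like the one below. The comments show the CFS values derived from the limit: cpu.cfs_quota_us and cpu.cfs_period_us on cgroup v1, or the single cpu.max file on cgroup v2. The pod name, container name, image, and the 100m request are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-example                  # placeholder name
spec:
  containers:
    - name: api
      image: example.com/api:1.0     # placeholder image
      resources:
        requests:
          cpu: 100m   # used for scheduling and relative CPU weight under contention
        limits:
          cpu: 200m   # 0.2 cores, enforced as a CFS quota
          # Resulting cgroup settings with the default 100ms period:
          #   cgroup v1: cpu.cfs_period_us = 100000, cpu.cfs_quota_us = 20000
          #   cgroup v2: cpu.max = "20000 100000"
          # That is at most 20ms of CPU time per 100ms period, summed across
          # all threads in the container.
```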
If the container tries to use more CPU time than its quota within a period, the kernel throttles it, pausing execution until the next period starts. This happens even if the underlying node has idle CPU capacity.
Why this matters for alerting: A pod can be heavily throttled and delivering slow responses while the node it runs on is only at 30% CPU utilization. This is why alerting on pod-level throttling metrics is essential: you cannot rely on node-level CPU metrics alone to detect performance problems.
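The two key throttling metrics are container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total. The throttling ratio is usually computed from periods. Divide the rate of throttled periods by the rate of all elapsed periods (container_cpu_cfs_periods_total) to get the fraction of CFS periods in which the container hit its quota. This ratio is often more useful than absolute throttle counts, while throttled seconds show how much total time the container spent paused.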
A throttling ratio above 25% typically indicates user-facing latency impact. Above 50%, the workload is being severely constrained.
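The period-based ratio can be precomputed as a recording rule. This keeps alert expressions short and makes the ratio cheap to retain for trend analysis. The sketch below uses a hypothetical record name, and a recorded value of 0.25 corresponds to the 25% level above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttle-ratio-recording   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-throttle-ratio.rules
      rules:
        # Fraction of elapsed CFS periods in which the container exhausted its quota.
        - record: namespace_pod_container:container_cpu_cfs_throttled_periods:ratio_rate5m
          expr: |
            sum by (namespace, pod, container) (
              rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
            )
            /
            sum by (namespace, pod, container) (
              rate(container_cpu_cfs_periods_total{container!=""}[5m])
            )
```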
Setting Up Kubernetes CPU Alerts in Prometheus
Prometheus is the most common tool for Kubernetes alerting because it natively scrapes kubelet and cAdvisor metrics. Here’s how to configure CPU alerts using PrometheusRule custom resources, which are applied to clusters running the Prometheus Operator.
Alert: Pod CPU usage near limit
This alert fires when a pod is using more than 90% of its configured CPU limit for 10 minutes.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-cpu-alerts
  namespace: monitoring
spec:
  groups:
    - name: pod-cpu
      interval: 30s
      rules:
        - alert: PodCPUNearLimit
          expr: |
            (
              sum by (namespace, pod, container) (
                rate(container_cpu_usage_seconds_total{container!="", container!="POD"}[5m])
              )
              /
              sum by (namespace, pod, container) (
                kube_pod_container_resource_limits{resource="cpu"}
              )
            ) > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} CPU usage is {{ $value | humanizePercentage }} of limit"
            description: "Namespace: {{ $labels.namespace }}, Container: {{ $labels.container }}"
```
This query divides actual CPU usage (from container_cpu_usage_seconds_total) by the configured limit. When the ratio exceeds 90% for 10 minutes, it fires.
Alert: High CPU throttling
This alert detects when a container is throttled in more than 25% of its CFS periods, measured over 5-minute windows and sustained for 15 minutes.
```yaml
# Append to the rules: list of the pod-cpu group above.
# The ratio is kept as a fraction (0.25 = 25%) so humanizePercentage renders it correctly.
- alert: HighCPUThrottling
  expr: |
    (
      sum by (namespace, pod, container) (
        rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
      )
      /
      sum by (namespace, pod, container) (
        rate(container_cpu_cfs_periods_total{container!=""}[5m])
      )
    ) > 0.25
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.pod }} throttled in {{ $value | humanizePercentage }} of CFS periods"
    description: "Container {{ $labels.container }} in namespace {{ $labels.namespace }} is being throttled"
```
This calculates the fraction of CFS periods in which throttling occurred. A threshold of 25% is a common starting point; adjust it based on your workload's latency tolerance.
Alert: Node CPU pressure
This alert fires when a node’s CPU usage exceeds 90% for 10 minutes.
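```yaml
# Append to the rules: list of a PrometheusRule group.
# Busy fraction = 1 - idle fraction, kept as 0..1 so humanizePercentage renders correctly.
- alert: NodeCPUPressure
  expr: |
    1 - avg by (instance) (
      rate(node_cpu_seconds_total{mode="idle"}[5m])
    ) > 0.9
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} CPU usage is {{ $value | humanizePercentage }}"
    description: "Sustained high CPU slows every pod on the node and can starve the kubelet and system daemons"
```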
This inverts the idle CPU metric to calculate used percentage. Firing after 10 minutes avoids noise from brief spikes.
Alert Thresholds That Work in Production
Choosing the right threshold for CPU alerts depends on your workload’s tolerance for latency and the consequences of throttling. Generic defaults often produce too much noise or miss real problems.
For throttling alerts: Start with 25% as a warning threshold. A container throttled in 25% of its CFS periods runs out of quota in a quarter of its scheduling windows, which typically shows up as added tail latency. For latency-sensitive workloads (APIs, checkout flows), lower this to 15%. For batch jobs, raise it to 50% or disable throttling alerts entirely.
For CPU usage near limit: Alert at 85–90% of the limit. This gives you time to raise the limit or scale replicas before throttling starts. Do not set it too low (like 70%), or you will get alerts on normal traffic patterns.
For node CPU pressure: Alert at 85% for 10 minutes. Kubernetes can handle brief spikes, but sustained high CPU slows every pod on the node and can starve the kubelet and system daemons. In clusters with aggressive autoscaling, you may raise this to 90%.
Duration matters as much as threshold. A 5-second CPU spike to 100% is usually not actionable. Use for: 10m or for: 15m in your alert rules to filter transient load.
Adjust by namespace or workload priority. Production namespaces should have tighter thresholds than staging. Business-critical services should alert earlier than internal tools.
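Per-namespace thresholds can be expressed as separate rules with namespace selectors. The sketch below applies the 15% level to latency-sensitive namespaces and the 50% level to batch work. The namespace names (checkout, api, batch), the alert names, and the 30-minute window for batch are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttling-by-namespace   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-throttling-tiers
      rules:
        # Latency-sensitive namespaces: warn once 15% of CFS periods are throttled.
        - alert: HighCPUThrottlingLatencySensitive
          expr: |
            (
              sum by (namespace, pod, container) (
                rate(container_cpu_cfs_throttled_periods_total{container!="", namespace=~"checkout|api"}[5m])
              )
              /
              sum by (namespace, pod, container) (
                rate(container_cpu_cfs_periods_total{container!="", namespace=~"checkout|api"}[5m])
              )
            ) > 0.15
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} throttled in {{ $value | humanizePercentage }} of CFS periods"
        # Batch namespace: only flag severe, long-lasting throttling.
        - alert: HighCPUThrottlingBatch
          expr: |
            (
              sum by (namespace, pod, container) (
                rate(container_cpu_cfs_throttled_periods_total{container!="", namespace="batch"}[5m])
              )
              /
              sum by (namespace, pod, container) (
                rate(container_cpu_cfs_periods_total{container!="", namespace="batch"}[5m])
              )
            ) > 0.5
          for: 30m
          labels:
            severity: info
```

If you also run a general throttling rule, exclude these namespaces from it (for example with namespace!~"checkout|api|batch") so the same container does not alert twice.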
One pattern from the r/devops community: users recommend alerting on throttling percentage rather than absolute usage, because usage alone does not tell you whether performance is degraded.
Configuring CPU Alerts in CubeAPM
CubeAPM handles Kubernetes CPU alerting natively, with built-in support for pod-level, node-level, and cluster-level CPU metrics. Unlike Prometheus, where you write PromQL alert rules manually, CubeAPM provides a UI-driven alert builder that lets you configure CPU alerts in under 2 minutes.
Setting up a CPU throttling alert in CubeAPM
CubeAPM automatically collects container_cpu_cfs_throttled_seconds_total and other throttling metrics from your Kubernetes nodes via OpenTelemetry or Prometheus-compatible scraping.
To create a throttling alert:
- Navigate to Alerts → Create Alert in the CubeAPM dashboard
- Select Kubernetes Pod Metrics as the data source
- Choose CPU Throttling Percentage as the metric
- Set threshold: > 25% for 15 minutes
- Filter by namespace or deployment if needed
- Route to Slack, PagerDuty, or email
CubeAPM calculates throttling percentage automatically using the same formula as Prometheus, (throttled_periods / total_periods) * 100, but surfaces it as a pre-built metric you can select without writing queries.
Setting up a node CPU pressure alert
For node-level alerts:
- Alerts → Create Alert → Kubernetes Node Metrics
- Select Node CPU Usage Percentage
- Set threshold: > 90% for 10 minutes
- Add label filters if monitoring specific node pools
- Configure notification channels
CubeAPM correlates node CPU alerts with pod-level data automatically. When a node CPU alert fires, the alert detail view shows which pods on that node are consuming the most CPU, making root cause analysis faster.
Why CubeAPM simplifies Kubernetes CPU alerting
Unlike building custom Prometheus alert rules or configuring Grafana alerting, CubeAPM provides:
- Pre-built CPU alert templates for throttling, usage, and node pressure
- Automatic correlation between alerts and traces, showing which requests were slow during throttling
- Self-hosted deployment so all metrics and alert data stay inside your VPC
- No per-seat or per-host fees: $0.15/GB flat pricing regardless of alert volume
Teams using Kubernetes monitoring platforms for alerting often hit pricing surprises when alert volume scales with cluster growth. CubeAPM’s ingestion-based pricing avoids that problem entirely.
Common Mistakes in Kubernetes CPU Alerting
Setting CPU limits without monitoring throttling. Many teams set limits defensively to prevent runaway pods, but never alert on throttling. This creates silent performance degradation. If you set limits, you must monitor throttling.
Alerting on CPU requests instead of actual usage. Requests drive scheduling and set each container's relative CPU weight under contention, but they do not cap usage. A pod with a 500m request can use 2000m if no limit is set. Alert on actual usage and throttling, not requests.
Using the same threshold for all workloads. Batch jobs tolerate throttling. User-facing APIs do not. Apply different thresholds by namespace or workload type.
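Ignoring CFS period behavior. Throttling is enforced in 100ms periods, so a container can average well below its limit across a 10-second scrape interval while still being throttled during short bursts. Average usage hides this, but the throttling counters record every throttled period. Alert on rate() of those counters over a window such as 5m rather than on instantaneous usage values.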
Not correlating CPU alerts with application metrics. A CPU spike means nothing without context. Was it caused by a traffic spike, a slow database, or a memory leak triggering GC thrashing? Link CPU alerts to APM traces and logs.
How CPU Alerts Fit into Broader Kubernetes Observability
CPU alerts are one dimension of Kubernetes observability. To diagnose the root cause of a CPU spike, you need correlated data from logs, traces, and infrastructure metrics.
When a CPU throttling alert fires, the next questions are always: Which service or endpoint triggered the spike? Was it a database query, an external API call, or application code? Did memory pressure contribute? Did the pod restart afterward?
Answering these requires linking the throttling event to distributed traces showing the request flow, logs showing error patterns, and memory metrics showing if OOMKill followed.
Platforms that unify these signals (APM, logs, infrastructure, and Kubernetes metrics) reduce mean time to resolution significantly compared to stitching together Prometheus, Grafana, and separate logging tools. Teams evaluating modern Datadog alternatives or New Relic alternatives often prioritize this unified view as a core requirement.
Best Practices for Kubernetes CPU Alerts
Use tiered severity levels. Configure warning alerts at 25% throttling or 85% node CPU, and critical alerts at 50% throttling or 95% node CPU. This helps teams prioritize response.
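The tiers above map onto pairs of rules that share an expression and differ only in threshold and severity label. The sketch below uses hypothetical resource and alert names, and the 15-minute and 10-minute windows are carried over from the earlier examples.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-tiered-alerts            # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-tiers
      rules:
        # Throttling: warning at 25%, critical at 50% of CFS periods.
        - alert: CPUThrottlingWarning
          expr: |
            sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
              / sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total{container!=""}[5m]))
              > 0.25
          for: 15m
          labels:
            severity: warning
        - alert: CPUThrottlingCritical
          expr: |
            sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
              / sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total{container!=""}[5m]))
              > 0.5
          for: 15m
          labels:
            severity: critical
        # Node CPU: warning at 85% busy, critical at 95% busy.
        - alert: NodeCPUWarning
          expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.85
          for: 10m
          labels:
            severity: warning
        - alert: NodeCPUCritical
          expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.95
          for: 10m
          labels:
            severity: critical
```

When both tiers fire for the same target, an Alertmanager inhibit rule that matches on the severity label can mute the warning while the critical alert is active.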
Alert on trends, not spikes. Use for: 10m or for: 15m in Prometheus rules to filter transient load. Short bursts are normal in Kubernetes.
Correlate with autoscaler activity. If HPA is scaling replicas or cluster autoscaler is adding nodes, suppress CPU alerts temporarily to avoid noise during known scaling events.
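HPA-driven suppression can be sketched with an Alertmanager inhibit rule. This sketch makes two assumptions: kube-state-metrics v2 is installed to provide the HPA replica metrics, and Alertmanager is version 0.22 or later to support the source_matchers/target_matchers syntax. It defines a hypothetical HPAScalingInProgress alert that fires while an HPA's desired and current replica counts differ. It then mutes the pod-level CPU alerts from this guide in the same namespace. Route the severity: none alert to a receiver that discards it. Cluster autoscaler node additions are not covered here.

```yaml
# 1) PrometheusRule: flag HPAs that are mid-scale (desired != current replicas).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-scaling-state            # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: hpa-scaling
      rules:
        - alert: HPAScalingInProgress
          expr: |
            kube_horizontalpodautoscaler_status_desired_replicas
              != kube_horizontalpodautoscaler_status_current_replicas
          labels:
            severity: none
---
# 2) alertmanager.yml fragment: while HPAScalingInProgress fires for a namespace,
#    suppress pod-level CPU alerts carrying the same namespace label.
inhibit_rules:
  - source_matchers:
      - 'alertname = HPAScalingInProgress'
    target_matchers:
      - 'alertname =~ "PodCPUNearLimit|HighCPUThrottling"'
    equal: [namespace]
```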
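Monitor the ratio of limits to requests. A pod with a 100m request and a 2000m limit is scheduled as if it needs 0.1 cores but can burst to 2 cores. Nodes packed with such pods become heavily overcommitted and contend for CPU under load. Alert when limit / request > 10.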
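The limit / request > 10 check can be sketched as a PrometheusRule using kube-state-metrics v2 metric names. The alert name, the one-hour window, and the info severity are assumptions. Containers without a CPU limit have no limit series, so they drop out of the division and never match.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-limit-request-ratio      # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cpu-limit-request-ratio
      rules:
        # Containers whose CPU limit exceeds 10x their request can burst far
        # beyond the CPU the scheduler reserved for them.
        - alert: CPULimitFarAboveRequest
          expr: |
            (
              sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="cpu"})
              /
              sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})
            ) > 10
          for: 1h
          labels:
            severity: info
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} CPU limit is {{ $value | humanize }}x its request"
```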
Track throttling history over time. Persistent throttling indicates the limit is structurally too low. Use long-term retention to identify patterns across deployments.
Automate remediation where possible. For non-critical workloads, consider automatically raising CPU limits when throttling persists, or triggering HPA scale-out based on throttling metrics.
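Scaling out on throttling requires the throttling ratio to be available through the Kubernetes custom metrics API, for example via a metrics adapter such as prometheus-adapter. The sketch below assumes such an adapter already exposes a per-pod metric under the hypothetical name cpu_throttled_ratio. The adapter configuration, the Deployment and namespace names, the replica bounds, and the 25% target are all assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api                 # hypothetical workload
  namespace: checkout                # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    # Assumes a custom metrics adapter serves this per-pod metric.
    - type: Pods
      pods:
        metric:
          name: cpu_throttled_ratio
        target:
          type: AverageValue
          averageValue: "250m"       # Kubernetes quantity for 0.25 (25% of periods throttled)
```

Adding replicas spreads load but does not change each pod's limit. This approach therefore helps when throttling is driven by request volume, not when the limit itself is undersized.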
A case study from RedBus documented 50% faster MTTR after implementing correlated CPU and APM alerts, compared to relying on infrastructure metrics alone.
Conclusion
Kubernetes CPU alerts are essential for detecting throttling, node pressure, and capacity bottlenecks before they degrade user experience. The difference between a well-configured alert system and a reactive one is often whether problems are caught in minutes or hours.
The key steps: monitor pod-level throttling metrics, alert on node CPU pressure before saturation slows every workload on the node, set thresholds based on workload tolerance, and correlate CPU events with application traces and logs. Tools like Prometheus and Grafana provide the foundation, but platforms like CubeAPM simplify the workflow by unifying metrics, traces, and alerts in one self-hosted system with predictable pricing.
If you are building Kubernetes CPU alerting from scratch, start with the three core alerts covered in this guide: pod CPU near limit, high throttling percentage, and node CPU pressure. Refine thresholds based on real incident data, not guesses.
Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.
Frequently Asked Questions
What is a good CPU alert threshold for Kubernetes pods?
Alert at 85–90% of the CPU limit for usage. For throttling, 25% of CFS periods sustained over 15 minutes is a common starting point, lowered to around 15% for latency-sensitive workloads. Adjust based on your workload's tolerance.
How do I alert on CPU throttling in Kubernetes?
Use the `container_cpu_cfs_throttled_periods_total` metric divided by `container_cpu_cfs_periods_total` to calculate throttling percentage. Alert when it exceeds 25% for 15 minutes.
What is the difference between CPU requests and limits in Kubernetes?
Requests determine pod scheduling and each container's guaranteed CPU share under contention. Limits cap maximum CPU usage and trigger throttling when reached. Alert on usage relative to limits and on throttling, not on requests.
Why is my Kubernetes pod being throttled even when the node has idle CPU?
Throttling is enforced per container by CFS quotas, not by node-level availability. A container that reaches its limit within a period is throttled regardless of how much idle capacity the node has.
What is CPU throttling in Kubernetes?
CPU throttling occurs when a container uses up its CFS quota within a period (100ms by default) and the Linux scheduler pauses it until the next period starts, adding latency to any requests in flight.
Should I set CPU limits on all Kubernetes pods?
Setting limits prevents runaway resource usage but risks throttling. For user-facing workloads, set limits only if you monitor throttling. For batch jobs, limits are safer.
How do I troubleshoot high CPU usage in Kubernetes?
Check pod-level CPU metrics, correlate with application traces to find slow endpoints, review recent deployments, and check for memory pressure or database bottlenecks triggering CPU spikes.