High CPU usage can silently degrade application performance, trigger scaling failures, or cause service timeouts before anyone notices. Without proactive CPU alerts, teams discover these problems only after users report slow response times or systems crash under load. CPU alerts detect resource saturation early, giving teams time to investigate, optimize, or scale infrastructure before impact becomes visible.
This guide walks through setting up CPU alerts across AWS CloudWatch, Prometheus, Grafana, Kubernetes environments, and self-hosted platforms such as CubeAPM. Each step includes real configuration examples, threshold recommendations based on workload type, and troubleshooting advice for common false positive scenarios.
Prerequisites
Before setting up CPU alerts, ensure you have:
- Monitoring infrastructure deployed and collecting CPU metrics (CloudWatch, Prometheus, cAdvisor, Telegraf, or equivalent)
- Admin or operator access to your monitoring platform and alerting tools
- Notification channels configured (Slack, PagerDuty, email, webhooks)
- Basic familiarity with your monitoring query language (PromQL for Prometheus, CloudWatch metric filters, or platform-specific query syntax)
- Historical CPU baseline data for at least 7 days to understand normal usage patterns and avoid threshold misconfigurations
Step 1: Understand Your Workload and Set the Right Threshold
CPU alert thresholds depend entirely on workload type. A batch processing server hitting 95% CPU during scheduled jobs is normal. A frontend API server hitting 80% CPU during regular traffic is a warning sign.
Workload-specific threshold recommendations
Web servers and API endpoints: Alert at 70–75% sustained CPU. Brief spikes to 90% during traffic bursts are normal, but sustained levels above 70% indicate capacity problems or inefficient code.
Database servers: Alert at 60–70% CPU. Databases hitting sustained CPU pressure often show query latency spikes and connection pool exhaustion shortly after.
Batch processing and data pipelines: Alert at 85–90% CPU only if sustained for longer than expected job duration. Short-lived high CPU is expected during processing windows.
Kubernetes pods and containers: Alert at 80% of the defined CPU limit, not raw host CPU. If a pod has a 2-core limit and is using 1.6 cores, that is 80% utilization and warrants investigation.
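The recommendations above can be captured as a small lookup table. The following is a minimal Python sketch, not part of any monitoring tool: the workload keys and both helper functions are hypothetical names, and each entry uses the upper end of the recommended range as the warning level, which is an assumption you may want to tighten.

```python
# Warning thresholds (percent) per workload type, taken from the ranges above.
# Keys and function names are illustrative, not part of any monitoring tool.
WARN_THRESHOLDS = {
    "web_api": 75.0,   # web servers and API endpoints: 70-75%
    "database": 70.0,  # database servers: 60-70%
    "batch": 90.0,     # batch jobs and pipelines: 85-90%
    "k8s_pod": 80.0,   # pods: 80% of the CPU limit, not of the host
}


def percent_of_limit(used_cores: float, limit_cores: float) -> float:
    """Express container CPU usage as a percentage of its CPU limit."""
    if limit_cores <= 0:
        raise ValueError("limit_cores must be positive")
    return 100.0 * used_cores / limit_cores


def breaches_threshold(workload: str, cpu_percent: float) -> bool:
    """Return True when sustained CPU is at or above the warning level."""
    return cpu_percent >= WARN_THRESHOLDS[workload]
```

For the pod example above, `percent_of_limit(1.6, 2.0)` expresses 1.6 cores against a 2-core limit, and the result can be passed to `breaches_threshold("k8s_pod", ...)`.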
Define evaluation period and consecutive breaches
CPU can spike briefly during auto-scaling events, deployments, or cache rebuilds. Alerting on a single data point above threshold creates noise.
Set evaluation rules like:
- CloudWatch: “2 out of 3” datapoints means CPU must breach the threshold in any 2 of the last 3 five-minute evaluation periods before alerting; the two breaches do not have to be consecutive
- Prometheus: for: 5m in alert rules means the condition must be true continuously for 5 minutes before firing
- Grafana: Define alert condition as “When avg() of query is above X for 5 minutes”
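To make the difference between these evaluation styles concrete, here is a simplified Python model of both. It is a sketch under stated assumptions: the function names are hypothetical, each list entry stands for one evaluation period, and a missing datapoint (None) is simply counted as not breaching, which is cruder than what the real platforms do.

```python
from typing import Optional, Sequence


def m_of_n_alarm(datapoints: Sequence[Optional[float]], threshold: float,
                 m: int = 2, n: int = 3) -> bool:
    """CloudWatch-style check: at least m of the last n datapoints breach.

    None stands for a missing datapoint and is counted as not breaching,
    which is a simplification of real missing-data handling.
    """
    window = datapoints[-n:]
    breaches = sum(1 for v in window if v is not None and v > threshold)
    return breaches >= m


def sustained_alarm(datapoints: Sequence[Optional[float]], threshold: float,
                    required: int) -> bool:
    """Prometheus-style `for:` check: the last `required` samples all breach."""
    if required <= 0:
        raise ValueError("required must be positive")
    if len(datapoints) < required:
        return False
    return all(v is not None and v > threshold for v in datapoints[-required:])
```

With samples `[60, 82, 71, 90]` and a threshold of 75, the M-of-N check fires because two of the last three values breach, while a sustained check over the last two samples does not because 71 interrupts the run.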
Step 2: Set Up CPU Alerts in AWS CloudWatch
AWS CloudWatch monitors EC2 instances, ECS tasks, Lambda functions, and RDS databases. For EC2, the CPUUtilization metric appears automatically through default (basic) monitoring at 5-minute granularity; the CloudWatch agent adds more detailed OS-level CPU metrics.
Create a CloudWatch alarm for EC2 CPU utilization
Navigate to the CloudWatch console at https://console.aws.amazon.com/cloudwatch/ and select Alarms > All Alarms > Create Alarm.
Choose Select metric > EC2 > Per-Instance Metrics. Find the instance ID you want to monitor and select the CPUUtilization metric.
Under Specify metric and conditions:
- Statistic: Choose Average to smooth out short spikes, or Maximum to catch peak usage
- Period: Set to 5 minutes for most workloads
- Threshold type: Select Static
- Condition: Choose Greater than and enter your threshold (e.g., 75)
Under Additional configuration:
- Datapoints to alarm: Set 2 out of 3 so that two of the last three datapoints must breach before alerting
- Missing data treatment: Choose Treat missing data as missing to avoid false alerts during instance stops
Choose Next and configure SNS topic for notifications. If you do not have an SNS topic, create one and subscribe your email or Slack webhook endpoint.
Complete the alarm setup and save.
Example CloudWatch CLI command
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu-web-server \
  --alarm-description "Alert when CPU exceeds 75% for 10 minutes" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 75 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:high-cpu-alerts
This creates an alarm that triggers when average CPU exceeds 75% for two consecutive 5-minute periods.
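The same alarm can be created from code. The sketch below assumes the boto3 SDK is installed and AWS credentials are configured; the helper names `build_cpu_alarm` and `create_cpu_alarm` are hypothetical. Unlike the CLI example, it sets DatapointsToAlarm to mirror the console's "2 out of 3" setting and states the missing-data treatment explicitly.

```python
from typing import Any, Dict


def build_cpu_alarm(instance_id: str, sns_topic_arn: str,
                    threshold: float = 75.0) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "AlarmDescription": f"Average CPU above {threshold:g}% in 2 of 3 periods",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation period
        "EvaluationPeriods": 3,   # look at the last 3 periods ...
        "DatapointsToAlarm": 2,   # ... and alarm when 2 of them breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [sns_topic_arn],
    }


def create_cpu_alarm(instance_id: str, sns_topic_arn: str) -> None:
    """Create the alarm; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("cloudwatch")
    client.put_metric_alarm(**build_cpu_alarm(instance_id, sns_topic_arn))
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test alarm definitions before anything is sent to AWS.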
Pricing based on publicly available information as of April 2026. AWS alarm pricing is $0.10 per alarm per month for standard metrics. Verify current rates at the [AWS CloudWatch pricing page](https://aws.amazon.com/cloudwatch/pricing/).
Step 3: Set Up CPU Alerts in Prometheus
Prometheus collects CPU metrics from node_exporter for hosts and cAdvisor for containers, while kube-state-metrics adds Kubernetes object state such as resource requests and limits. Alert rules are defined in YAML files and evaluated by the Prometheus server.
Create a Prometheus alert rule for host CPU
Create or edit your Prometheus alert rules file (typically /etc/prometheus/alert_rules.yml):
groups:
  - name: cpu_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        # rate() rather than irate(): averaging over the whole 5m window is steadier for alerting
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 75
        for: 5m
        labels:
          severity: warning
          component: infrastructure
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% on instance {{ $labels.instance }} for more than 5 minutes."
This rule calculates CPU usage by subtracting idle time from 100%. The for: 5m clause ensures the alert fires only after CPU remains above 75% continuously for 5 minutes.
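The arithmetic behind that expression can be worked through by hand. The following Python sketch is only an illustration: the function name and sample layout are hypothetical, it takes two readings of the per-core idle counter, and it ignores counter resets, which Prometheus handles for you.

```python
from typing import Dict, Tuple

# Each sample: (unix_timestamp_seconds, cumulative_idle_seconds) for one core.
Sample = Tuple[float, float]


def cpu_usage_percent(first: Dict[str, Sample], last: Dict[str, Sample]) -> float:
    """Return busy CPU percent averaged over cores, mirroring the rule above."""
    idle_fractions = []
    for core, (t0, idle0) in first.items():
        t1, idle1 = last[core]
        # Per-second increase of the idle counter, i.e. fraction of time idle.
        idle_fractions.append((idle1 - idle0) / (t1 - t0))
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100.0 - avg_idle * 100.0
```

For example, if over a 300-second window one core accumulates 60 idle seconds and another accumulates 90, the average idle fraction is 0.25 and the host is 75% busy, right at the alert threshold.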
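Reload the Prometheus configuration. The HTTP reload endpoint works only when Prometheus was started with the --web.enable-lifecycle flag; otherwise send SIGHUP to the process or restart it: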
curl -X POST http://localhost:9090/-/reload
Verify the alert rule appears in the Prometheus UI under Alerts.
Forward Prometheus alerts to Alertmanager
Prometheus does not send notifications directly. Configure Alertmanager to route alerts to Slack, PagerDuty, or email.
Edit /etc/alertmanager/alertmanager.yml:
route:
  receiver: 'slack-notifications'
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'CPU Alert'
        # Double quotes so YAML turns \n into a real line break
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}"
Restart Alertmanager to apply changes.
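Before relying on this routing in production, it can help to push a synthetic alert through Alertmanager and confirm it reaches the right channel. The sketch below uses only the Python standard library and Alertmanager's v2 alerts endpoint; the helper names are hypothetical, and the base URL you pass in (for example the default local port 9093) is an assumption about your deployment.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Dict, List


def build_test_alert(instance: str, severity: str = "warning") -> List[Dict]:
    """Build a one-alert payload in the shape Alertmanager's v2 API accepts."""
    return [{
        "labels": {
            "alertname": "HighCPUUsage",
            "instance": instance,
            "severity": severity,
        },
        "annotations": {
            "summary": f"High CPU usage on {instance}",
            "description": "Synthetic alert used to test notification routing.",
        },
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def send_test_alert(alertmanager_url: str, instance: str) -> int:
    """POST the synthetic alert and return the HTTP status code."""
    request = urllib.request.Request(
        url=f"{alertmanager_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(build_test_alert(instance)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A call such as `send_test_alert("http://localhost:9093", "web-server-01")` should produce a Slack message after the configured group_wait if the route and receiver are set up correctly.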
Step 4: Set Up CPU Alerts in Grafana
Grafana supports alerting directly from dashboard panels when connected to Prometheus, CloudWatch, or other data sources.
Create a Grafana alert from a CPU panel
Open a Grafana dashboard with a CPU utilization panel. Edit the panel and navigate to the Alert tab.
Define the alert condition:
- Query: Use the same PromQL or CloudWatch query as the panel
- Condition: WHEN avg() OF query(A, 5m) IS ABOVE 75
- Evaluate every: 1m
- For: 5m
This configuration checks CPU every minute and fires the alert only after CPU exceeds 75% continuously for 5 minutes.
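Under Notifications, select your notification channel (Slack, PagerDuty, email). If you have not configured a channel, go to Alerting > Notification channels and create one. These steps follow Grafana's legacy dashboard alerting; in Grafana 9 and later, which default to unified alerting, rules are managed under Alerting > Alert rules and notification channels are replaced by contact points.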
Save the dashboard. The alert is now active.
Example Grafana alert rule in JSON
{
  "alert": {
    "conditions": [
      {
        "evaluator": {
          "params": [75],
          "type": "gt"
        },
        "operator": {
          "type": "and"
        },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": {
          "params": [],
          "type": "avg"
        },
        "type": "query"
      }
    ],
    "executionErrorState": "keep_state",
    "for": "5m",
    "frequency": "1m",
    "handler": 1,
    "name": "High CPU Alert",
    "noDataState": "no_data",
    "notifications": []
  }
}
Step 5: Set Up CPU Alerts for Kubernetes Pods
Kubernetes environments require monitoring both node-level CPU and pod-level CPU relative to resource limits. A pod using 100% of its CPU limit may only be using 10% of the node’s total CPU capacity.
Monitor pod CPU with Prometheus and kube-state-metrics
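The container_cpu_* metrics used in the rules below come from cAdvisor, which is built into the kubelet. Deploy kube-state-metrics alongside it to expose pod resource requests and limits; confirm the manifest location for your version on the project's releases page: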
kubectl apply -f https://github.com/kubernetes/kube-state-metrics/releases/download/v2.10.0/kube-state-metrics.yaml
Create a Prometheus alert rule for pod CPU usage:
groups:
  - name: kubernetes_cpu_alerts
    interval: 30s
    rules:
      - alert: PodCPUThrottling
        expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is being CPU throttled"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is experiencing CPU throttling."
      - alert: PodHighCPUUsage
        # container!="" skips the pod-level cgroup series so usage and limits are not counted twice
        expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, namespace) / sum(container_spec_cpu_quota{container!=""} / container_spec_cpu_period{container!=""}) by (pod, namespace)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} CPU usage is high"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is using {{ $value }}% of its CPU limit."
The first rule detects CPU throttling, which happens when a container hits its CPU limit and the Linux CFS scheduler withholds CPU time until the next scheduling period. The second rule alerts when a pod uses more than 80% of its defined CPU limit.
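The quantities in these two expressions are easier to reason about with concrete numbers. The Python sketch below is a worked illustration only: the function names are hypothetical, and the 100,000 microsecond default period reflects the usual CFS default rather than anything specific to your cluster.

```python
def cpu_limit_cores(cfs_quota_us: float, cfs_period_us: float) -> float:
    """CPU limit in cores: container_spec_cpu_quota / container_spec_cpu_period."""
    if cfs_period_us <= 0:
        raise ValueError("cfs_period_us must be positive")
    return cfs_quota_us / cfs_period_us


def pod_cpu_percent(usage_cores: float, cfs_quota_us: float,
                    cfs_period_us: float = 100_000) -> float:
    """Usage as a percentage of the limit, as in the PodHighCPUUsage rule."""
    return 100.0 * usage_cores / cpu_limit_cores(cfs_quota_us, cfs_period_us)


def throttle_rate(throttled_start_s: float, throttled_end_s: float,
                  window_s: float) -> float:
    """Seconds spent throttled per second of wall time over the window."""
    return (throttled_end_s - throttled_start_s) / window_s
```

A quota of 200,000 over a period of 100,000 is a 2-core limit, so 1.7 cores of usage is 85% and would trip the second rule. A throttled-seconds counter that rises from 120 to 165 over a 300-second window gives a rate of 0.15, above the 0.1 threshold in the first rule.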
Set CPU alerts for Kubernetes nodes
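Monitor node-level CPU utilization to detect capacity problems before pods start competing for CPU. Kubernetes has no CPUPressure node condition (the built-in pressure conditions are MemoryPressure, DiskPressure, and PIDPressure), so alert on node_exporter metrics instead. The 85% threshold below is a starting point to adjust for your cluster:
- alert: NodeHighCPUUsage
  expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} CPU usage is high"
    description: "Node {{ $labels.instance }} has averaged more than 85% CPU for 5 minutes. CPU is a compressible resource, so pods are throttled or starved rather than evicted."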
Step 6: Set Up CPU Alerts with CubeAPM
CubeAPM provides infrastructure monitoring with native support for host, container, and Kubernetes CPU metrics. It runs inside your VPC or on-prem, keeping all telemetry data local while delivering managed observability.
Configure CPU alerts in CubeAPM
CubeAPM automatically collects CPU metrics from OpenTelemetry, Prometheus exporters, and native Kubernetes integrations once agents are deployed.
Navigate to Alerts > Create Alert in the CubeAPM dashboard.
Select Infrastructure as the alert type and define the condition:
- Metric: host.cpu.utilization or k8s.pod.cpu.utilization
- Aggregation: avg
- Threshold: > 75
- Evaluation window: 5 minutes
- Consecutive breaches: 2
Add notification channels (Slack, PagerDuty, email, webhook) and save the alert.
CubeAPM correlates CPU alerts with application traces, logs, and deployment events, giving full context when an alert fires. If CPU spikes correlate with a recent deployment or slow database query, CubeAPM surfaces that connection automatically.
Why CubeAPM simplifies CPU alerting
Unlike Prometheus or Grafana where you build and maintain alert rules manually, CubeAPM provides pre-configured alert templates for common scenarios including CPU, memory, disk, and network thresholds. Alerts auto-populate with contextual data like affected pods, services, and recent changes.
CubeAPM runs on your infrastructure with no data egress, making it suitable for regulated environments where telemetry cannot leave the VPC. Pricing is $0.15/GB of ingested data with unlimited retention and no per-host or per-user fees.
For teams running AWS Lambda monitoring, CubeAPM also tracks Lambda invocation CPU time and memory usage alongside traditional infrastructure metrics.
Step 7: Configure Notification Channels
CPU alerts are only useful if the right people see them in time. Configure notification channels that integrate with your team’s existing workflow.
Slack integration
Most monitoring platforms support Slack webhooks. Create an incoming webhook in your Slack workspace under Apps > Incoming Webhooks.
Copy the webhook URL and paste it into your monitoring platform’s notification settings. Test the integration to ensure alerts appear in the correct channel.
PagerDuty integration
For on-call rotations, integrate with PagerDuty to route critical CPU alerts to the engineer on duty.
Create a PagerDuty integration key for your monitoring platform under Services > Service Directory > New Service. Choose the integration type (Prometheus, CloudWatch, Grafana, or CubeAPM).
Add the integration key to your alerting platform under notification channels.
Email alerts
Email remains the most universal notification method. Configure SMTP settings in your monitoring platform and create an email notification channel.
Route different severity levels to different email addresses. Warning-level CPU alerts can go to a monitoring alias. Critical alerts can page on-call engineers directly.
Webhook for custom workflows
Use webhooks to route alerts to custom dashboards, ticketing systems, or automation workflows. Webhook payloads include alert metadata like instance ID, metric value, timestamp, and severity level.
Example webhook payload from Prometheus Alertmanager:
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighCPUUsage",
        "instance": "web-server-01",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High CPU usage on web-server-01",
        "description": "CPU usage is 82% on instance web-server-01 for more than 5 minutes."
      },
      "startsAt": "2026-04-15T10:32:00Z"
    }
  ]
}
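A webhook receiver typically parses this payload and turns each alert into a ticket or message. The following Python sketch shows only the parsing step; `summarize_alerts` is a hypothetical helper, and it reads just the fields shown in the example payload, falling back to placeholder text when one is absent.

```python
import json
from typing import List


def summarize_alerts(raw_body: str) -> List[str]:
    """Turn an Alertmanager webhook body into one readable line per alert."""
    payload = json.loads(raw_body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[{status}] {severity} {name} on {instance}: {summary}".format(
                status=alert.get("status", "unknown"),
                severity=labels.get("severity", "none"),
                name=labels.get("alertname", "unnamed"),
                instance=labels.get("instance", "unknown"),
                summary=annotations.get("summary", ""),
            )
        )
    return lines
```

Applied to the example payload, this yields a single line: `[firing] warning HighCPUUsage on web-server-01: High CPU usage on web-server-01`.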
Troubleshooting Common Issues
False positives from auto-scaling events
Auto-scaling triggers brief CPU spikes as new instances warm up or existing instances drain connections. These spikes resolve within minutes and should not trigger alerts.
Solution: Set evaluation periods to require sustained threshold breaches. Use for: 5m in Prometheus or 2 out of 3 datapoints in CloudWatch to filter out short-lived spikes.
CPU alerts firing during known batch jobs
Scheduled batch processing jobs intentionally drive CPU to high levels during execution windows. Alerting during these periods creates noise.
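Solution: Use alert suppression windows or maintenance mode during scheduled jobs. In Prometheus Alertmanager, inhibition rules act on alerts rather than raw metrics: define an informational alert that fires while a job_running metric is active, then add an inhibit rule that mutes CPU alerts on the same instance while that alert is firing.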
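Where the alerting platform has no built-in maintenance windows, the same idea can be approximated in a custom notification hook. The Python sketch below is one possible approach, not a feature of any tool named here: the function names are hypothetical, and the windows are assumed to be daily time ranges in the same timezone as the timestamp being checked.

```python
from datetime import datetime, time
from typing import Iterable, Tuple

Window = Tuple[time, time]  # (start, end) in the same timezone as `now`


def in_suppression_window(now: datetime, windows: Iterable[Window]) -> bool:
    """Return True when `now` falls inside any scheduled batch window."""
    current = now.time()
    for start, end in windows:
        if start <= end:
            if start <= current < end:
                return True
        else:
            # Window wraps past midnight, e.g. 23:00 to 00:30.
            if current >= start or current < end:
                return True
    return False


def should_notify(cpu_percent: float, threshold: float, now: datetime,
                  windows: Iterable[Window]) -> bool:
    """Notify only for breaches that occur outside the batch windows."""
    return cpu_percent > threshold and not in_suppression_window(now, windows)
```

Suppressing at notification time keeps the alert itself visible in dashboards, so a batch job that runs far longer than its window still surfaces once the window ends.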
Missing data causing alert state changes
If the monitoring agent stops reporting metrics, some platforms treat missing data as a breach and fire alerts. This creates false positives during network interruptions or agent restarts.
Solution: Configure missing data treatment explicitly. In CloudWatch, choose “Treat missing data as missing” instead of “Treat missing data as breaching.” In Prometheus, use absent() queries to detect metric disappearance separately from threshold breaches.
CPU throttling in containers not triggering alerts
Containers hitting CPU limits experience throttling without necessarily maxing out host CPU. Standard CPU utilization metrics miss this condition.
Solution: Monitor container_cpu_cfs_throttled_seconds_total in Kubernetes environments. This metric increments whenever a container is throttled due to hitting its CPU quota. Alert when the rate of throttling exceeds acceptable levels.
Alert fatigue from overly sensitive thresholds
Setting CPU alert thresholds too low generates constant alerts during normal traffic patterns, leading teams to ignore or mute them.
Solution: Review historical CPU patterns over at least 7 days before setting thresholds. Use percentile-based thresholds (e.g., alert when CPU exceeds the 95th percentile of the past 30 days) to account for normal variability. Adjust thresholds after the first week of alerts to reduce noise.
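A percentile-based threshold can be derived from exported history with a few lines of code. This Python sketch uses only the standard library (statistics.quantiles requires Python 3.8 or later); the function name is hypothetical, and the input is assumed to be a flat list of CPU percentages sampled at a regular interval.

```python
import statistics
from typing import Sequence


def percentile_threshold(history: Sequence[float], percentile: int = 95) -> float:
    """Return the given percentile of historical CPU samples (percent values)."""
    if len(history) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cut_points = statistics.quantiles(history, n=100, method="inclusive")
    return cut_points[percentile - 1]
```

Recomputing the value on a schedule, for example weekly over the trailing 30 days, lets the threshold follow gradual changes in baseline load instead of staying fixed.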
Conclusion
CPU alerts protect application performance by detecting resource saturation before it causes user-facing impact. The threshold, evaluation period, and notification routing must match your workload type and team structure. A web API server needs different alert rules than a batch processing pipeline.
Start with conservative thresholds, monitor alert frequency for the first week, and adjust based on false positive rate. Tools like CubeAPM simplify this process by correlating CPU spikes with deployments, slow queries, and pod events automatically, reducing the time spent investigating whether an alert requires action.
Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.
Frequently Asked Questions
What is a good CPU threshold for alerts?
For web servers and APIs, alert at 70–75% sustained CPU. For batch processing, alert at 85–90% only if sustained beyond expected job duration. Thresholds depend entirely on workload type and historical patterns.
How do I avoid false positive CPU alerts?
Set evaluation periods to require sustained threshold breaches across multiple consecutive data points. Use `for: 5m` in Prometheus or `2 out of 3 datapoints` in CloudWatch to filter out brief spikes from auto-scaling or cache rebuilds.
Should I monitor CPU utilization or CPU throttling?
Monitor both. CPU utilization shows overall resource consumption. CPU throttling in containers shows when workloads hit their CPU limit and are being restricted by the scheduler, which can degrade performance even if host CPU appears low.
What is the difference between average and maximum CPU in alerts?
Average smooths out short spikes and reflects sustained load. Maximum catches peak usage during brief bursts. Use average for most workloads and maximum only when brief spikes cause user-facing impact.
How do I set CPU alerts for Kubernetes pods?
Monitor pod CPU relative to its defined limit, not host CPU. Alert when a pod uses more than 80% of its CPU limit or when CPU throttling rate exceeds acceptable levels. Use kube-state-metrics and cAdvisor for accurate pod-level metrics.
Can I set dynamic CPU thresholds based on traffic patterns?
Yes, some platforms support anomaly detection or percentile-based thresholds. Prometheus can calculate historical percentiles with `quantile_over_time()`. CubeAPM includes built-in anomaly detection that learns normal CPU patterns and alerts on deviations.
What should I do when a CPU alert fires?
Check recent deployments, database query performance, and traffic patterns. Correlate CPU spikes with application traces and logs to identify root cause. Scale infrastructure if CPU pressure is driven by legitimate load growth, or optimize code if caused by inefficient queries or memory leaks.





