CubeAPM

What is kube-state-metrics and how do I use it on GKE? 



Running applications reliably on Google Kubernetes Engine (GKE) requires more than just deploying pods. You need continuous visibility into the health of your cluster: whether deployments have the right number of replicas running, whether pods are stuck in a Pending state, or whether nodes are under memory pressure. This is exactly what kube-state-metrics on GKE is designed for.

kube-state-metrics (KSM) is a CNCF-aligned, open-source add-on that converts Kubernetes API object information into a structured stream of Prometheus-compatible metrics. Instead of measuring how much CPU a pod is consuming right now, it tells you what state that pod is in, what labels it carries, and how many times it has restarted. On GKE specifically, Google provides a managed integration that removes most of the operational overhead.

This guide explains what kube-state-metrics is, how it works, and how to configure and use it on GKE, whether through Google’s managed package or a self-managed Helm deployment.

Key Takeaways
  1. kube-state-metrics listens to the Kubernetes API server and exposes Prometheus-format metrics about Pods, Deployments, Nodes, and StatefulSets.
  2. On GKE, use the built-in Workloads State managed package. No manual installation needed.
  3. For custom setups, install via Helm and scrape with your own Prometheus instance.
  4. KSM tracks object state, not resource usage. Use Metrics Server for live CPU and memory consumption.
  5. Four metrics to start with: kube_pod_status_phase, kube_deployment_status_replicas_unavailable, kube_node_status_condition, and kube_pod_container_status_restarts_total.

What is kube-state-metrics?

kube-state-metrics is an add-on agent maintained under the official kubernetes/kube-state-metrics GitHub repository. It runs as a Deployment inside your cluster, watches the Kubernetes API server, and continuously generates metrics that describe the current state of every Kubernetes object in the cluster.

According to the official Kubernetes documentation, the purpose of kube-state-metrics is to expose metrics about the state of Kubernetes objects, not metrics about the Kubernetes components themselves. This distinction is important: KSM is concerned with whether a Deployment has the replicas it asked for, not with how much memory the kube-apiserver is using.

KSM holds a full in-memory snapshot of the cluster state and regenerates metrics on every scrape. The metrics are served at the /metrics endpoint on port 8080. Because they are generated directly from Kubernetes API objects without heuristic modification, they are considered a reliable source of ground truth for automation and alerting.
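To make the exposition format concrete, here is a small hand-written sample of what a /metrics scrape can look like, saved to a file so it can be filtered locally. The namespaces, pod names, and series are illustrative, not real output from a cluster:

```shell
# Write a hand-made sample of kube-state-metrics exposition output
# (illustrative series only -- a real scrape has thousands of lines).
cat > /tmp/ksm-sample.txt <<'EOF'
# HELP kube_pod_status_phase The pods current phase.
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="production",pod="api-7d9f",phase="Running"} 1
kube_pod_status_phase{namespace="production",pod="api-7d9f",phase="Pending"} 0
kube_pod_status_phase{namespace="production",pod="worker-x1",phase="Pending"} 1
EOF

# Each phase is its own series; a pod's current phase carries value 1,
# so filtering for an active Pending series finds pods stuck in Pending.
grep 'phase="Pending"} 1' /tmp/ksm-sample.txt
```

Against a live cluster, the same filter works on the output of curl once the service is port-forwarded.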

How kube-state-metrics Collects Data

KSM uses the standard Kubernetes Go client to watch resources via the Kubernetes API server. It maintains a local cache (called an informer) for each tracked resource type. When an object changes, the informer triggers an update, and the next metrics scrape reflects the new state. This watch-and-cache model is efficient: KSM does not poll the API server on every scrape request.

What Kubernetes Objects Does It Track?

As of recent versions, kube-state-metrics tracks more than 30 Kubernetes object types, including:

  • Pods and Pod conditions
  • Deployments, ReplicaSets, and ReplicationControllers
  • StatefulSets and DaemonSets
  • Jobs and CronJobs
  • Nodes and Node conditions
  • PersistentVolumes and PersistentVolumeClaims
  • Services and Endpoints
  • Namespaces and ResourceQuotas
  • HorizontalPodAutoscalers (HPA)
  • ConfigMaps, Secrets, and LimitRanges

kube-state-metrics vs. Metrics Server: Key Differences

A common point of confusion is the difference between kube-state-metrics and Metrics Server. They serve fundamentally different purposes and are complementary rather than competing.

| kube-state-metrics | Metrics Server |
| --- | --- |
| Tracks the state of Kubernetes objects (desired vs. actual replicas, pod phase, restart counts) | Tracks live resource consumption (CPU and memory usage per pod and node) |
| Used for alerting, dashboards, and health monitoring | Used for Horizontal Pod Autoscaling (HPA) and kubectl top |
| Scrape-based; metrics are generated from API object state | Aggregates real-time resource metrics from kubelets |
| Does not measure CPU or memory usage | Does not track pod phases, replica counts, or labels |
| Requires a scraper like Prometheus to collect its output | Exposes data via the Kubernetes Metrics API |

In practice, a well-instrumented GKE cluster uses both. kube-state-metrics tells you whether your workloads are healthy at the Kubernetes object level. Metrics Server (or a Prometheus node exporter) tells you whether the nodes and pods have the resources they need.

kube-state-metrics on GKE: The Managed Package

GKE offers a built-in managed integration for kube-state-metrics through its Workloads State package. According to Google Cloud documentation, this package is part of the Google Cloud Managed Service for Prometheus and can be enabled directly from the GKE console or via the gcloud CLI.

When the managed package is enabled, GKE automatically deploys a kube-state-metrics instance, configures scraping through Google Cloud Managed Service for Prometheus, and routes the metrics to Google Cloud Monitoring. You can then query them using PromQL in the Metrics Explorer or build dashboards in Cloud Monitoring.

Enabling the Managed Workloads State Package via Console

  1. Open the Google Cloud Console and navigate to Kubernetes Engine > Clusters.
  2. Click the name of the cluster you want to configure.
  3. Select the Observability tab.
  4. Locate the Workloads State package.
  5. Click Enable package.
  6. Metrics will begin flowing to Cloud Monitoring within a few minutes.

Enabling via gcloud CLI

You can also enable Managed Service for Prometheus (which includes kube-state-metrics collection) when creating or updating a GKE cluster:

```shell
# Enable when creating a new cluster
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --enable-managed-prometheus

# Enable on an existing cluster
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --enable-managed-prometheus
```

Installing kube-state-metrics on GKE via Helm

If you need custom configuration, want to use a different Prometheus stack, or prefer to manage the lifecycle yourself, you can install kube-state-metrics using the official Helm chart from the prometheus-community repository.

Prerequisites

  • A GKE cluster (Standard or Autopilot) with kubectl configured
  • Helm 3 installed on your local machine
  • A running Prometheus instance (or the kube-prometheus-stack Helm chart)

Step 1: Add the Prometheus Community Helm Repository

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

Step 2: Install kube-state-metrics

```shell
helm install kube-state-metrics \
  prometheus-community/kube-state-metrics \
  --namespace monitoring \
  --create-namespace
```

This deploys kube-state-metrics as a Deployment in the monitoring namespace and creates a ClusterRole that gives it read access to all the Kubernetes API objects it needs to watch.
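The access it needs is read-only. A trimmed sketch of what such a ClusterRole looks like — this is an abbreviated illustration, not the chart's full manifest, which covers many more resource types:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  # Core API objects (abbreviated list for illustration)
  - apiGroups: [""]
    resources: ["pods", "nodes", "services", "namespaces"]
    verbs: ["list", "watch"]
  # Workload controllers
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["list", "watch"]
```

Note that only list and watch verbs are granted: kube-state-metrics never writes to the API server.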

Step 3: Verify the Deployment

```shell
kubectl get pods -n monitoring
# You should see a pod named kube-state-metrics-* in Running state

kubectl get svc -n monitoring
# The Service exposes port 8080 for metrics scraping
```

Step 4: Access the Metrics Endpoint

```shell
kubectl port-forward svc/kube-state-metrics 8080:8080 -n monitoring

# Then in another terminal:
curl http://localhost:8080/metrics | head -50
```

Step 5: Configure Prometheus to Scrape kube-state-metrics

If you are using the Prometheus Operator, a ServiceMonitor resource is the recommended approach:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
    - port: http
      interval: 30s
```

If you are using a standalone Prometheus deployment with a static configuration file, add a scrape job:

```yaml
scrape_configs:
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.monitoring.svc.cluster.local:8080']
```

Key kube-state-metrics Metrics to Monitor on GKE

kube-state-metrics exposes hundreds of metrics. The following are the most actionable for day-to-day GKE operations.

Pod Health

| Metric | What It Tells You |
| --- | --- |
| kube_pod_status_phase | Whether pods are Running, Pending, Failed, Succeeded, or Unknown |
| kube_pod_container_status_restarts_total | Total restart count per container; rising values indicate instability |
| kube_pod_container_status_waiting_reason | Why a container is waiting (e.g., CrashLoopBackOff, ImagePullBackOff) |
| kube_pod_container_status_terminated_reason | Why a container terminated (e.g., OOMKilled, Error) |
| kube_pod_container_resource_requests | CPU and memory requested by each container |
| kube_pod_container_resource_limits | CPU and memory limits set on each container |

Deployment and Replica Health

| Metric | What It Tells You |
| --- | --- |
| kube_deployment_spec_replicas | How many replicas the Deployment spec requested |
| kube_deployment_status_replicas_available | How many replicas are currently available |
| kube_deployment_status_replicas_unavailable | How many replicas are unavailable; a non-zero value means degraded state |
| kube_replicaset_status_ready_replicas | Ready replicas in a ReplicaSet |
| kube_statefulset_status_replicas_ready | Ready replicas in a StatefulSet |

Node Health

| Metric | What It Tells You |
| --- | --- |
| kube_node_status_condition | Node conditions: Ready, MemoryPressure, DiskPressure, NetworkUnavailable |
| kube_node_info | Node metadata: kernel version, OS, container runtime |
| kube_node_status_allocatable | Allocatable CPU, memory, and pods on each node |
| kube_node_spec_unschedulable | Whether a node has been cordoned |

HorizontalPodAutoscaler (HPA)

| Metric | What It Tells You |
| --- | --- |
| kube_hpa_status_current_replicas | Current replica count tracked by the HPA |
| kube_hpa_status_desired_replicas | Desired replica count computed by the HPA |
| kube_hpa_spec_max_replicas | Maximum replicas configured on the HPA |
| kube_hpa_status_condition | Whether the HPA is able to scale |

Note: in kube-state-metrics v2.0 and later, these metrics use the kube_horizontalpodautoscaler_ prefix (for example, kube_horizontalpodautoscaler_status_current_replicas). Check your deployed version before writing alert rules.

Useful PromQL Queries for GKE kube-state-metrics

Once kube-state-metrics is scraped by Prometheus, you can use PromQL to build dashboards and alerts. The following queries are practical starting points.

Count Pods by Phase

count by (phase) (kube_pod_status_phase{namespace="production"})

Detect Deployments with Unavailable Replicas

kube_deployment_status_replicas_unavailable > 0

Find Pods in CrashLoopBackOff

kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1

Alert on Nodes with Memory Pressure

kube_node_status_condition{condition="MemoryPressure",status="true"} == 1

Check HPA Scaling at Maximum

kube_hpa_status_current_replicas == kube_hpa_spec_max_replicas

Detect OOMKilled Containers in the Last Hour

```promql
sum by (pod, container, namespace) (
  changes(kube_pod_container_status_restarts_total[1h])
) > 0
and on (pod, container, namespace)
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
```
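The queries in this section can be packaged as Prometheus alerting rules. A minimal sketch for a standalone Prometheus — the rule names, severities, and for: durations are illustrative choices, not fixed recommendations:

```yaml
# rules.yml -- illustrative alerting rules built from the queries above
groups:
  - name: gke-ksm-examples
    rules:
      - alert: DeploymentReplicasUnavailable
        expr: kube_deployment_status_replicas_unavailable > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.deployment }} has unavailable replicas"
      - alert: NodeMemoryPressure
        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} reports MemoryPressure"
```

If you use the Prometheus Operator, the same rules go into a PrometheusRule resource instead of a rules file.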

kube-state-metrics on GKE Autopilot vs. Standard

GKE comes in two modes: Autopilot and Standard. The behavior of kube-state-metrics differs between them.

GKE Standard

In Standard mode, you have full control over the cluster nodes. You can install kube-state-metrics via Helm as described above, configure it freely, and manage its resource requests and limits yourself. You can also use the managed Workloads State package if you prefer.

GKE Autopilot

In Autopilot mode, Google manages the nodes, and you cannot run privileged workloads or control node-level configuration. kube-state-metrics itself does not require privileged access and can run in Autopilot without special configuration. However, the most straightforward approach in Autopilot is to use the managed Workloads State package, since it handles configuration that would otherwise require cluster-level access.

If you install kube-state-metrics via Helm on Autopilot, make sure that the resource requests and limits in your values.yaml are appropriate, since Autopilot enforces minimum resource requirements per pod.
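A sketch of such a values.yaml fragment for the prometheus-community chart follows. The sizes are illustrative starting points, not recommendations; Autopilot rounds requests up to its minimums if you set them lower, and larger clusters need more memory:

```yaml
# values.yaml -- illustrative sizes; tune to your cluster's object count
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 512Mi
```

Apply it with helm install (or helm upgrade) using the -f flag.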

Best Practices for Using kube-state-metrics on GKE

  • Use the managed package when possible. On GKE, the Workloads State managed package reduces operational overhead and ensures the metrics feed correctly into Cloud Monitoring.
  • Limit metric collection to what you need. kube-state-metrics supports allowlists and denylists for metric names and resource types. Reducing the number of metrics collected lowers memory usage and scrape latency.
  • Set resource requests and limits. kube-state-metrics memory usage grows with cluster size. For large GKE clusters with many objects, set appropriate memory limits and monitor for OOMKilled restarts.
  • Use consistent labels. kube-state-metrics exposes all Kubernetes labels as metric labels. Keeping your label taxonomy clean ensures that dashboards and alerts remain readable.
  • Do not use kube-state-metrics for autoscaling decisions. It is not designed for HPA use cases. Use Metrics Server or custom metrics adapters for autoscaling.
  • Store metrics with adequate retention. For trend analysis and capacity planning, configure your Prometheus instance or Cloud Monitoring to retain kube-state-metrics data for at least 30 days.
  • Validate with the official list of stable metrics. Some kube-state-metrics metrics are experimental and subject to change. Prefer stable metrics in long-lived alert rules.
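The "limit metric collection to what you need" practice above maps directly to chart values. A sketch assuming the prometheus-community chart's collectors and metricDenylist keys — the chosen resource types and denied metric names are examples only:

```yaml
# values.yaml -- collect only the object types you alert on
collectors:
  - pods
  - deployments
  - nodes
  - statefulsets
# Drop individual series you do not use to reduce cardinality
metricDenylist:
  - kube_pod_tolerations
```

Fewer collectors means a smaller in-memory cache and faster scrapes, at the cost of losing metrics for the disabled resource types.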

Troubleshooting Common Issues

kube-state-metrics Pod is Not Running

Check the pod logs for RBAC errors. kube-state-metrics requires a ClusterRole with read access to cluster resources. If the ClusterRoleBinding is missing or incorrect, the pod will fail to start. Run:

```shell
kubectl describe pod <kube-state-metrics-pod-name> -n monitoring
kubectl logs <kube-state-metrics-pod-name> -n monitoring
```

Metrics Are Not Appearing in Prometheus

Verify that the ServiceMonitor or scrape config is correctly pointing to the kube-state-metrics Service. Check that the label selectors in the ServiceMonitor match the labels on the Service. Also confirm that Prometheus has the RBAC permissions to discover services in the monitoring namespace.

High Memory Usage

In clusters with thousands of pods or large numbers of secrets and configmaps, kube-state-metrics memory usage can become significant. Use the resources.requests and resources.limits values in the Helm chart to set boundaries, and consider using the collectors field to disable resource types you do not need.

```shell
# Example: Collect only pods, deployments, and nodes
helm install kube-state-metrics \
  prometheus-community/kube-state-metrics \
  --namespace monitoring \
  --set collectors='{pods,deployments,nodes}'
```

Metrics Differ from kubectl Output

kube-state-metrics generates metrics directly from the Kubernetes API objects, without the heuristics that kubectl applies to format its output. For example, kubectl get pods may show a pod as Running while kube-state-metrics shows it in a non-ready condition. In such cases, kube-state-metrics represents the authoritative, machine-readable state.
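A tiny hand-written example of the mismatch described above: the phase series says Running while the Ready condition is false. The pod name and series are illustrative, not real scrape output:

```shell
# Hand-made excerpt: pod api-7d9f is Running, but not Ready.
cat > /tmp/ksm-ready.txt <<'EOF'
kube_pod_status_phase{pod="api-7d9f",phase="Running"} 1
kube_pod_status_ready{pod="api-7d9f",condition="true"} 0
kube_pod_status_ready{pod="api-7d9f",condition="false"} 1
EOF

# kubectl get pods would show "Running"; the metric stream makes the
# non-ready condition explicit and machine-checkable:
grep 'kube_pod_status_ready.*condition="false"} 1' /tmp/ksm-ready.txt
```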

Monitor Your GKE Cluster with CubeAPM

Collecting kube-state-metrics is only half the picture. CubeAPM ingests your Prometheus metrics, correlates them with distributed traces and logs, and surfaces actionable alerts without the cost of managing multiple monitoring tools.

With CubeAPM, GKE teams can:
  • Visualize pod phases, replica health, and node conditions in pre-built Kubernetes dashboards.
  • Alert on CrashLoopBackOff pods, deployment degradation, and node pressure before they impact users.
  • Correlate infrastructure events with application performance in a single query interface.
  • Deploy on-premises or in your own GCP project, keeping data fully within your control.


FAQs

1. What is kube-state-metrics on GKE?

kube-state-metrics is an add-on that listens to the Kubernetes API server and generates Prometheus-format metrics about the state of cluster objects like Pods, Deployments, and Nodes. On GKE, it tells you whether workloads are healthy at the object level, not how much CPU or memory they are consuming.

2. Does GKE have kube-state-metrics built in?

Yes. GKE offers a managed Workloads State package through Google Cloud Managed Service for Prometheus. Enable it from the Observability tab in your cluster settings and metrics flow into Cloud Monitoring automatically, no manual installation needed.

3. How do I install kube-state-metrics on GKE?

Run helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring --create-namespace. Verify it is running with kubectl get pods -n monitoring and access metrics at port 8080.

4. What is the difference between kube-state-metrics and Metrics Server?

kube-state-metrics tracks object state like pod phases, replica counts, and restart counts. Metrics Server tracks live CPU and memory usage and powers the Horizontal Pod Autoscaler. Most GKE clusters use both.

5. Which kube-state-metrics metrics matter most on GKE?

Focus on kube_pod_status_phase, kube_deployment_status_replicas_unavailable, kube_pod_container_status_restarts_total, kube_pod_container_status_waiting_reason, and kube_node_status_condition. These cover pod health, deployment degradation, and node pressure in one go.
