CubeAPM

How to set up Prometheus on GKE using kube-prometheus-stack?

Introduction

Prometheus on GKE is one of the most reliable ways to monitor Kubernetes clusters with open-source metrics, Grafana dashboards, and alerting. It helps teams track CPU usage, memory pressure, pod health, node performance, and service-level issues before incidents become guesswork.

The kube-prometheus-stack Helm chart packages everything into a single install: Prometheus, Grafana, Alertmanager, kube-state-metrics, the Prometheus Operator, and node-exporter, all maintained by the Prometheus community. The latest release as of May 2026 is version 84.5.0.

This guide walks you through every step: adding the Helm repo, configuring values for GKE Standard vs. Autopilot, installing the stack, accessing Grafana, and verifying metrics are flowing. It also covers common GKE-specific pitfalls and when Google Managed Prometheus might be the better fit.

Key Takeaways
  • kube-prometheus-stack is a single Helm chart that installs Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter on any Kubernetes cluster.
  • On GKE Standard, the default values work with minor RBAC adjustments. On GKE Autopilot, you must disable node-exporter, kubeProxy, kubeScheduler, and kubeControllerManager due to platform security restrictions.
  • Grafana is bundled in the chart and includes 30+ pre-built Kubernetes dashboards ready on first boot. No separate Grafana install is needed.
  • Google Managed Prometheus (GMP) is an alternative for teams who want a fully managed solution without running Prometheus themselves.
  • Remote write lets you forward Prometheus metrics to external backends such as Thanos, Cortex, or Grafana Cloud for long-term retention.
  • After installation, access Grafana via port-forward on port 3000 with the default credentials admin/prom-operator, which you should rotate immediately.

Prerequisites

Before you begin, make sure you have the following in place.

  • A GKE cluster (Standard or Autopilot) running Kubernetes 1.25 or later.
  • kubectl configured to talk to that cluster (gcloud container clusters get-credentials CLUSTER_NAME --region REGION --project PROJECT_ID).
  • Helm 3.x installed locally.
  • Cluster-admin permissions. For GKE specifically, you may need to bind your Google account: kubectl create clusterrolebinding owner-cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
  • At least 2 vCPU and 4 GiB of free cluster capacity. The full stack uses roughly 1 CPU and 2 GiB at rest.
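The checks above can be scripted before you start. The following is a minimal sketch, not part of the chart or of any official tooling: the helper names version_at_least and missing_tools are hypothetical, and reading the version from the first node's kubelet is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a version string such as
# "v1.29.4-gke.1043002" is at least MAJOR.MINOR.
# Arguments: version string, required major, required minor.
version_at_least() {
  ver=${1#v}               # drop a leading "v" if present
  major=${ver%%.*}         # text before the first dot
  rest=${ver#*.}           # everything after the first dot
  minor=${rest%%[!0-9]*}   # leading digits of the second field
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Hypothetical helper: print every required CLI that is missing from PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage against a live cluster (commented out so that defining the helpers
# has no side effects):
#   missing_tools kubectl helm gcloud
#   version_at_least "$(kubectl get nodes \
#     -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')" 1 25 \
#     || echo "Kubernetes 1.25 or later is required"
```

An empty result from missing_tools means all listed CLIs were found on PATH.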

Step 1: Add the Helm Repository

Add the Prometheus community chart repository and refresh the local index.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Create a Namespace

Install all monitoring components into a dedicated namespace. This keeps resources cleanly separated from your application workloads and makes RBAC policies easier to reason about.

kubectl create namespace monitoring

Step 3: Create a values.yaml File

The kube-prometheus-stack chart has thousands of configuration options. You control all of them through a values.yaml override file. The defaults work fine on a basic cluster, but on GKE you need to adjust several settings.

For GKE Standard Clusters

GKE Standard gives you full access to nodes, so the complete stack runs without modification. The main thing to configure is storage for Prometheus data persistence.

# values-gke-standard.yaml
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  adminPassword: "change-me-now"
  persistence:
    enabled: true
    size: 10Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
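To sanity-check the Prometheus volume size, you can apply the rough capacity-planning rule from the Prometheus storage documentation: disk bytes are approximately retention in seconds times ingested samples per second times bytes per sample (typically 1 to 2 bytes). The sketch below assumes that rule; the function name prometheus_disk_gib and the ingestion rate in the usage example are illustrative, not measured values.

```shell
#!/bin/sh
# Rough disk estimate in GiB, rounded up:
#   bytes = retention_days * 86400 * samples_per_second * bytes_per_sample
# Arguments: retention in days, ingested samples per second, bytes per sample.
prometheus_disk_gib() {
  bytes=$(( $1 * 86400 * $2 * $3 ))
  gib=$(( 1024 * 1024 * 1024 ))
  echo $(( (bytes + gib - 1) / gib ))
}

# Example: 15 days of retention at an assumed 10,000 samples per second
# and 2 bytes per sample.
prometheus_disk_gib 15 10000 2   # prints 25
```

Under those assumptions the estimate is about 25 GiB, so a 50Gi request leaves headroom for the write-ahead log and growth. Your real ingestion rate is visible in Prometheus as rate(prometheus_tsdb_head_samples_appended_total[5m]).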

For GKE Autopilot Clusters

GKE Autopilot manages nodes on your behalf and enforces strict security policies. The default kube-prometheus-stack chart tries to deploy node-exporter as a DaemonSet with host-level access (hostPID, hostNetwork, hostPort, and restricted hostPath volumes). Autopilot blocks all of these by default.

You will see errors like: admission webhook denied the request... hostPath volume uses path /proc which is not allowed, and container node-exporter specifies a host port; disallowed in Autopilot

The fix is to disable the components that require privileged node access. You still get workload metrics, pod-level data, and custom application metrics. You lose low-level node metrics (CPU temperature, disk I/O from the OS), but GKE Autopilot already exposes those through Google Cloud Monitoring.

# values-gke-autopilot.yaml
# Disable components that violate Autopilot security policies
nodeExporter:
  enabled: false
kubeProxy:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false
kubeEtcd:
  enabled: false
coreDns:
  enabled: false
# kube-state-metrics works fine on Autopilot
kubeStateMetrics:
  enabled: true
prometheus:
  prometheusSpec:
    retention: 15d
    # Scrape cAdvisor metrics from each node's kubelet
    # (nodes are discovered through the Kubernetes API)
    additionalScrapeConfigs:
      - job_name: 'kubernetes-pods-cadvisor'
        scheme: https
        metrics_path: /metrics/cadvisor
        kubernetes_sd_configs:
          - role: node
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
grafana:
  adminPassword: "change-me-now"

Step 4: Install the Chart

Run the install command, pointing to the appropriate values file for your cluster type.

GKE Standard

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-gke-standard.yaml \
  --create-namespace

GKE Autopilot

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-gke-autopilot.yaml \
  --create-namespace

The install takes 2 to 3 minutes. GKE Autopilot may take longer on the first run because it needs to provision new nodes to accommodate the monitoring pods.
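If you are not sure which mode a cluster runs in, you can ask gcloud and choose the values file from the answer. This is a sketch under assumptions: the helper names are hypothetical, the cluster is assumed to be regional, and the autopilot.enabled field is assumed to print True on Autopilot clusters and nothing on Standard clusters.

```shell
#!/bin/sh
# Map the printed value of the cluster's autopilot.enabled field to one of
# the two values files from Step 3. Anything other than True/true is
# treated as a Standard cluster.
values_file_for() {
  case "$1" in
    True|true) echo "values-gke-autopilot.yaml" ;;
    *)         echo "values-gke-standard.yaml" ;;
  esac
}

# Look up the mode of a regional cluster (requires gcloud and credentials).
# Arguments: cluster name, region.
detect_values_file() {
  values_file_for "$(gcloud container clusters describe "$1" \
    --region "$2" --format='value(autopilot.enabled)')"
}

# Usage (commented out; needs access to a real cluster):
#   helm install kube-prometheus-stack \
#     prometheus-community/kube-prometheus-stack \
#     --namespace monitoring \
#     --values "$(detect_values_file CLUSTER_NAME REGION)"
```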

Step 5: Verify the Installation

Check that all pods are running in the monitoring namespace.

kubectl get pods -n monitoring

You should see pods for: prometheus-kube-prometheus-stack-prometheus-0, alertmanager-kube-prometheus-stack-alertmanager-0, kube-prometheus-stack-grafana, kube-prometheus-stack-kube-state-metrics, and (on Standard only) kube-prometheus-stack-prometheus-node-exporter.

All pods should reach Running status within 3 to 5 minutes. If any pod stays in Pending, check events with kubectl describe pod POD_NAME -n monitoring.
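Rather than scanning the pod list by eye, you can filter it for pods that have not reached Running. The helper below is a minimal sketch with a hypothetical name; it assumes the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Print the names of pods whose STATUS column is neither Running nor
# Completed. Reads `kubectl get pods --no-headers` output on stdin.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage against a live cluster (commented out):
#   kubectl get pods -n monitoring --no-headers | pods_not_ready
# To block until every pod is Ready instead, kubectl has a built-in wait:
#   kubectl wait --for=condition=Ready pod --all -n monitoring --timeout=300s
```

An empty result means every pod in the namespace is Running or Completed.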

Step 6: Access Grafana

The chart deploys Grafana with a ClusterIP service by default, which is not reachable from outside the cluster. Use port-forward to access it locally.

kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

Open http://localhost:3000 in your browser. Log in with:

  • Username: admin
  • Password: prom-operator (the chart default, unless you set adminPassword in your values.yaml)
Security Note
  • Change the Grafana admin password immediately in production. Set adminPassword in your values.yaml and upgrade the chart, or change it inside Grafana at Profile > Change Password.
  • Never expose Grafana directly via a LoadBalancer without authentication. Use an ingress with OAuth2 proxy, Identity-Aware Proxy (IAP), or at minimum HTTP basic auth.

Accessing Prometheus

Similarly, use port-forward to reach the Prometheus UI.

kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090 in your browser to run PromQL queries and check scrape targets.
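The same target information is available from the Prometheus HTTP API at /api/v1/targets, which is convenient for scripted checks. The sketch below counts targets whose health is down. The helper name is hypothetical, and the text match assumes compact JSON with no spaces around the colon; prefer a real JSON parser such as jq if you have one installed.

```shell
#!/bin/sh
# Count targets reported as down in the JSON from /api/v1/targets.
# Reads the JSON on stdin and prints a single number.
# Assumption: the response contains "health":"down" with no extra spaces.
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Usage while the port-forward from this step is running (commented out):
#   curl -s http://localhost:9090/api/v1/targets | count_down_targets
```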

Step 7: Explore Pre-built Grafana Dashboards

One of the biggest advantages of kube-prometheus-stack is the 30+ Grafana dashboards it ships with. After logging in, go to Dashboards > Browse to find them.

The most useful dashboards for GKE workloads are:

  • Kubernetes / Compute Resources / Cluster: overall CPU and memory across the entire cluster.
  • Kubernetes / Compute Resources / Namespace (Pods): per-pod resource breakdown inside a namespace.
  • Kubernetes / Compute Resources / Node (Pods): node-level resource usage (Standard clusters only).
  • Kubernetes / Networking / Cluster: network ingress and egress across the cluster.
  • Kubernetes / Persistent Volumes: PVC usage and capacity.
  • Node Exporter / Full: detailed system-level metrics for each node (Standard clusters only).

Step 8: Upgrading the Chart

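When new versions of kube-prometheus-stack are released, upgrade with Helm. Always diff your values before applying. The helm diff command used below is not built into Helm; it comes from the helm-diff plugin, which you need to install first.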

helm repo update

# Preview what will change
helm diff upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-gke-standard.yaml

# Apply the upgrade
helm upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-gke-standard.yaml

Note: Some major chart versions include CRD updates that Helm does not apply automatically. Check the changelog at https://github.com/prometheus-community/helm-charts/releases before each upgrade.

Optional: Remote Write for Long-Term Storage

Prometheus stores data locally by default, and the chart's default retention is 10 days. For longer retention, or for a single view across multiple clusters, configure remote write to an external backend such as Thanos, Cortex, Mimir, or Grafana Cloud.

# Add to your values.yaml under prometheus.prometheusSpec
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: "https://your-remote-write-endpoint/api/v1/push"
        basicAuth:
          username:
            name: prometheus-remote-write-auth
            key: username
          password:
            name: prometheus-remote-write-auth
            key: password
    externalLabels:
      cluster: "my-gke-cluster"
      env: "production"
      region: "us-central1"

Create the Kubernetes Secret for credentials before applying the values:

kubectl create secret generic prometheus-remote-write-auth \
  --from-literal=username=YOUR_USERNAME \
  --from-literal=password=YOUR_PASSWORD \
  --namespace monitoring

Source reference for remote write configuration pattern: https://github.com/prometheus-community/helm-charts (Infrastructure Observability Docs Hub guide)

Alternative: Google Managed Prometheus (GMP)

If you do not want to manage Prometheus yourself, Google offers a fully managed alternative called Google Cloud Managed Service for Prometheus (GMP). It is built on Monarch, Google’s globally scalable time-series database. GMP is enabled by default on all new GKE Autopilot clusters.

With GMP, you do not run your own Prometheus binary. Instead, GKE deploys lightweight collectors in the gmp-system namespace that scrape your workloads and forward metrics to Google Cloud Monitoring. You then query those metrics using PromQL in Metrics Explorer or point Grafana at Cloud Monitoring as a data source.

The trade-off: GMP uses different CRDs. PrometheusRule becomes Rules (or ClusterRules for cluster-wide rules), and ServiceMonitor becomes PodMonitoring. If you migrate from kube-prometheus-stack to GMP, every monitoring rule and scrape config needs to be rewritten.
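To give a sense of what that rewrite looks like, here is a sketch of a PodMonitoring resource rendered from a shell function. The function name and every argument value in the usage line are hypothetical. Note that PodMonitoring selects pods directly by label rather than going through a Service.

```shell
#!/bin/sh
# Print a GMP PodMonitoring manifest on stdout.
# Arguments: resource name, namespace, value of the pods' "app" label,
# name of the container port that serves metrics.
render_podmonitoring() {
  cat <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: $1
  namespace: $2
spec:
  selector:
    matchLabels:
      app: $3
  endpoints:
    - port: $4
      interval: 30s
EOF
}

# Usage (commented out; the names are placeholders):
#   render_podmonitoring my-app default my-app metrics | kubectl apply -f -
```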

GMP makes sense when you want zero operational overhead and are comfortable with vendor lock-in. kube-prometheus-stack makes sense when you need maximum control, portability across clouds, and the full Prometheus ecosystem.

Enabling GMP in Terraform

resource "google_container_cluster" "cluster" {
  # ...
  monitoring_config {
    managed_prometheus {
      enabled = true
    }
    enable_components = [
      "SYSTEM_COMPONENTS",
      "CADVISOR",
      "KUBELET",
      "POD",
      "DEPLOYMENT",
    ]
  }
}

Common Issues and Fixes

Pods stuck in Pending on Autopilot: Autopilot provisions nodes on demand, so new pods may sit in Pending for 3 to 5 minutes while a node is provisioned. This is normal behavior on first install. If pods remain Pending beyond 10 minutes, check node provisioning with kubectl get nodes and kubectl describe pod POD_NAME -n monitoring.

Error: services is forbidden: cannot create resource in namespace kube-system. This happens when the chart tries to create Services in kube-system. Disable the relevant components in values.yaml: set kubeProxy.enabled, kubeScheduler.enabled, kubeControllerManager.enabled, and kubeEtcd.enabled all to false.

If Grafana dashboards show no data, check three things in order:

  • Verify Prometheus is scraping successfully: open the Prometheus UI at localhost:9090, go to Status > Targets, and confirm your targets are UP.
  • Verify the Grafana data source is pointing to the right Prometheus URL: in Grafana, go to Configuration > Data Sources and test the connection.
  • Check for label mismatches: ServiceMonitor selectors must match the labels on your Service objects exactly.

If node-exporter pods are rejected or never start on Autopilot, this is expected. node-exporter requires hostPID, hostNetwork, and access to /proc and /sys, which Autopilot prohibits. Set nodeExporter.enabled: false in your values file.

If node-level dashboards are empty on Autopilot, this is also expected. Without node-exporter, node-level OS metrics are unavailable from within the cluster. Use Google Cloud Monitoring to access node metrics on Autopilot clusters, or add the cAdvisor scrape config shown in Step 3 to get container-level resource data.

Monitor Your GKE Workloads with CubeAPM
Running Prometheus and Grafana on GKE is powerful, but managing the stack yourself means handling upgrades, storage, cardinality limits, and dashboard sprawl. CubeAPM gives you full-stack observability on Kubernetes out of the box, with:
  • Prometheus-compatible metrics collection with zero manual configuration
  • Pre-built GKE dashboards for nodes, pods, namespaces, and deployments
  • Built-in Grafana-style visualization and alerting, no separate install needed
  • Works with both GKE Standard and GKE Autopilot clusters
  • Lightweight agents that respect GKE Autopilot security restrictions
Get started for free at cubeapm.com and connect your GKE cluster in minutes.

Conclusion

The kube-prometheus-stack Helm chart gives you a production-grade observability stack on GKE with a single install command. On GKE Standard, the defaults work well with minor storage configuration. On GKE Autopilot, you need to disable the components that conflict with Autopilot’s security model: node-exporter, kubeProxy, kubeScheduler, kubeControllerManager, and kubeEtcd.

Once installed, Grafana ships with 30+ pre-built Kubernetes dashboards that give you immediate visibility into cluster health, workload resource usage, networking, and storage. For long-term metric retention, configure remote write to Thanos, Mimir, or Grafana Cloud.

If you prefer a fully managed path without running Prometheus yourself, Google Managed Prometheus is a viable alternative, particularly for GKE Autopilot users who want native integration with Google Cloud Monitoring.

FAQs

1. Can I run kube-prometheus-stack on GKE Autopilot?

Yes, but you need to disable components that conflict with Autopilot’s security policies: nodeExporter, kubeProxy, kubeScheduler, kubeControllerManager, and kubeEtcd. These require privileged node access or try to create resources in the managed kube-system namespace, both of which Autopilot blocks. Prometheus, Grafana, Alertmanager, and kube-state-metrics all run fine.

2. What is the default Grafana password after installing kube-prometheus-stack?

The default credentials are username admin and password prom-operator. Override this by setting grafana.adminPassword in your values.yaml before installation. Never leave the default in place on a network-accessible cluster.

3. How long does Prometheus retain metrics by default, and how do I increase it?

The default retention is 10 days. Set prometheus.prometheusSpec.retention in your values.yaml to increase it (for example, 30d). Also configure a PersistentVolumeClaim so data survives pod restarts. For retention beyond 60 days, use remote write to Thanos, Mimir, or Grafana Cloud instead.

4. What is the difference between kube-prometheus-stack and Google Managed Prometheus (GMP)?

kube-prometheus-stack runs Prometheus inside your cluster and gives you full control. Google Managed Prometheus (GMP) is fully managed by Google with no pods to operate, but uses different CRDs and stores metrics in Google Cloud Monitoring. Choose GMP for zero operational overhead, kube-prometheus-stack for portability and full Prometheus API compatibility.

5. How do I add a custom application metric to Prometheus on GKE?

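Expose a /metrics endpoint from your application using a Prometheus client library, then create a ServiceMonitor resource pointing to your Service. With the chart's default settings, the ServiceMonitor also needs a release label that matches your Helm release name (release: kube-prometheus-stack in this guide), otherwise Prometheus will not select it. The Prometheus Operator picks it up within one scrape interval (30 seconds by default). No Prometheus restart needed.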
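A ServiceMonitor for such an application might look like the sketch below, rendered from a shell function so it can be piped straight into kubectl. The function name and the argument values in the usage line are hypothetical. The release label is an assumption based on the chart's default selector behavior and on the release name used in Step 4; adjust it if you installed under a different release name.

```shell
#!/bin/sh
# Print a ServiceMonitor manifest on stdout.
# Arguments: resource name, namespace, value of the Service's "app" label,
# name of the Service port that serves /metrics.
render_servicemonitor() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: $1
  namespace: $2
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: $3
  endpoints:
    - port: $4
      path: /metrics
      interval: 30s
EOF
}

# Usage (commented out; the names are placeholders):
#   render_servicemonitor my-app default my-app http | kubectl apply -f -
```

The selector must match the labels on the Service object, and the port is the port's name as declared in the Service, not its number.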
