
How to Monitor Google Cloud Run Services — Latency and Errors?


Google Cloud Run makes it easy to deploy containers without managing servers. But easy to deploy does not mean easy to keep running well. Once your service handles real traffic, two signals matter most: how fast responses are (latency) and how many requests are failing (error rate).

The good news is that Cloud Run automatically sends metrics to Cloud Monitoring with no extra setup or configuration required. The challenge is knowing which metrics to watch, how to set up alerting thresholds that catch real problems, and how to read the signals before users start complaining.

This guide walks you through Google Cloud Run monitoring step by step: built-in metrics available out of the box, writing alert policies, and setting up uptime checks that catch availability gaps. All commands and API paths have been verified against the official Google Cloud documentation.

Key Takeaways
  • Cloud Run automatically integrates with Cloud Monitoring at no extra cost on the fully managed version. No agents needed.
  • The two most critical metrics are run.googleapis.com/request_latencies and run.googleapis.com/request_count (for error rate).
  • Always alert on p95 or p99 latency, not average. Average hides tail latency problems that affect real users.
  • Error rate = 5xx responses divided by total requests. Alert at 1–5% depending on your SLO.
  • Uptime checks give you external availability monitoring and auto-adjust when new revisions are deployed.
  • Third-party tools like Datadog, Dynatrace, and CubeAPM add cross-service correlation, distributed tracing, and richer dashboards.
  • Cold starts appear as container startup latency spikes. Monitor run.googleapis.com/container/startup_latencies separately.

What Does Google Cloud Run Monitoring Cover?

Cloud Run is a fully managed serverless platform on Google Cloud that runs stateless containers on demand. It scales automatically, including down to zero when no traffic is present.

Google Cloud Observability provides the logging and monitoring layer for Cloud Run. This includes:

  • Cloud Monitoring: captures metrics, hosts dashboards, sends alerts.
  • Cloud Logging: stores structured logs from your containers.
  • Error Reporting: surfaces grouped error events from specially formatted logs.
  • Cloud Trace: records distributed traces to locate latency sources.

Cloud Run is automatically integrated with Cloud Monitoring with no setup or configuration required. Metrics are captured as soon as your revision is running. Source: Google Cloud Run documentation (cloud.google.com/run/docs/monitoring).

Key Metrics for Google Cloud Run Monitoring

These are the metrics that matter most for a production service. All are available via Cloud Monitoring without any agent or sidecar.

Metric Name | What It Measures | Why It Matters
run.googleapis.com/request_latencies | Response time distribution (ms) | Detects slow responses; use p95/p99 thresholds
run.googleapis.com/request_count | Total requests by response code class | Calculate error rate from 5xx vs total
run.googleapis.com/container/instance_count | Active container instances | Spot scaling anomalies or cold starts
run.googleapis.com/container/cpu/utilizations | CPU usage distribution | Identify over-provisioned or starved containers
run.googleapis.com/container/memory/utilizations | Memory usage distribution | Prevent OOM crashes before they happen
run.googleapis.com/container/startup_latencies | Time for container to start (ms) | Diagnose cold-start problems

How to Monitor Latency on Cloud Run

Why p95 and p99 Matter More Than Average

Average latency is misleading. A service could complete 90% of requests in 50 ms and 10% in 10 seconds; the average works out to about 1 second, a figure that matches neither the fast majority nor the slow tail. p95 is the latency that 95% of requests complete within. p99 exposes the worst-performing 1% of your traffic, which often maps directly to users who churn or report bugs.

Always alert on p95 or p99 latency, not mean latency.
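The 90%/10% split described above can be checked with a few lines of Python. This is a minimal sketch using a hypothetical sample of 1,000 requests and a simple nearest-rank percentile; Cloud Monitoring derives its percentiles from histogram buckets, so the values it reports are estimates rather than exact sample values.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

# Hypothetical sample: 900 fast requests (50 ms) and 100 slow ones (10,000 ms).
latencies_ms = [50] * 900 + [10_000] * 100

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p95={p95_ms} ms, p99={p99_ms} ms")
```

The mean lands between the two groups and describes neither, while p95 and p99 report the 10-second experience directly.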

Setting Up a Latency Dashboard

In the Google Cloud console, go to Monitoring > Dashboards > Create Dashboard, then add a new chart. Use the following MQL queries. Note: MQL is no longer recommended for new Cloud Monitoring assets via the console UI, but existing MQL charts still work and you can still create them through the Cloud Monitoring API.

P50 latency (median response time):

fetch cloud_run_revision
| metric 'run.googleapis.com/request_latencies'
| group_by [service_name],
    [val: percentile(value.request_latencies, 50)]
| every 1m

P95 latency (recommended for alerting):

fetch cloud_run_revision
| metric 'run.googleapis.com/request_latencies'
| group_by [service_name],
    [val: percentile(value.request_latencies, 95)]
| every 1m

P99 latency (tail latency):

fetch cloud_run_revision
| metric 'run.googleapis.com/request_latencies'
| group_by [service_name],
    [val: percentile(value.request_latencies, 99)]
| every 1m

Creating a Latency Alert Policy

This alert fires when p95 latency exceeds 2,000 ms (2 seconds) for 5 minutes. Save the following as cloudrun-latency-alert.json:

{
  "displayName": "Cloud Run High Latency - p95",
  "combiner": "OR",
  "conditions": [{
    "displayName": "p95 latency above 2000ms",
    "conditionThreshold": {
      "filter": "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_latencies\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 2000,
      "duration": "300s",
      "aggregations": [{
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_PERCENTILE_95",
        "crossSeriesReducer": "REDUCE_PERCENTILE_95",
        "groupByFields": ["resource.labels.service_name"]
      }]
    }
  }]
}

Deploy the alert policy:

gcloud monitoring policies create --policy-from-file=cloudrun-latency-alert.json

Adjust the thresholdValue to match your SLO. A public-facing API might alert at 1,000 ms; a batch processing service might allow 5,000 ms.
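If you maintain several services with different SLOs, one option is to generate the policy file instead of editing it by hand. The sketch below rebuilds the same policy structure shown above with the threshold as a parameter; the function name latency_alert_policy and the two example thresholds are illustrative assumptions, not part of any Google Cloud tooling.

```python
import json

def latency_alert_policy(threshold_ms, duration_s=300):
    """Build the p95 latency alert policy from this guide for a given threshold."""
    return {
        "displayName": "Cloud Run High Latency - p95",
        "combiner": "OR",
        "conditions": [{
            "displayName": f"p95 latency above {threshold_ms}ms",
            "conditionThreshold": {
                "filter": (
                    'resource.type="cloud_run_revision" AND '
                    'metric.type="run.googleapis.com/request_latencies"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold_ms,
                "duration": f"{duration_s}s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_PERCENTILE_95",
                    "crossSeriesReducer": "REDUCE_PERCENTILE_95",
                    "groupByFields": ["resource.labels.service_name"],
                }],
            },
        }],
    }

# Hypothetical thresholds: 1,000 ms for a public API, 5,000 ms for a batch service.
api_policy = latency_alert_policy(1000)
batch_policy = latency_alert_policy(5000)

# Serialize one policy; redirect this output to cloudrun-latency-alert.json.
policy_json = json.dumps(api_policy, indent=2)
print(policy_json)
```

Saving the printed JSON as cloudrun-latency-alert.json lets you pass it to the same gcloud command used above.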

How to Monitor Error Rates on Cloud Run

Understanding Error Rate Calculation

Error rate is the ratio of HTTP 5xx responses to total requests. Cloud Run labels all requests with a response_code_class field in the request_count metric, so you can filter by class.

MQL query to compute error rate:

fetch cloud_run_revision
| metric 'run.googleapis.com/request_count'
| group_by [service_name],
    [error_rate: sum(if(response_code_class = '5xx', val(), 0)) / sum(val())]
| every 1m

Creating an Error Rate Alert

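This alert fires when the ratio of 5xx responses to total requests exceeds 5% for 5 minutes. The condition divides the 5xx request rate (filter) by the total request rate (denominatorFilter). Without the denominator, the 0.05 threshold would be compared against an absolute rate of 5xx requests per second rather than a percentage. Save the following as cloudrun-error-rate-alert.json:

{
  "displayName": "Cloud Run High Error Rate",
  "combiner": "OR",
  "conditions": [{
    "displayName": "Error rate above 5%",
    "conditionThreshold": {
      "filter": "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_count\" AND metric.labels.response_code_class=\"5xx\"",
      "denominatorFilter": "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_count\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 0.05,
      "duration": "300s",
      "aggregations": [{
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_RATE",
        "crossSeriesReducer": "REDUCE_SUM",
        "groupByFields": ["resource.labels.service_name"]
      }],
      "denominatorAggregations": [{
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_RATE",
        "crossSeriesReducer": "REDUCE_SUM",
        "groupByFields": ["resource.labels.service_name"]
      }]
    }
  }]
}

Deploy the alert policy:

gcloud monitoring policies create --policy-from-file=cloudrun-error-rate-alert.json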

A 5% threshold is a starting point. Adjust it based on your traffic volume and SLO. High-traffic services often use 1% or even 0.5%.

Filtering by Response Code Class

Cloud Run’s request_count metric includes the label response_code_class with values 1xx, 2xx, 3xx, 4xx, and 5xx. You can filter to separate error categories:

  • 5xx: Server errors your service is responsible for.
  • 4xx: Client errors. High rates of 404 or 429 can indicate misuse or rate limit issues, but are not service failures.
  • 2xx: Successful responses. This is the baseline you want to maximize.
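To make the separation of classes concrete, here is a small worked example that turns per-class request counts into separate 4xx and 5xx rates. The counts are made up for illustration, and the function name class_rates is an assumption; in practice the numbers would come from the request_count metric grouped by response_code_class.

```python
def class_rates(counts_by_class):
    """Return the share of total requests that fell into the 4xx and 5xx classes."""
    total = sum(counts_by_class.values())
    if total == 0:
        # No traffic in the window: report zero rather than dividing by zero.
        return {"4xx": 0.0, "5xx": 0.0}
    return {cls: counts_by_class.get(cls, 0) / total for cls in ("4xx", "5xx")}

# Hypothetical one-minute window of request counts per response code class.
counts = {"2xx": 9_400, "3xx": 100, "4xx": 300, "5xx": 200}
rates = class_rates(counts)

print(f"5xx error rate: {rates['5xx']:.1%}, 4xx rate: {rates['4xx']:.1%}")
```

Only the 5xx share feeds the service error rate alert; the 4xx share is worth charting separately so client-side problems do not page the on-call engineer.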

Setting Up Uptime Checks for Cloud Run

Latency and error metrics tell you what happened after a request reached a running container. Uptime checks tell you whether the service is reachable at all from the outside world, catching issues with DNS, load balancing, SSL termination, and cold start failures before any request gets through.

Creating an Uptime Check

In the Google Cloud console, go to Monitoring > Uptime, click Create Uptime Check, and select the Cloud Run service option. Cloud Run-specific uptime checks automatically adjust between revisions and accommodate traffic splits when you do a gradual rollout.

You can also use the gcloud CLI:

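gcloud monitoring uptime create "My Cloud Run Service Uptime Check" \
  --resource-type=cloud-run-revision \
  --resource-labels=project_id=PROJECT_ID,location=REGION,service_name=SERVICE_NAME \
  --period=1 \
  --timeout=10

The display name is passed as the positional argument. Replace PROJECT_ID, REGION, and SERVICE_NAME with your own values; --period is in minutes and --timeout is in seconds.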

Adding a Health Endpoint to Your Container

Uptime checks work best when they hit a dedicated health endpoint rather than a business logic route. Here are minimal examples in three common languages:

FastAPI (Python):

from fastapi import FastAPI

app = FastAPI()

@app.get('/health')
async def health():
    return {"status": "ok"}

Express (Node.js):

app.get('/health', (req, res) => {
  res.json({ status: 'ok' });
});

Go:

http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.Write([]byte(`{"status":"ok"}`))
})

Cloud Run Liveness and Startup Probes

Cloud Run also supports container-level probes that restart unhealthy instances. These are internal checks that run inside the container runtime. Add them to your service YAML:

spec:
  template:
    spec:
      containers:
      - image: gcr.io/my-project/my-app
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 0
          periodSeconds: 10
        startupProbe:
          httpGet:
            path: /health
            port: 8080
          failureThreshold: 3
          periodSeconds: 10

Probes restart unhealthy containers automatically. Uptime checks confirm user-facing availability after internal recovery. Use both.

Diagnosing Cold Start Latency

Cold starts are one of the most common latency complaints with serverless platforms. They happen when Cloud Run scales up from zero or when a new container instance is created to handle additional load. The container has to initialize before it can serve requests, adding latency invisible in your application code.

How to Detect Cold Starts

Monitor the container startup latency metric:

fetch cloud_run_revision
| metric 'run.googleapis.com/container/startup_latencies'
| group_by [service_name],
    [p99_startup: percentile(value.startup_latencies, 99)]
| every 1m

Also check container instance count. Spikes in instance_count alongside latency spikes confirm cold-start pressure.
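The correlation described above can be sketched offline on exported data. The example below assumes you have two per-minute series for the same service, instance counts and p99 request latency, and flags minutes where the instance count rose while latency exceeded a threshold. The function name, the sample values, and the 1,000 ms threshold are all hypothetical.

```python
def cold_start_minutes(instance_counts, p99_latency_ms, latency_threshold_ms):
    """Return indexes of minutes where instances scaled up and p99 latency was high."""
    flagged = []
    for i in range(1, min(len(instance_counts), len(p99_latency_ms))):
        scaled_up = instance_counts[i] > instance_counts[i - 1]
        slow = p99_latency_ms[i] > latency_threshold_ms
        if scaled_up and slow:
            flagged.append(i)
    return flagged

# Hypothetical per-minute samples for one service.
instances = [1, 1, 4, 4, 4, 2]
p99_ms = [180, 190, 2400, 260, 210, 200]

suspects = cold_start_minutes(instances, p99_ms, latency_threshold_ms=1000)
print("Minutes with likely cold-start pressure:", suspects)
```

A flagged minute is a hint rather than proof: confirm it against the startup latency chart for the same window before changing the service configuration.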

Reducing Cold Start Impact

  • Set minimum instances to 1 in your Cloud Run service configuration to keep at least one warm container ready.
  • Use CPU always allocated if your service handles bursty traffic.
  • Reduce container image size to cut pull time during cold starts.
  • Move heavy initialization (DB connections, model loading) outside of request handlers.

Setting minimum instances removes scale-from-zero cold starts for latency-sensitive services, at the cost of keeping a container running when there is no traffic. New instances added during a traffic burst still start cold, but this remains the most direct fix for services with strict p99 SLOs.
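The advice about moving heavy initialization out of request handlers can be illustrated with a short sketch. ExpensiveClient below is a hypothetical stand-in for a database pool or a loaded model, and handle_request stands in for your route handler; the point is only that the costly object is built once per container instance and then reused.

```python
import time

class ExpensiveClient:
    """Hypothetical stand-in for a DB connection pool or ML model that is slow to create."""
    instances_created = 0

    def __init__(self):
        time.sleep(0.05)  # simulate slow setup work
        ExpensiveClient.instances_created += 1

    def query(self, key):
        return f"value-for-{key}"

# Created once when the module is imported (container startup), not per request.
CLIENT = ExpensiveClient()

def handle_request(key):
    """Request handler that reuses the shared client instead of rebuilding it."""
    return CLIENT.query(key)

for key in ("a", "b", "c"):
    print(handle_request(key))
```

With this layout the setup cost is paid during container startup, so it shows up in startup latency instead of being repeated inside every request.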

Custom Metrics with OpenTelemetry and Prometheus

Built-in Cloud Run metrics track infrastructure-level signals. For application-level metrics like queue depth, cache hit rates, and background job success rates, you need custom instrumentation.

Option 1: OpenTelemetry Sidecar

Deploy an OpenTelemetry Collector as a sidecar container in your Cloud Run service. Your application sends metrics to the collector, which forwards them to Cloud Monitoring using the OTLP endpoint (Cloud Monitoring now supports OTLP metrics natively). 

Option 2: Prometheus Sidecar

Run a Prometheus metrics endpoint inside your container. Use a sidecar agent to scrape and forward metrics to Cloud Monitoring. This works well if you already use Prometheus in other parts of your infrastructure.

Option 3: Log-Based Metrics

If you cannot add a sidecar, use log-based metrics. Structure your container logs as JSON, write meaningful numeric fields (e.g., duration_ms, items_processed), and create log-based metrics in Cloud Monitoring that extract these values. Log-based metrics work with existing alerting and dashboards.

{ "severity": "INFO", "message": "processed batch",  "duration_ms": 320, "items": 48, "service": "my-cloud-run-svc" }
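A log line like the one above can be produced with the standard library alone. This is a minimal sketch: log_event is a hypothetical helper, and the field names duration_ms and items simply mirror the example. Cloud Run picks up JSON written to stdout as structured log entries, which is what a log-based metric would then extract from.

```python
import json
import sys
import time

def log_event(message, severity="INFO", **fields):
    """Write one JSON log line to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line

# Time a hypothetical batch of 48 items and log the result.
start = time.monotonic()
items = list(range(48))
duration_ms = round((time.monotonic() - start) * 1000)

line = log_event(
    "processed batch",
    duration_ms=duration_ms,
    items=len(items),
    service="my-cloud-run-svc",
)
```

Keeping numeric values as JSON numbers rather than strings makes them easier to extract into a distribution or counter metric later.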

Google Cloud Run Monitoring Best Practices

  • Alert on percentiles, not averages. Use p95 for standard SLOs and p99 for tail latency. Average latency masks problems affecting real users.
  • Separate 4xx from 5xx error tracking. High 4xx rates indicate client misuse; high 5xx rates indicate service failures. Conflating them creates alert noise.
  • Set burn rate alerts for error budgets. Instead of alerting when error rate exceeds a threshold for 5 minutes, use multiwindow burn rate alerts that fire faster during severe incidents and slower during gradual degradation.
  • Monitor container instance count alongside latency. A latency spike that coincides with a sudden instance_count increase almost always points to a cold start event.
  • Use labels to separate services. Group all Cloud Run metrics by service_name and revision_name labels so you can compare current versus previous revision performance during rollouts.
  • Set up Personalized Service Health alerts. Cloud Run incidents are published to Google Cloud’s Personalized Service Health. Set up alerts on Service Health events to get platform-level incident notifications alongside your metric alerts.
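The burn rate idea from the list above comes down to simple arithmetic: burn rate is the observed error rate divided by the error budget (1 minus the SLO target), and a multiwindow alert fires only when both a long and a short window exceed the chosen factor. The sketch below assumes a 99.9% SLO and a factor of 14.4, a value commonly cited for a 1-hour window against a 30-day budget; treat both numbers and the function names as assumptions to tune for your own service.

```python
def burn_rate(error_rate, slo_target):
    """How many times faster than 'exactly on budget' the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return error_rate / budget

def should_page(long_window_error_rate, short_window_error_rate, slo_target, threshold):
    """Fire only when both windows burn faster than the threshold."""
    return (
        burn_rate(long_window_error_rate, slo_target) >= threshold
        and burn_rate(short_window_error_rate, slo_target) >= threshold
    )

SLO_TARGET = 0.999  # assumed 99.9% availability SLO
THRESHOLD = 14.4    # assumed burn rate factor

# Ongoing incident: 2% errors over the last hour, 3% over the last 5 minutes.
print(should_page(0.02, 0.03, SLO_TARGET, THRESHOLD))

# Recovered incident: the hour still looks bad, but the last 5 minutes are healthy.
print(should_page(0.02, 0.001, SLO_TARGET, THRESHOLD))
```

Requiring the short window as well keeps the alert from continuing to fire after the service has already recovered.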

Conclusion

Effective Google Cloud Run monitoring comes down to a short list of well-chosen metrics, alert policies calibrated to percentiles rather than averages, and uptime checks that validate user-facing availability end to end.

Start with request_latencies (p95, p99) and request_count (5xx error rate). Add uptime checks on your service URL or custom domain. Layer in container/startup_latencies monitoring if cold starts affect your SLO. Graduate to custom metrics and distributed tracing as your service grows in complexity.

Cloud Monitoring gives you everything you need to get started without any agent or extra cost on the fully managed platform. Third-party tools like Datadog, Dynatrace, and CubeAPM are worth evaluating when you need cross-service correlation or richer anomaly detection.

Disclaimer: This article is provided for informational purposes only. Metric names, API paths, and console UI steps are accurate as of May 2026 and based on Google Cloud’s official documentation and publicly available sources. Google may update Cloud Run, Cloud Monitoring APIs, or console interfaces at any time. Always refer to the official Google Cloud documentation at cloud.google.com/run/docs for the most current information.

FAQs

1. Does Google Cloud Run monitoring cost anything extra?

For the fully managed version of Cloud Run, Google Cloud Observability pricing applies. Under that pricing, the built-in Cloud Run metrics in Cloud Monitoring are not charged. You may incur costs for log ingestion beyond the free tier and for custom metrics. Check cloud.google.com/stackdriver/pricing for current limits.

2. How do I view Cloud Run metrics without writing queries?

Go to the Cloud Run console, click on your service, and open the Metrics tab. This displays built-in charts for request count, latency, container instance count, CPU, and memory without any query writing. For more granular filtering and custom time ranges, use the Metrics Explorer in Cloud Monitoring.

3. What is a good p95 latency threshold to alert on?

A common starting point is 1,000 ms (1 second) for user-facing APIs and 2,000-5,000 ms for internal or batch services. The right threshold depends on your SLO. Review your baseline p95 latency from the first few days of production traffic and set the alert at roughly 2-3x that baseline.

4. How is error rate different from error count in Cloud Run monitoring?

Error count is the raw number of 5xx responses in a period. Error rate is the ratio of 5xx responses to total requests. Error rate is more useful for alerting because it adjusts for traffic volume. For example, 1,000 errors per minute out of 1,000,000 requests during a traffic surge (0.1%) is less concerning than 10 errors per minute out of 20 requests (50%).

5. Can I monitor Cloud Run services across multiple projects in one dashboard?

Yes. Cloud Monitoring supports cross-project monitoring through metrics scopes (the successor to Stackdriver Workspaces). You can add multiple Google Cloud projects to the metrics scope of a single scoping project and create dashboards and alerts that span all of them. This is useful for organizations with separate dev, staging, and production projects.
