Google Cloud SQL is a fully managed relational database service that supports MySQL, PostgreSQL, and SQL Server on Google Cloud Platform. While it offloads much of the infrastructure management, database performance and replication health do not manage themselves. Slow queries, connection exhaustion, storage pressure, and replication lag can silently degrade your application long before alerts fire.
This guide walks you through the key metrics to track, the built-in and third-party tools available, and a practical approach to setting up Google Cloud SQL monitoring for both performance and replication.
Key Takeaways
- ✔ Google Cloud SQL monitoring is built on Cloud Monitoring (formerly Stackdriver). All metrics are available there by default.
- ✔ The most critical performance metrics are CPU utilization, memory usage, disk I/O, active connections, and query latency.
- ✔ For replication, the primary metric to watch is database/replication/replica_lag, which measures how far a read replica is behind the primary.
- ✔ Query Insights provides query-level visibility for diagnosing slow queries, available for MySQL and PostgreSQL.
- ✔ Third-party tools (Datadog, New Relic, Sumo Logic, Dynatrace) extend Cloud SQL monitoring with richer dashboards, anomaly detection, and cross-stack correlation.
- ✔ Set alerting policies on CPU (>80%), storage (>85%), active connections nearing max_connections, and replication lag exceeding acceptable thresholds.
Why Google Cloud SQL Monitoring Matters
Most Cloud SQL performance issues do not surface as a single obvious failure. The first signs are typically a slow API response, a checkout page timing out, or a report taking ten seconds longer than expected. By the time users notice, the problem has usually been building for minutes or hours.
Proactive monitoring helps you catch these patterns early. Specifically, Google Cloud SQL monitoring allows you to:
- Identify CPU, memory, or disk bottlenecks before they cause outages
- Detect replication lag on read replicas that may be serving stale data
- Trace slow queries back to specific SQL statements using Query Insights
- Control cloud costs by understanding workload patterns and storage growth trends
- Meet compliance requirements through audit logs and access tracking
Google Cloud SQL supports three database engines: MySQL, PostgreSQL, and SQL Server. Monitoring requirements are broadly similar across all three, but some metrics and tools are engine-specific. Query Insights, for example, is available for both MySQL and PostgreSQL.
Key Metrics to Monitor in Google Cloud SQL
Google Cloud SQL exposes metrics through the Cloud Monitoring API under the cloudsql.googleapis.com namespace. The following are the metrics that matter most for performance and replication.
Performance Metrics
| Metric | What It Measures | Threshold / Action |
| --- | --- | --- |
| database/cpu/utilization | Fraction of CPU used by the instance | Alert if > 80% sustained |
| database/memory/utilization | Fraction of memory used | Alert if > 90% |
| database/disk/read_ops_count | Read operations per second | Baseline, watch for spikes |
| database/disk/write_ops_count | Write operations per second | Baseline, watch for spikes |
| database/network/received_bytes_count | Bytes received by instance | Track for bandwidth costs |
| database/network/sent_bytes_count | Bytes sent by instance | Track for bandwidth costs |
| database/postgresql/num_backends | Number of active connections (PostgreSQL) | Stay below max_connections |
| database/mysql/queries | Query count per second (MySQL) | Monitor for query volume spikes |
| database/disk/bytes_used | Disk space consumed | Alert if > 85% of capacity |
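The disk threshold in the table above is a percentage, while database/disk/bytes_used reports bytes, so the comparison needs the provisioned disk size as well. The following is a minimal sketch of that check, assuming the latest values have already been fetched from Cloud Monitoring. The function name and the sample numbers are illustrative and not part of any Google API.

```python
# Thresholds from the table above; utilization metrics are fractions (0.0-1.0).
THRESHOLDS = {"cpu": 0.80, "memory": 0.90, "disk": 0.85}


def breached_thresholds(cpu_utilization, memory_utilization,
                        disk_bytes_used, disk_bytes_provisioned):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    if disk_bytes_provisioned <= 0:
        raise ValueError("disk_bytes_provisioned must be positive")
    observed = {
        "cpu": cpu_utilization,
        "memory": memory_utilization,
        # bytes_used is absolute, so convert it to a fraction of capacity.
        "disk": disk_bytes_used / disk_bytes_provisioned,
    }
    return sorted(name for name, value in observed.items()
                  if value > THRESHOLDS[name])


# Hypothetical sample: 100 GiB disk with 90 GiB used, CPU at 85%, memory at 70%.
GIB = 1024 ** 3
print(breached_thresholds(0.85, 0.70, 90 * GIB, 100 * GIB))  # ['cpu', 'disk']
```

A single sample above the threshold is not the same as "sustained"; in practice the duration condition belongs in the alerting policy rather than in application code.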
Replication Metrics
For high availability and read scaling, Cloud SQL supports read replicas. Replication lag is the most important metric in this area.
| Metric | What It Measures | Threshold / Action |
| --- | --- | --- |
| database/replication/replica_lag | Seconds the replica is behind the primary | Alert if lag > 30 seconds |
| database/replication/network_lag | Seconds for binlog to travel from primary to replica | Helps diagnose network-side lag |
| database/postgresql/replication/replica_byte_lag | Bytes the replica is behind the primary (PostgreSQL) | Useful for large writes |
| database/auto_failover_request_count | Number of auto-failover events triggered | Non-zero = investigate |
| database/available_for_failover | Whether the instance is ready for failover | Monitor for HA readiness |
Built-In Google Cloud SQL Monitoring Tools
Cloud Monitoring (Metrics Explorer)
Cloud Monitoring is the primary interface for Google Cloud SQL monitoring. It is available in the Google Cloud Console at Monitoring > Metrics Explorer. Without any additional configuration, you have access to all cloudsql.googleapis.com metrics with up to 6 weeks of data retention.
You can build custom dashboards by:
- Opening Cloud Monitoring in the Google Cloud Console
- Navigating to Dashboards > Create Dashboard
- Adding chart widgets using metric type filter cloudsql.googleapis.com
- Grouping by database_id to separate metrics per instance
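The same metric type and database_id values used in the steps above also work in Cloud Monitoring filter strings, which the API and most integrations accept. Here is a small sketch of a helper that builds such a filter. The helper name and the project and instance names are hypothetical, and it assumes database_id takes the form "project:instance".

```python
def cloudsql_metric_filter(metric_path, project_id=None, instance_id=None):
    """Build a Cloud Monitoring filter string for one Cloud SQL metric."""
    parts = [
        f'metric.type = "cloudsql.googleapis.com/{metric_path}"',
        'resource.type = "cloudsql_database"',
    ]
    if project_id and instance_id:
        # Assumption: the database_id label is "<project>:<instance>".
        parts.append(
            f'resource.labels.database_id = "{project_id}:{instance_id}"'
        )
    return " AND ".join(parts)


# Hypothetical project and instance names, for illustration only.
print(cloudsql_metric_filter("database/cpu/utilization", "my-project", "orders-db"))
```

Leaving out the project and instance returns a filter that matches every Cloud SQL instance in scope, which suits charts grouped by database_id.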
Alerting Policies
Alerting policies let you define thresholds and route notifications to email, PagerDuty, Slack, and other channels. A minimal production setup should include alerts for:
- CPU utilization sustained above 80% for more than 5 minutes
- Storage usage exceeding 85% of provisioned capacity
- Active connections approaching max_connections (track database/postgresql/num_backends for PostgreSQL, or database/network/connections for MySQL and SQL Server; database/mysql/queries measures query volume, not connections)
- Replication lag (database/replication/replica_lag) exceeding your acceptable threshold
- Auto-failover request count becoming non-zero
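The alerts listed above can also be expressed as data. The sketch below builds a dictionary shaped like a Cloud Monitoring AlertPolicy resource with one threshold condition. Notification channels are omitted, the function name and display names are illustrative, and the field values should be checked against the current API reference before use.

```python
import json


def threshold_alert_policy(display_name, metric_path, threshold, duration_seconds):
    """Return an AlertPolicy-shaped dict with one greater-than condition."""
    metric_filter = (
        f'metric.type = "cloudsql.googleapis.com/{metric_path}" '
        'AND resource.type = "cloudsql_database"'
    )
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_path} above {threshold}",
            "conditionThreshold": {
                "filter": metric_filter,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                # The condition must hold for this long before the alert fires.
                "duration": f"{duration_seconds}s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


# CPU sustained above 80% for 5 minutes, as in the first bullet above.
cpu_policy = threshold_alert_policy(
    "Cloud SQL CPU high", "database/cpu/utilization", 0.8, 300
)
print(json.dumps(cpu_policy, indent=2))
```

Calling the same function with database/replication/replica_lag and a threshold of 30 gives the replication lag alert. Keeping policies as generated data makes them easy to review and version.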
Query Insights
Query Insights is Cloud SQL’s built-in query performance tool. It captures query execution data at the SQL statement level and presents it in a dashboard that shows database load by query, average latency, execution plan samples, and trace data.
Query Insights is available for Cloud SQL for MySQL and Cloud SQL for PostgreSQL. On the Enterprise Plus edition, it includes deeper plan analysis and trace correlation.
To enable Query Insights:
- Open the Cloud SQL instance in the Cloud Console
- Navigate to Overview > Query Insights
- Toggle Enable Query Insights
- For Enterprise Plus, also enable the Cloud Trace API in your project
Once enabled, the Query Insights dashboard lets you filter by query fingerprint, time range, user, and database. The top contributors view identifies which queries are consuming the most database load, which is the starting point for optimization.
Cloud SQL Logs
Cloud SQL writes logs to Cloud Logging. The main log types are:
- Error logs: Database-level errors and warnings
- Slow query logs: Queries that exceed the log_min_duration_statement threshold (PostgreSQL) or long_query_time (MySQL)
- General query logs: All queries executed (high volume, use selectively)
- Audit logs: User access, schema changes, and data access events
To enable slow query logging for PostgreSQL, set the database flag log_min_duration_statement to a value in milliseconds (for example, 1000 for 1 second). For MySQL, set slow_query_log to on and long_query_time to a value in seconds, and set log_output to FILE so the entries reach Cloud Logging. You can do this via the Cloud Console under Instance > Edit > Database flags.
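Database flags can also be set from the command line with gcloud sql instances patch. The sketch below only assembles the command so it can be reviewed before anything runs. The instance names are hypothetical. Note that --database-flags replaces the instance's whole flag set, so any flags already in place have to be included, and some flag changes restart the instance.

```python
import shlex


def patch_flags_command(instance, flags):
    """Return the argv list for `gcloud sql instances patch` with these flags."""
    if not flags:
        raise ValueError("at least one flag is required")
    flag_arg = ",".join(f"{name}={value}" for name, value in flags.items())
    return ["gcloud", "sql", "instances", "patch", instance,
            f"--database-flags={flag_arg}"]


# PostgreSQL: log statements slower than 1000 ms (hypothetical instance name).
pg_cmd = patch_flags_command("my-postgres-instance",
                             {"log_min_duration_statement": 1000})
print(shlex.join(pg_cmd))

# MySQL: enable the slow query log with a 2-second threshold.
mysql_cmd = patch_flags_command("my-mysql-instance",
                                {"slow_query_log": "on", "long_query_time": 2})
print(shlex.join(mysql_cmd))
```

Returning an argv list rather than a shell string keeps quoting predictable if the command is later passed to subprocess.run.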
Logs can be exported to BigQuery using log sinks for long-term analysis and compliance.
How to Monitor Replication Lag in Google Cloud SQL
Replication lag is the most operationally sensitive metric for systems using read replicas. A replica that is seconds behind the primary will serve stale reads. A replica that is minutes behind may not be usable as a failover target.
Understanding What Causes Replication Lag
Replication lag in Cloud SQL has several common causes:
- Write-heavy workloads on the primary that generate more binlog volume than the replica can apply
- Long-running transactions on the primary that delay log shipping
- CPU or disk I/O pressure on the replica that slows down log application
- Network latency between the primary and a cross-region replica
- Large single transactions such as bulk inserts or schema migrations
Checking Replication Lag
The quickest way to check replication lag is via the Cloud Monitoring metric database/replication/replica_lag. You can also check it directly from the database:
For PostgreSQL replicas:
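SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
For MySQL replicas:
SHOW SLAVE STATUS\G
Look at the Seconds_Behind_Master field in the MySQL output. On MySQL 8.0.22 and later, the preferred forms are SHOW REPLICA STATUS\G and Seconds_Behind_Source. Note that the PostgreSQL query overstates lag when the primary is idle, because the timestamp of the last replayed transaction stops advancing.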
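If you collect the MySQL status output in a script, the lag field has to be pulled out of the vertical text format. Below is a minimal sketch that accepts either field name (Seconds_Behind_Master, or Seconds_Behind_Source in newer MySQL releases). The sample output is abbreviated and made up for illustration.

```python
import re

_LAG_FIELD = re.compile(
    r"^\s*Seconds_Behind_(?:Master|Source):\s*(\S+)\s*$", re.MULTILINE
)


def parse_seconds_behind(status_output):
    """Extract replication lag in seconds from vertical-format status output.

    Returns an int, or None when the field is NULL or absent. NULL usually
    means the replica is not currently applying changes.
    """
    match = _LAG_FIELD.search(status_output)
    if match is None or match.group(1).upper() == "NULL":
        return None
    return int(match.group(1))


# Abbreviated, made-up sample of the vertical output.
SAMPLE = """
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
          Seconds_Behind_Source: 12
"""
print(parse_seconds_behind(SAMPLE))  # 12
```

Treating NULL as a distinct state matters: a stopped replica reports no lag value at all, which should alert rather than be read as zero.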
Reducing Replication Lag
If replication lag is consistently high, the following approaches are effective:
- Upgrade the replica’s machine type if it is CPU or disk-bound (check database/cpu/utilization and database/disk/write_ops_count on the replica)
- Break large transactions into smaller batches to reduce log volume per commit
- Enable parallel replication if your MySQL version supports it (set replica_parallel_workers in database flags)
- Use a cross-region replica only when geographic distribution is required. Place replicas in the same region as the primary when possible to minimize network lag
- Avoid running expensive analytical queries on the replica during peak replication periods
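The batching advice above can be sketched as a small helper that splits a large set of rows into fixed-size chunks, so each chunk can be written and committed as its own transaction. The write step is left as a comment because it depends on your database driver. The batch size of 4 is only for the demonstration.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sizes = []
for batch in batched(range(10), 4):
    # In real code: insert this batch, then commit before taking the next one.
    sizes.append(len(batch))
print(sizes)  # [4, 4, 2]
```

Smaller commits produce smaller replication events, which gives the replica a chance to keep up between batches. A suitable batch size depends on row width and workload, so treat any fixed number as a starting point.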
Setting Up Alerts and Dashboards
Creating a Custom Dashboard in Cloud Monitoring
A practical Google Cloud SQL monitoring dashboard should include the following panels:
- CPU Utilization (database/cpu/utilization) as a line chart over time
- Memory Utilization (database/memory/utilization) as a line chart
- Disk Usage (database/disk/bytes_used) as a gauge with provisioned size overlay
- Active Connections (database/postgresql/num_backends or equivalent) as a line chart
- Replication Lag (database/replication/replica_lag) as a line chart per replica
- Query count or queries per second as a volume indicator
Google Cloud Monitoring lets you create these dashboards manually or via the Monitoring API and Terraform. The google_monitoring_dashboard Terraform resource supports full dashboard configuration as code.
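As an illustration of dashboards as code, the sketch below generates a dashboard definition with one line chart per metric, split per instance. The structure follows the Cloud Monitoring Dashboard resource as best understood here (gridLayout, xyChart, timeSeriesFilter); treat the exact field names as assumptions to verify against the API reference. The function names and dashboard title are illustrative.

```python
import json


def xy_chart_widget(title, metric_path):
    """Return one line-chart widget for a Cloud SQL metric, split per instance."""
    return {
        "title": title,
        "xyChart": {
            "dataSets": [{
                "plotType": "LINE",
                "timeSeriesQuery": {
                    "timeSeriesFilter": {
                        "filter": (
                            f'metric.type = "cloudsql.googleapis.com/{metric_path}" '
                            'AND resource.type = "cloudsql_database"'
                        ),
                        "aggregation": {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_MEAN",
                            "crossSeriesReducer": "REDUCE_MEAN",
                            # One line per instance or replica.
                            "groupByFields": ["resource.label.database_id"],
                        },
                    }
                },
            }]
        },
    }


def build_dashboard(display_name, panels):
    """panels is a list of (title, metric_path) pairs."""
    return {
        "displayName": display_name,
        "gridLayout": {
            "widgets": [xy_chart_widget(title, path) for title, path in panels]
        },
    }


dashboard = build_dashboard("Cloud SQL overview", [
    ("CPU Utilization", "database/cpu/utilization"),
    ("Memory Utilization", "database/memory/utilization"),
    ("Replication Lag", "database/replication/replica_lag"),
])
print(json.dumps(dashboard, indent=2))
```

JSON of this kind is what the google_monitoring_dashboard resource takes in its dashboard_json argument, so the generated output can be checked into the same repository as the Terraform configuration.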
Recommended Alerting Thresholds
| Metric | Alert Condition | Recommended Action |
| --- | --- | --- |
| CPU utilization | Sustained above 80% for 5 min | Scale up instance or optimize queries |
| Memory utilization | Above 90% | Check connection count, increase RAM |
| Disk usage | Above 85% of capacity | Enable auto-storage increase or provision more disk |
| Replication lag | Exceeds your SLA threshold (e.g., 30s) | Investigate replica health, upgrade tier |
| Active connections | Within 20% of max_connections | Implement connection pooling |
| Auto-failover count | Any non-zero value | Investigate instance stability |
Third-Party Google Cloud SQL Monitoring Tools
While Cloud Monitoring provides solid native coverage, third-party tools add richer dashboards, anomaly detection, longer retention, and cross-stack correlation (connecting Cloud SQL performance to application traces and infrastructure events).
CubeAPM
CubeAPM is a full-stack observability platform designed for teams running applications on Google Cloud. It integrates with Cloud SQL to surface query performance, replication lag, and infrastructure metrics alongside application traces and logs in a single view. Where native Cloud Monitoring shows you what is happening at the infrastructure level, CubeAPM connects that to which service, endpoint, or deployment caused it. Pre-built Cloud SQL dashboards, anomaly-based alerting, and query-level drill-down are available without complex instrumentation overhead.
Datadog
Datadog’s Google Cloud SQL integration pulls metrics via the Cloud Monitoring API. It provides pre-built dashboards, auto-discovery of new instances, and integration with Datadog APM so you can trace a slow application request back to the specific SQL query that caused it.
New Relic
New Relic supports Google Cloud SQL monitoring through its GCP integration. It ingests Cloud SQL metrics and correlates them with New Relic APM transaction data. Dashboards include query throughput, connection counts, and replication lag with configurable alerting.
Sumo Logic
The Sumo Logic app for Google Cloud SQL ingests both Cloud SQL metrics and Cloud Logging data (audit logs, error logs, platform logs). Its preconfigured dashboards cover resource utilization, replication lag, authorization failures, user activity, and error patterns.
Dynatrace
Dynatrace’s Google Cloud SQL integration uses AI-powered anomaly detection to identify performance degradations without requiring manual threshold configuration. It automatically baselines metric behavior and surfaces root-cause analysis when deviations occur.
SolarWinds
SolarWinds Database Performance Monitor (DPM) provides deep query-level analysis for Cloud SQL instances. It captures execution plans, identifies index recommendations, and shows query wait time breakdown, which is useful for diagnosing latency beyond what Cloud SQL’s built-in Query Insights provides.
When choosing a third-party tool, evaluate whether it provides metric ingestion only, or whether it also ingests logs and traces. Full observability requires all three.
Google Cloud SQL Monitoring Best Practices
Based on the operational patterns that distinguish well-monitored Cloud SQL deployments, the following practices are worth implementing from the start.
- Enable Query Insights on every production instance. It has minimal performance overhead and provides query-level visibility that is otherwise very difficult to reconstruct after a performance incident occurs.
- Connect through the Cloud SQL Auth Proxy. This simplifies connection management and reduces exposure. Pair it with a connection pooler like PgBouncer for PostgreSQL workloads that create many short-lived connections.
- Export logs to BigQuery. Cloud Logging retention is limited. Exporting to BigQuery gives you long-term query logs for capacity planning and compliance audits.
- Manage alerting policies and dashboards as code. Using Terraform ensures that monitoring configuration is reproducible, version-controlled, and consistent across environments.
- Test failover regularly. Monitoring database/available_for_failover tells you whether the standby is ready. Periodically triggering a failover in a non-production environment validates that the path works end to end.
- Correlate metrics with deployments. A spike in replication lag or CPU that coincides with a deployment often points to a new or changed query. Annotating dashboards with deployment markers makes this pattern visible immediately.
- Track storage growth trends, not only current usage. A storage growth rate that doubles month over month will breach thresholds well before you expect. Set alerts on rate of change in addition to absolute thresholds.
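The rate-of-change idea for storage can be made concrete with a simple projection. The sketch below estimates how many days remain before disk usage crosses the 85% threshold, assuming linear growth between the first and last sample. That is a deliberate simplification: accelerating growth will breach sooner than this estimate. The function name and sample values are illustrative.

```python
def days_until_threshold(samples, capacity_bytes, threshold=0.85):
    """Project days until usage crosses threshold * capacity.

    samples is a list of (day_number, bytes_used) pairs ordered by day.
    Returns 0.0 if already over the limit, None if usage is not growing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (first_day, first_bytes), (last_day, last_bytes) = samples[0], samples[-1]
    elapsed_days = last_day - first_day
    if elapsed_days <= 0:
        raise ValueError("samples must span a positive number of days")
    limit = capacity_bytes * threshold
    if last_bytes >= limit:
        return 0.0
    growth_per_day = (last_bytes - first_bytes) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (limit - last_bytes) / growth_per_day


GB = 10 ** 9
# Hypothetical: usage grew from 40 GB to 55 GB over 30 days on a 100 GB disk.
remaining = days_until_threshold([(0, 40 * GB), (30, 55 * GB)], 100 * GB)
print(f"about {remaining:.0f} days until the 85% threshold")
```

An alert on a projection like this (for example, fewer than 14 days remaining) fires while there is still time to act, which an absolute threshold alone does not guarantee.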
Conclusion
Google Cloud SQL monitoring covers two main areas: instance-level performance metrics (CPU, memory, disk, connections) and replication health (replica lag, network lag, failover readiness). Both are available natively through Cloud Monitoring, and both can be enriched with Query Insights for query-level analysis.
The practical approach is to start with Cloud Monitoring dashboards and alerting policies using the key metrics covered in this guide, enable Query Insights for slow query diagnosis, and evaluate a third-party tool if you need cross-stack correlation or longer retention.
Replication lag deserves dedicated attention. It is often an early signal of write pressure, resource constraints, or network issues that will eventually affect primary instance performance as well. Alert on it proactively rather than discovering it during an incident.
FAQs
1. What is the best way to monitor Google Cloud SQL performance?
Start with Cloud Monitoring for infrastructure metrics (CPU, memory, disk, connections), enable Query Insights for query-level visibility, and set alerting policies on key thresholds. Add a third-party tool like CubeAPM or Datadog if you need cross-stack correlation with application traces.
2. How do I check replication lag in Google Cloud SQL?
Monitor the Cloud Monitoring metric database/replication/replica_lag for a real-time view. For PostgreSQL, run SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag; directly on the replica. For MySQL, run SHOW SLAVE STATUS\G and check Seconds_Behind_Master (on MySQL 8.0.22 and later, SHOW REPLICA STATUS\G and Seconds_Behind_Source).
3. What causes high replication lag in Cloud SQL?
The most common causes are write-heavy workloads overwhelming the replica, CPU or disk pressure on the replica instance, large single transactions like bulk inserts, and network latency for cross-region replicas. Upgrading the replica tier, batching large writes, and enabling parallel replication are the main fixes.
4. Does Google Cloud SQL have built-in query monitoring?
Yes. Query Insights is available for MySQL and PostgreSQL and shows database load by query, average latency, and execution plan samples. Enable it per instance from the Cloud Console under Overview > Query Insights. The Enterprise Plus edition includes deeper plan analysis.
5. What alerts should I set up for Google Cloud SQL?
At minimum, alert on CPU above 80%, storage above 85% of capacity, active connections nearing max_connections, replication lag exceeding your SLA threshold, and any non-zero auto-failover request count. These cover the most common failure paths before they reach users.





