MySQL monitoring is essential for ensuring database reliability, query efficiency, and application uptime. MySQL remains one of the most popular databases globally, powering millions of production systems across SaaS, fintech, and eCommerce platforms.
With growing workloads and complex architectures, many teams still struggle with fragmented monitoring. Traditional tools make it hard to correlate MySQL metrics with logs or application traces, causing long mean-time-to-resolution (MTTR) and costly downtime. Common pain points include slow query diagnosis, replication delays, and high storage costs for historical logs.
CubeAPM is the best solution for monitoring MySQL databases. It brings full-stack visibility through a unified MELT architecture, enabling teams to connect query latency with error logs and application spans instantly. With smart sampling and real-time query dashboards, CubeAPM delivers cost-predictable observability without losing diagnostic depth.
In this article, we’ll explore what MySQL monitoring means, why it matters, the key metrics you should track, and how CubeAPM provides modern MySQL monitoring.
What Do You Mean by MySQL Monitoring?
MySQL monitoring refers to the continuous observation of a database’s performance, query efficiency, and overall health. It provides real-time insights into how efficiently queries are executed, how resources are utilized, and how reliably replication or clustering performs under load.
By collecting and analyzing these signals, teams can detect early warning signs—such as latency spikes, lock contention, or replication delays—and take corrective actions before they affect end-user performance or data consistency. At its core, effective MySQL database monitoring covers several key aspects that work together to ensure a healthy and high-performing database:
- Query performance metrics: Track slow queries, queries per second (QPS), and thread count to identify inefficiencies in SQL execution or concurrency limits.
- Resource utilization: Measure CPU, memory, and disk I/O usage to understand whether MySQL workloads are constrained by infrastructure or misconfigured parameters.
- Replication and cluster state: Monitor replication lag, master-slave synchronization, and cluster failover readiness to maintain data integrity and availability.
- Connection pool and deadlock analysis: Observe active connections, aborted sessions, and deadlock frequency to prevent connection saturation and optimize pool configuration.
Together, these aspects help engineers gain full performance visibility, streamline query optimization, and maintain database health, ensuring that MySQL instances consistently deliver low-latency, reliable performance across production environments.
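As a quick baseline, many of these signals can be sampled directly from a MySQL shell before any tooling is involved. A minimal sketch using standard status counters (names per SHOW GLOBAL STATUS; availability can vary slightly by MySQL version):
-- Snapshot of core health counters: query volume, concurrency, locks, and availability
SHOW GLOBAL STATUS
WHERE Variable_name IN (
  'Questions',              -- total statements executed (basis for QPS)
  'Threads_running',        -- statements executing right now (concurrency)
  'Threads_connected',      -- open client connections
  'Innodb_row_lock_waits',  -- how often transactions waited on row locks
  'Slow_queries',           -- statements exceeding long_query_time
  'Aborted_connects'        -- failed connection attempts
);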
Example: Monitoring a Payment System with MySQL
Imagine a fintech platform processing thousands of transactions per minute. During peak hours, users start reporting delayed payments. With real-time MySQL monitoring, engineers can immediately see spikes in query execution time and replication lag on the primary node.
A correlated trace in CubeAPM reveals that a new JOIN-heavy query introduced by a code update is slowing down the checkout workflow. By optimizing that query and tracking the performance delta in CubeAPM dashboards, the team restores latency to normal levels, avoiding further payment delays and customer dissatisfaction.
Why MySQL Monitoring Matters
Query latency during high traffic
Under heavy load, MySQL queries that once completed in milliseconds can suddenly stall as concurrency spikes. Even a handful of slow queries can drag down overall application performance during peak hours, because blocked threads and held locks ripple into every dependent request. Latency often stems from overloaded threads, exhausted connection pools, or unoptimized joins. Monitoring query time, QPS, and active thread counts ensures bottlenecks are caught before they cascade into system-wide slowdowns.
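To see which statements are actually consuming the time, performance_schema keeps per-digest aggregates; a sketch, assuming performance_schema is enabled (timer columns are measured in picoseconds):
-- Top 5 statement digests by total execution time
SELECT
  DIGEST_TEXT                     AS normalized_query,
  COUNT_STAR                      AS exec_count,
  ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_time_s,
  ROUND(AVG_TIMER_WAIT / 1e9, 2)  AS avg_time_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;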
Unoptimized indexes or schema drift
As data grows or query patterns evolve, once-efficient indexes can lose relevance. Schema drift—new columns, data-type changes, or missing constraints—causes MySQL’s optimizer to pick slower execution plans. Monitoring index usage, buffer pool hit ratio, and query plan changes helps teams identify regressions early. Regularly tuning these parameters ensures optimal plan stability and prevents full table scans or excessive random I/O.
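If the sys schema is installed (it ships by default with MySQL 5.7+), it exposes ready-made views for spotting index drift; a sketch:
-- Indexes that have never been used since the server started
SELECT * FROM sys.schema_unused_indexes;

-- Indexes whose leading columns duplicate another index
SELECT table_schema, table_name, redundant_index_name, dominant_index_name
FROM sys.schema_redundant_indexes;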
Lock contention or replication lag
Long-running transactions and hotspot updates often lead to lock waits that stall concurrent queries. At the same time, replication lag across read replicas can cause inconsistent or stale reads under write-heavy workloads. Tracking InnoDB lock waits, transaction duration, and replication thread states helps teams proactively rebalance workloads or add replicas before lag affects data freshness and end-user performance.
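Lock contention can be confirmed directly on the server; a sketch using the sys schema and information_schema, best run while the stall is happening:
-- Who is blocking whom right now (requires the sys schema)
SELECT * FROM sys.innodb_lock_waits;

-- Transactions that have been open for more than 60 seconds
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 60 SECOND;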
Disk I/O saturation or buffer pool misses
When working sets outgrow memory, MySQL resorts to disk I/O, leading to high read latency and slow queries. Monitoring I/O wait time, page flush rate, and buffer pool hit ratio exposes these inefficiencies before they become critical. Because every buffer pool miss turns into a disk read, miss rates track closely with user-facing latency, especially when query volume surges in production. Continuous monitoring ensures balanced memory allocation and sustained throughput.
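The hit ratio itself isn’t exposed as a single counter, but it can be derived from two InnoDB status variables; a sketch for MySQL 5.7+/8.0, where status counters live in performance_schema:
-- Hit ratio = 1 - (disk reads / logical read requests); aim for > 0.95
SELECT
  1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads')
    /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  ) AS buffer_pool_hit_ratio;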
Key Metrics to Monitor in MySQL
Monitoring MySQL isn’t just about uptime; it’s about understanding why performance changes. The following categories group the most critical metrics that reveal query efficiency, resource utilization, replication stability, and system health. Keeping an eye on these helps engineers pinpoint problems before they impact applications.
Performance Metrics
These metrics show how efficiently MySQL handles queries, transactions, and active connections under different workloads; a query sketch for deriving them from raw counters follows the list.
- Queries per Second (QPS): Measures how many queries the server processes each second. A sudden drop indicates locking or connection issues, while sustained spikes suggest the need for scaling read replicas or tuning indexes. Threshold: For most production workloads, consistent QPS above 2–3x your baseline indicates a potential overload.
- Transactions per Second (TPS): Tracks how many transactions (committed or rolled back) occur per second. A fall in TPS can mean contention on tables or long-running transactions. Threshold: TPS should stay close to historical averages; a 20% drop typically signals lock contention or replication lag.
- Average Query Execution Time: Reflects the mean time taken for queries to complete. High averages can indicate missing indexes or inefficient joins. Threshold: Keep p95 execution time under 200 ms for user-facing queries to maintain application responsiveness.
- Thread Count (Threads_running): Indicates the number of active client threads executing queries. High thread counts over time can lead to CPU saturation. Threshold: Maintain fewer than 25–30 concurrent threads per core for stable performance.
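A sketch of how these figures can be derived from raw status counters; note they are averages since server start, so sample the counters periodically and diff them for real-time values:
SELECT
  -- Average statements per second since the server started
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Questions')
    / (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Uptime') AS avg_qps,
  -- Average committed plus rolled-back transactions per second
  ((SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_commit')
    + (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Com_rollback'))
    / (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Uptime') AS avg_tps,
  -- Current concurrency
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Threads_running') AS threads_running;
In practice you rarely need to run this by hand; the OpenTelemetry MySQL receiver configured later in this guide collects the same counters automatically and CubeAPM turns them into rates.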
Resource Utilization Metrics
These metrics help you evaluate how system resources like CPU, memory, and disk I/O affect MySQL throughput; a quick SQL check for the memory-related ones appears after the list.
- CPU Utilization: Tracks how much processing power MySQL consumes. A consistently high CPU indicates heavy query parsing or sorting. Threshold: Sustained CPU usage above 80% across multiple cores suggests inefficient queries or missing indexes.
- Memory Usage (InnoDB Buffer Pool): Shows how effectively MySQL caches data in memory. Low buffer pool hit ratios mean frequent disk reads. Threshold: Keep the buffer pool hit ratio above 95% to ensure minimal I/O waits.
- Disk I/O Latency: Measures how long it takes MySQL to read or write data to disk. High latency impacts both reads and commits. Threshold: Average read/write latency should stay below 5 ms on SSDs for optimal responsiveness.
- Temporary Tables on Disk: Counts how often MySQL spills temporary tables to disk when processing large joins or sorts. Frequent spills degrade performance. Threshold: Less than 5% of temporary tables should be created on disk.
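A sketch for checking the temp-table spill rate and the configured buffer pool size from SQL:
-- Fraction of temporary tables spilled to disk; keep this under roughly 5%
SELECT
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Created_tmp_disk_tables')
  /
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Created_tmp_tables')
  AS tmp_disk_table_ratio;

-- Configured InnoDB buffer pool size in GiB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;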
Replication and Cluster Metrics
These metrics are crucial for systems using master–replica setups or multi-node clusters to ensure consistency and redundancy; the commands sketched after the list expose them directly on the server.
- Replication Lag (Seconds_Behind_Source): The time delay between primary and replica databases. High lag causes stale reads and inconsistent data. Threshold: Keep lag below 1 second for most transactional workloads.
- Replica I/O and SQL Thread State: Shows whether replication threads are active or stopped. Stopped threads can silently break replication. Threshold: Both threads should remain in the Running state; any deviation needs immediate attention.
- Binlog Disk Usage: Tracks the amount of binary log data generated and stored for replication or point-in-time recovery. Unchecked growth fills disks quickly. Threshold: Retain only 3–5 days of binlogs unless longer recovery windows are required.
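On the database side, these values come from the replication status and binary log commands; a sketch using MySQL 8.0.22+ syntax (older versions use SHOW SLAVE STATUS and the equivalent MASTER-named variables):
-- Replication lag, thread states, and last errors (run on the replica;
-- \G is the mysql client's vertical-output terminator)
SHOW REPLICA STATUS\G

-- Binary logs currently held on the source, with sizes in bytes
SHOW BINARY LOGS;

-- Binlog retention window in seconds (MySQL 8.0 defaults to 30 days)
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';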
Error and Availability Metrics
These metrics provide visibility into failed connections, deadlocks, and the overall stability of the MySQL service; a sampling sketch follows the list.
- Aborted Connections: Counts connections closed unexpectedly due to network issues, timeouts, or client errors. Threshold: Fewer than 1% of total connections should abort during normal operation.
- Deadlocks: Indicates when concurrent transactions block each other, forcing one to roll back. Frequent deadlocks show poor transaction design. Threshold: Keep deadlocks below 0.1% of total transactions; higher rates need query refactoring.
- Uptime: Measures how long the MySQL instance has been running continuously. Frequent restarts may indicate configuration instability or hardware issues. Threshold: Aim for uptime exceeding 99.9% in production environments.
- Slow Query Log Entries: Tracks queries exceeding a predefined execution time (default 10s). Frequent slow logs pinpoint indexing gaps or inefficient SQL patterns. Threshold: Ideally, fewer than 1% of queries should appear in the slow log during peak load.
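A sketch for sampling these availability counters, plus the most recent deadlock details:
-- Aborted connections, uptime, and slow query counts
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Aborted_connects', 'Aborted_clients', 'Uptime', 'Slow_queries');

-- The LATEST DETECTED DEADLOCK section of this output shows the two conflicting transactions
SHOW ENGINE INNODB STATUS\G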
Connection and Thread Metrics
Connection stability and concurrency control ensure MySQL can handle simultaneous clients without overload; the queries sketched after the list check these ratios directly.
- Connections (Threads_connected): Displays the total number of active connections to the server. Sudden spikes may indicate application connection leaks. Threshold: Maintain active connections under 80% of the configured max_connections value.
- Thread Cache Hit Ratio: Shows how often MySQL can reuse threads from its cache rather than creating new ones. Low ratios lead to CPU overhead. Threshold: Keep thread cache hit ratio above 90% for efficient session handling.
- Connection Errors: Counts failed attempts due to authentication or networking issues. A rising trend signals connection mismanagement or firewall blocks. Threshold: Less than 1% of total connection attempts should fail.
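These ratios can be checked with a couple of quick queries; a sketch:
-- Current connections versus the configured ceiling
SELECT
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Threads_connected') AS threads_connected,
  @@max_connections AS max_connections;

-- Thread cache effectiveness: 1 - (threads created / connection attempts); aim for > 0.90
SELECT
  1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Threads_created')
    /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Connections')
  ) AS thread_cache_hit_ratio;

-- Connection failures broken down by cause
SHOW GLOBAL STATUS LIKE 'Connection_errors_%';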
How to Monitor MySQL Databases with CubeAPM
Step 1: Install CubeAPM
Start by deploying the CubeAPM platform, which serves as the central observability backend. You can install it using Docker, Linux, or Kubernetes, depending on your environment. Follow the CubeAPM installation guide to get started.
For Kubernetes, CubeAPM provides a Helm-based installer that deploys the core services, storage, and OpenTelemetry Collector—all prerequisites for MySQL monitoring.
Once CubeAPM is up, make note of your OTLP endpoint (for example, https://ingest.cubeapm.io/v1/metrics) and your API token, which you’ll use to send MySQL metrics and logs.
Step 2: Configure the OpenTelemetry Collector for MySQL
CubeAPM supports native OpenTelemetry ingestion, so you can monitor MySQL without proprietary agents.
Install the OpenTelemetry Collector on the same host or cluster node where MySQL runs. Then enable the MySQL receiver in your collector configuration to pull metrics from performance_schema and information_schema.
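Before enabling the receiver, the collector needs a MySQL account that can read the status and schema tables. A minimal sketch; the user name and password are placeholders, and the exact grants may vary with the receiver version:
-- Dedicated, least-privilege account for the metrics collector
CREATE USER 'monitor_user'@'%' IDENTIFIED BY 'secure_password';

-- PROCESS and REPLICATION CLIENT cover server status and replication state
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'monitor_user'@'%';

-- Read access to performance_schema for query and wait statistics
GRANT SELECT ON performance_schema.* TO 'monitor_user'@'%';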
Example configuration snippet:
receivers:
  mysql:
    endpoint: "localhost:3306"
    username: "monitor_user"
    password: "secure_password"
    collection_interval: 30s
exporters:
  otlp:
    endpoint: "https://ingest.cubeapm.io/v1/metrics"
    headers:
      Authorization: "Bearer <YOUR_CUBEAPM_TOKEN>"
service:
  pipelines:
    metrics:
      receivers: [mysql]
      exporters: [otlp]
This setup continuously streams metrics such as QPS, TPS, query latency, buffer pool hit ratio, and replication lag to CubeAPM for visualization and alerting.
Step 3: Enable Log Monitoring for Slow Queries and Errors
To correlate MySQL logs with metrics and traces, configure CubeAPM’s log ingestion pipeline. You can forward the slow query log and error log through Fluent Bit, Filebeat, or the OTEL Collector’s filelog receiver.
Make sure each log entry includes a timestamp (time), severity, and message field so CubeAPM can align them with metric timelines.
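CubeAPM can only forward slow query entries if MySQL is writing them in the first place. A sketch for enabling the slow query log at runtime; the file path and threshold are examples, and production settings should also be persisted in my.cnf:
-- Enable slow query logging at runtime (lost on restart unless persisted in my.cnf)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- Log anything slower than 1 second instead of the 10 s default
SET GLOBAL long_query_time = 1;

-- Optionally log queries that use no index at all
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Confirm the settings took effect
SHOW VARIABLES LIKE 'slow_query%';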
Example snippet:
receivers:
  filelog:
    include: [ "/var/log/mysql/slow.log", "/var/log/mysql/error.log" ]
    start_at: beginning
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
Once configured, CubeAPM will automatically parse these logs, allowing you to trace each slow query back to the corresponding transaction or application span.
Step 4: Configure Alerts for MySQL Health
Set up proactive alerting so you’re notified before performance deteriorates.
Use CubeAPM’s alerting configuration to connect via email, Slack, or PagerDuty.
Typical MySQL alert examples:
- High Query Latency: Trigger an alert if the rate of mysql_global_status_queries exceeds your baseline for over 5 minutes.
- Replication Lag: Fire an alert if mysql_slave_seconds_behind_master > 5.
- Buffer Pool Miss Ratio: Warn if the hit ratio drops below 95%.
CubeAPM lets you customize thresholds, aggregation windows, and severity labels directly from its dashboard.
Step 5: Verify Data and Build Dashboards
Once metrics and logs start streaming, open CubeAPM’s dashboard view.
You’ll see default MySQL charts covering:
- Query throughput and latency
- Replication and connection health
- Disk I/O and buffer pool utilization
- Slow query trends
You can combine these metrics with application traces to understand which service or API endpoint generated a problematic SQL statement.
Use the dashboard builder to add widgets, define filters (e.g., by database or host), and visualize trends over time. Learn more in the infra monitoring documentation.
Step 6: Validate and Tune
Finally, confirm that data freshness, log correlation, and alerts are working end-to-end.
Check:
- Metrics appear in the MySQL Overview Dashboard
- Logs are timestamp-aligned with trace spans
- Alerts trigger and resolve correctly
CubeAPM’s unified MELT (Metrics, Events, Logs, Traces) design ensures MySQL issues are visible across every layer — from query latency and I/O metrics to application traces.
Interpreting MySQL Metrics in CubeAPM
Once your data starts flowing into CubeAPM, the real magic begins: turning raw metrics into actionable insights. Every spike, dip, or lag in your charts tells a story about what’s happening inside MySQL.
Understanding Spikes in Latency and QPS
When you notice a spike in latency, for instance, it usually points to a surge in query complexity or a resource bottleneck. In CubeAPM, jump into the “Query Latency Over Time” widget. If latency spikes align with an increase in QPS (Queries per Second), that indicates your server is getting overwhelmed — often by inefficient queries or thread saturation. On the other hand, if latency rises while QPS stays stable, look deeper into locking, disk I/O, or buffer pool misses.
Reading Buffer Pool Misses and Resource Patterns
Speaking of the buffer pool, a hit ratio below 95% means MySQL is fetching data from disk far more often than it should. CubeAPM visualizes this in the Buffer Pool Hit Ratio chart, where a falling line signals you may need more memory allocation or index optimization. Watch for repeating patterns — like a steady nightly dip — which could reveal batch jobs or analytics queries hogging resources.
Correlating Metrics with Traces and Logs
Now, here’s where CubeAPM really flexes: correlating MySQL metrics with traces and logs. When latency spikes, click into the trace view to see which API call or service triggered the issue. The logs pane will show you if that trace also logged a MySQL error or timeout. This end-to-end correlation is what makes CubeAPM special: it bridges database telemetry with application-level performance.
Example: Tracing a Slow Checkout Query
Let’s say your eCommerce app’s checkout endpoint starts lagging. In CubeAPM, you notice a latency jump in MySQL queries and an increase in Threads_running. Drilling into the trace, you see the POST /checkout span linked to a specific SQL query:
SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC;
The trace shows the query running for 2.4 seconds, while the Index Usage chart reveals it’s performing a full table scan. That’s your culprit — a missing index on the user_id column. After adding the index, CubeAPM instantly shows the latency drop and QPS normalization — a perfect feedback loop for optimization.
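A sketch of what that fix might look like, assuming the hypothetical orders table from this example (the index name is arbitrary):
-- Add a composite index covering both the filter and the sort
ALTER TABLE orders ADD INDEX idx_orders_user_created (user_id, created_at);

-- Verify the optimizer now uses the index instead of a full table scan
EXPLAIN
SELECT * FROM orders
WHERE user_id = 42
ORDER BY created_at DESC;
-- Expect type=ref (or range) with key=idx_orders_user_created rather than type=ALL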
Viewing Everything in CubeAPM’s Unified MELT Dashboard
With CubeAPM’s unified MELT dashboard, you can view MySQL metrics side by side with application traces, frontend RUM data, and infrastructure stats. That’s how you move from “Why is it slow?” to “Here’s exactly which query and component caused it”, all within a single pane of glass.
Real-Life Example: How CubeAPM Helped Stabilize a MySQL-Driven E-Commerce Platform
The Challenge
An e-commerce startup running a high-traffic marketplace noticed periodic checkout delays during weekend sales. Average response times would spike from 250 ms to over 2.5 seconds, and cart abandonment climbed nearly 18%. The DevOps team’s existing monitoring tool showed host-level CPU and memory, but no visibility into which MySQL queries were slowing things down or why. Logs were scattered across nodes, replication lag wasn’t alerting properly, and the root cause kept slipping through the cracks.
The CubeAPM Solution
After deploying CubeAPM’s OpenTelemetry-native MySQL monitoring, the team instrumented their production MySQL cluster using the OTEL Collector configuration. Metrics like QPS, transaction latency, buffer pool hit ratio, and replication delay were streamed to CubeAPM’s dashboard. At the same time, slow query logs were ingested through CubeAPM’s log pipeline, enabling full correlation between MySQL performance and checkout service traces.
Within minutes, CubeAPM’s trace view linked the checkout latency spikes to a specific API endpoint: /checkout/confirm. The related SQL query was clearly visible inside the trace:
SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC;
The system flagged this query as a high-frequency slow query, consuming over 40% of total DB time under load.
Fixes and Optimization Steps
Using CubeAPM’s unified MELT dashboard, the engineers drilled into the Index Usage chart and found that the user_id column lacked an index, forcing a full table scan for every checkout. They created a composite index on (user_id, created_at) and reran the load tests. CubeAPM instantly reflected a drop in query latency from 2.4 s to 180 ms, confirmed by the Query Latency Over Time graph.
The team also set up alert rules via CubeAPM’s alerting configuration: one for replication lag above 3 seconds and another for slow query percentage exceeding 2%. Notifications were routed to their DevOps Slack channel and email for instant response.
The Result
Post-optimization, the checkout flow stabilized with consistent sub-300 ms latency, and overall database CPU utilization dropped by 27%. Query throughput increased by 1.8×, and replication lag stayed below 1 second even during flash sales.
More importantly, the team gained complete visibility into how each query affected real-world user transactions. With CubeAPM, they no longer guess; they observe, correlate, and act in real time.
Verification Checklist & Example Alert Rules for MySQL Monitoring with CubeAPM
Once CubeAPM is fully connected, you’ll want to verify that all telemetry (metrics, logs, and traces) is flowing correctly. This checklist helps confirm complete visibility, while the alert rules ensure you’re instantly notified when performance starts to slip.
Verification Checklist for MySQL Monitoring
- MySQL Exporter Connection: Confirm the OpenTelemetry Collector is scraping metrics from performance_schema without authentication errors.
- Metric Ingestion: Check CubeAPM’s “MySQL Overview” dashboard; metrics like mysql_global_status_queries and mysql_innodb_buffer_pool_pages_total should appear with live updates.
- Slow Query Logs: Verify that slow and error logs are visible in the Logs tab, with fields like timestamp, duration, and SQL statement.
- Trace Correlation: Open a trace in CubeAPM and confirm it includes database spans with db.system: mysql and db.statement attributes.
- Buffer Pool Monitoring: Inspect the buffer pool charts for hit ratios above 95%; consistent drops may indicate insufficient memory.
- Replication Lag Visibility: Validate that the metric mysql_slave_seconds_behind_master or mysql_replica_lag_seconds is updating for replicas.
- Alert Notifications: Test email or Slack notifications to confirm alert rules are firing correctly.
- Dashboard Validation: Review dashboards for query throughput, latency, and replication lag over 24 hours to establish healthy baselines.
Example Alert Rules for MySQL Monitoring
1. High Query Latency Alert
Triggers when the overall query rate climbs well above your baseline for a sustained period, a common precursor to latency problems caused by missing indexes or inefficient joins.
alert: High_MySQL_Query_Latency
expr: rate(mysql_global_status_queries[5m]) > 1000
for: 5m
labels:
  severity: warning
annotations:
  summary: "High MySQL query load detected"
  description: "Query rate has exceeded baseline for 5 minutes; investigate potential slow queries or index issues."
2. Replication Lag Alert
Notifies when replicas fall behind the primary, helping prevent stale reads and data inconsistency.
alert: MySQL_Replication_Lag
expr: mysql_slave_seconds_behind_master > 5
for: 2m
labels:
  severity: critical
annotations:
  summary: "MySQL replication lag detected"
  description: "Replica is more than 5 seconds behind the master — check replication health and thread states."
3. Low Buffer Pool Hit Ratio Alert
Warns when MySQL starts serving data from disk instead of memory, indicating poor caching or insufficient buffer size.
alert: Low_Buffer_Pool_Hit_Ratio
expr: rate(mysql_innodb_buffer_pool_reads[5m]) / rate(mysql_innodb_buffer_pool_read_requests[5m]) > 0.05
for: 3m
labels:
  severity: warning
annotations:
  summary: "Buffer pool hit ratio dropped below 95%"
  description: "Frequent disk reads detected — tune InnoDB buffer pool size or review query patterns."
Conclusion
Proactive MySQL monitoring is the difference between reacting to outages and preventing them altogether. By continuously tracking performance metrics, replication health, and slow queries, teams can ensure consistent uptime, faster response times, and a smoother user experience. When every query matters, visibility isn’t optional — it’s your first line of defense against performance degradation and costly downtime.
CubeAPM makes this process effortless. With its OpenTelemetry-native architecture, unified MELT coverage (Metrics, Events, Logs, Traces), and flat-rate pricing, CubeAPM empowers DevOps teams to correlate MySQL performance with application behavior in real time — without the complexity or unpredictable costs of traditional APM tools.
Start monitoring MySQL with CubeAPM today for complete visibility, smarter diagnostics, and cost-predictable observability that scales with your business.