CubeAPM

Azure Cache for Redis Monitoring: Performance, Metrics, Alerts, and Troubleshooting


A Redis cache you do not monitor is a Redis cache that will eventually surprise you. Maybe memory fills up silently at 3 AM. Maybe latency climbs during a product launch, and no one notices until customers start complaining. Azure Cache for Redis is powerful, but it is also a live component in your application stack. When it misbehaves, your application feels it immediately.

This guide shows you exactly how to set up Azure Cache for Redis monitoring, which metrics matter most, how to configure alerts before problems escalate, and how third-party observability tools extend what Azure Monitor provides natively.

Key Takeaways

  • Azure Monitor is the native way to collect Redis metrics including memory, CPU, latency, and hit/miss rates.
  • Track Used Memory Percentage, Cache Hit Ratio, Server Load, and Cache Latency as your four primary health signals.
  • Set proactive alerts on memory thresholds (80%) and cache hit ratio drops to prevent silent degradation.
  • Azure Monitor Workbooks give you a single interactive dashboard across all your Redis instances.
  • Third-party tools like Dynatrace, CubeAPM, New Relic, Netdata, ManageEngine, and Site24x7 extend visibility with cross-stack correlation.
  • Evicted keys and expired keys together reveal memory pressure patterns that memory metrics alone can miss.

What Is Azure Cache for Redis?

Azure Cache for Redis is Microsoft’s fully managed, in-memory caching service built on the open-source Redis engine. It stores frequently accessed data in memory so applications can retrieve it with sub-millisecond latency instead of repeatedly querying a database. Common use cases include session state management, API response caching, real-time leaderboards, and rate limiting.

Microsoft offers several tiers: Basic, Standard, Premium, and Enterprise. The Premium and Enterprise tiers add clustering, geo-replication, and data persistence. In 2025, Microsoft announced a retirement timeline for Azure Cache for Redis SKUs and now recommends Azure Managed Redis for new workloads. The monitoring practices in this article apply to both platforms, although individual metric names and portal paths may differ.

Why Azure Cache for Redis Monitoring Matters

Without monitoring, you are operating blind. These are the most common failure modes that proper Azure Cache for Redis monitoring catches before they cause outages:

  • Memory exhaustion: When Used Memory Percentage reaches 100%, Redis either starts evicting keys or rejects new writes depending on your maxmemory policy. Both outcomes hurt application behaviour.
  • High server load: Redis processes commands on a single thread, so when Server Load stays above roughly 70%, requests start to queue and latency spikes for every client.
  • Cache stampede: A sudden spike in cache misses after a deployment or TTL expiry can overwhelm your database backend.
  • Connection leaks: A slow rise in connected clients that never drops back often signals a connection pool leak in application code.
  • Noisy neighbour effects: Instances on shared-core sizes (such as C0) can experience interference from co-located tenants, visible as unexpected latency variance.

Key Metrics for Azure Cache for Redis Monitoring

Azure Cache for Redis exposes dozens of metrics through Azure Monitor. These are the ones that matter most for day-to-day operations:

| Metric | REST API name | What it tells you | Alert threshold |
|---|---|---|---|
| Used Memory % | usedmemorypercentage | How full the cache is, as a percentage of the memory allocated to the tier | > 80% |
| Cache Hit Ratio | Derived: cachehits / (cachehits + cachemisses) | Percentage of key lookups that returned a value | < 80% |
| Server Load | serverLoad | Percentage of time the Redis server is busy processing commands rather than idle | > 70% |
| Cache Latency | cacheLatency | Cache latency reported by the service, in microseconds | > 2 ms (2,000 µs) |
| Connected Clients | connectedclients | Number of active client connections | Spike above baseline |
| Evicted Keys | evictedkeys | Keys removed due to memory pressure (maxmemory policy) | > 0 |
| Operations/Sec | operationsPerSecond | Throughput; useful for capacity planning | Near tier limit |
| Cache Misses | cachemisses | Lookups for keys that were not found in the cache | Rising trend |

Memory Metrics

Used Memory (usedmemory): The actual bytes consumed by your data. Track this alongside Used Memory RSS, which is the physical memory allocated by the OS. When RSS is significantly higher than usedmemory, memory fragmentation is wasting resources.

Used Memory Percentage (usedmemorypercentage): This is the most actionable memory metric. Alert at 80% so you have time to scale up or review eviction policy before hitting 100%.

Memory Fragmentation Ratio: Calculated as RSS divided by used memory. A ratio above 1.5 means significant fragmentation. Values above 2.0 warrant investigation or a restart during a maintenance window.
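To make the ratio concrete, here is a minimal Python sketch that divides RSS by used memory and maps the result onto the 1.5 and 2.0 bands above. It assumes you already have the Used Memory and Used Memory RSS readings in bytes (for example, exported from Azure Monitor); the function names are illustrative, not part of any Azure SDK.

```python
def fragmentation_ratio(used_memory_rss: float, used_memory: float) -> float:
    """Return RSS divided by used memory (both in bytes)."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float) -> str:
    """Map a fragmentation ratio onto the bands described in this guide."""
    if ratio > 2.0:
        return "investigate"   # consider a restart during a maintenance window
    if ratio > 1.5:
        return "significant"   # noticeable memory waste, keep watching
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings: 3 GB of RSS backing 1.5 GB of data.
    ratio = fragmentation_ratio(3_000_000_000, 1_500_000_000)
    print(f"ratio={ratio:.2f} status={classify_fragmentation(ratio)}")
```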

Performance Metrics

Server Load (serverLoad): The percentage of time the Redis server is busy processing commands rather than waiting idle. Because Redis handles commands on a single thread, sustained values above 70% are a strong signal to review slow commands or scale up your tier.

Cache Latency (cacheLatency): The single best indicator of cache health from the application perspective. It tracks how long cache operations take and is reported in microseconds, so the 2 ms guideline corresponds to 2,000 µs. Any sustained rise above 2 ms for a standard GET/SET workload warrants investigation.

Operations Per Second (operationsPerSecond): Throughput of your cache. Useful for capacity planning and understanding load patterns relative to your tier limits.

Hit and Miss Metrics

Cache Hit Ratio: Calculated as cachehits divided by (cachehits plus cachemisses). For most production workloads you want this above 80%. A sustained drop is usually a sign of key expiry issues, TTL misconfiguration, or a cold-start event.
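Because hit ratio is derived rather than emitted directly, it helps to compute it the same way everywhere. A minimal sketch, assuming you have summed cachehits and cachemisses over the same time window; the helper names are hypothetical. Returning None for a window with no lookups avoids reporting a misleading 0% or 100% during idle periods.

```python
from typing import Optional


def cache_hit_ratio(cache_hits: int, cache_misses: int) -> Optional[float]:
    """Hit ratio in percent: cachehits / (cachehits + cachemisses) * 100.

    Returns None when the window had no lookups at all.
    """
    lookups = cache_hits + cache_misses
    if lookups == 0:
        return None
    return 100.0 * cache_hits / lookups


def hit_ratio_breached(ratio: Optional[float], threshold: float = 80.0) -> bool:
    """True only when the ratio is known and below the 80% floor."""
    return ratio is not None and ratio < threshold


if __name__ == "__main__":
    ratio = cache_hit_ratio(9_000, 1_000)
    print(ratio, hit_ratio_breached(ratio))
```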

Evicted Keys (evictedkeys): Any non-zero value means Redis is actively removing data to free memory. This is expected behaviour when maxmemory-policy is set to a volatile or allkeys eviction policy, but it also signals that your cache is under memory pressure.

Expired Keys (expiredkeys): Keys removed because their TTL elapsed naturally. A sudden spike here can cause a cache miss wave and database load spike.

Connection Metrics

Connected Clients (connectedclients): Number of active client connections. A gradual, unrecovered rise typically means a connection leak. A sudden spike may indicate a retry storm from application errors.
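The two patterns above, a slow unrecovered climb versus a sudden jump, can be told apart with a simple heuristic. This is a hypothetical sketch, not an Azure feature: it takes per-minute connectedclients readings, uses the first half of the series as a baseline, treats a last-step jump of more than 50% of that baseline as a spike, and treats a series that never decreases and ends higher than it started as a possible leak. The 1.5 factor mirrors the "baseline plus 50%" alert suggested later in this guide.

```python
from statistics import mean
from typing import Sequence


def classify_client_trend(samples: Sequence[int], spike_factor: float = 1.5) -> str:
    """Label a series of connectedclients readings as spike, leak, or normal."""
    if len(samples) < 4:
        return "insufficient data"
    baseline = mean(samples[: len(samples) // 2])
    jump = samples[-1] - samples[-2]
    # A large single-step jump relative to baseline suggests a retry storm.
    if baseline > 0 and jump > (spike_factor - 1) * baseline:
        return "spike"
    # A series that never drops back and ends higher suggests a pool leak.
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    if never_drops and samples[-1] > samples[0]:
        return "possible leak"
    return "normal"


if __name__ == "__main__":
    print(classify_client_trend([100, 105, 110, 115, 120, 125]))
```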

How to Enable Azure Cache for Redis Monitoring with Azure Monitor

Azure Monitor is built into the Azure platform and requires no additional agent for Azure Cache for Redis. Metrics are collected automatically at one-minute resolution for most dimensions.

Step 1: Access Metrics in the Azure Portal

  • Sign in to the Azure portal at portal.azure.com.
  • Navigate to your Azure Cache for Redis resource.
  • Select Monitoring from the left navigation pane.
  • Choose Metrics to open the Azure Monitor Metrics Explorer for that resource.
  • Use the Metric dropdown to select any metric from the Microsoft.Cache/redis namespace.

You can plot multiple metrics on the same chart, change the aggregation window, and pin charts to a shared Azure Dashboard for team-wide visibility.
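Outside the portal, the same metrics can be read from the Azure Monitor Metrics REST API. The sketch below only builds the request URL; it assumes you send it as a GET request with a Microsoft Entra ID bearer token, which is not shown. The subscription, resource group, and cache names are placeholders, and the api-version value (2018-01-01) is an assumption worth checking against the current Azure documentation.

```python
from typing import Sequence
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"


def redis_resource_id(subscription_id: str, resource_group: str, cache_name: str) -> str:
    """ARM resource ID for an Azure Cache for Redis instance."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Cache/redis/{cache_name}"
    )


def metrics_url(resource_id: str, metric_names: Sequence[str], timespan: str,
                interval: str = "PT1M", aggregation: str = "Average") -> str:
    """URL for the Azure Monitor metrics list operation on one resource."""
    params = {
        "api-version": "2018-01-01",
        "metricnames": ",".join(metric_names),
        "timespan": timespan,        # ISO 8601 interval: start/end
        "interval": interval,        # PT1M matches the one-minute resolution
        "aggregation": aggregation,
    }
    query = urlencode(params, safe=",/:")
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


if __name__ == "__main__":
    rid = redis_resource_id("<subscription-id>", "<resource-group>", "<cache-name>")
    print(metrics_url(rid, ["usedmemorypercentage", "serverLoad"],
                      "2025-01-01T00:00:00Z/2025-01-01T01:00:00Z"))
```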

Step 2: Enable Diagnostic Settings to Export Metrics and Logs

By default, Azure Monitor retains metric data for 93 days. For longer retention or cross-service analysis, configure Diagnostic Settings to export data to a Log Analytics Workspace, a Storage Account, or an Event Hub.

  • In your Redis resource, go to Monitoring > Diagnostic settings.
  • Click Add diagnostic setting.
  • Select the log categories you want (for example, ConnectedClientList) and tick AllMetrics.
  • Choose a destination: Log Analytics Workspace is recommended for querying with KQL.
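The same diagnostic setting can be created through Azure Resource Manager instead of the portal. This sketch builds the JSON request body and path for a setting that sends AllMetrics and the ConnectedClientList log category to a Log Analytics workspace. It assumes the request is sent as an authenticated PUT and that api-version 2021-05-01-preview is still accepted; the workspace ID and setting name are placeholders.

```python
import json
from typing import Any, Dict, List


def diagnostic_setting_body(workspace_id: str, log_categories: List[str]) -> Dict[str, Any]:
    """Request body routing AllMetrics plus selected log categories to Log Analytics."""
    return {
        "properties": {
            "workspaceId": workspace_id,  # full resource ID of the workspace
            "metrics": [{"category": "AllMetrics", "enabled": True}],
            "logs": [{"category": c, "enabled": True} for c in log_categories],
        }
    }


def diagnostic_setting_path(resource_id: str, setting_name: str) -> str:
    """ARM path (relative to management.azure.com) for the setting."""
    return (f"{resource_id}/providers/Microsoft.Insights/diagnosticSettings/"
            f"{setting_name}?api-version=2021-05-01-preview")


if __name__ == "__main__":
    body = diagnostic_setting_body("<workspace-resource-id>", ["ConnectedClientList"])
    print(json.dumps(body, indent=2))
```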

Step 3: Use Azure Monitor Workbooks for a Unified View

Azure Monitor Insights for Azure Cache for Redis provides two pre-built Workbooks that give you an interactive, multi-resource view without writing any queries.

Azure Cache for Redis Resource Overview: Combines the most commonly used metrics so that cache health and performance can be assessed at a glance across all instances in your subscription.

Geo-Replication Dashboard: Pulls geo-replication health and status metrics from both the geo-primary and geo-secondary cache instances. This dashboard is recommended for geo-replicated deployments because some metrics are emitted from only one of the two instances.

To open Workbooks, search for Monitor in the Azure portal, select Azure Cache for Redis under Insights Hub, then navigate to the Workbooks tab.

Setting Up Alerts for Azure Cache for Redis

Metrics are only useful if someone is notified when they cross a critical threshold. Azure Monitor alert rules let you define conditions and route notifications through action groups, which can send email, SMS, and push notifications or call webhooks.

Recommended Alert Rules

  • Used Memory Percentage greater than 80%: Gives you time to scale up or review eviction policy before the cache is completely full.
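  • Cache Hit Ratio below 80%: An early warning that cache effectiveness is degrading, often before application latency is visible. Because the ratio is derived from cachehits and cachemisses rather than emitted as a single metric, you may need a log search alert over exported metrics, or a third-party tool, to evaluate it.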
  • Server Load greater than 70% sustained over 5 minutes: Indicates the Redis thread is under stress and latency may be rising.
  • Connected Clients exceeding your baseline by 50%: Signals a possible connection leak or retry storm.
  • Evicted Keys greater than 0: Alerts you to active memory pressure so you can investigate before evictions start eroding your hit ratio.
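Before committing these rules to Azure Monitor, it can be useful to replay them against historical data. The following is a hypothetical evaluator: the RedisSnapshot container and the client baseline are assumptions, and each check applies one of the five thresholds exactly as listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RedisSnapshot:
    """Aggregated metrics for one evaluation window (hypothetical container)."""
    used_memory_percentage: float
    cache_hits: int
    cache_misses: int
    server_load_5min_avg: float   # Server Load averaged over 5 minutes
    connected_clients: int
    evicted_keys: int


def triggered_alerts(s: RedisSnapshot, client_baseline: float) -> List[str]:
    """Return the names of the recommended rules this snapshot would fire."""
    alerts = []
    if s.used_memory_percentage > 80:
        alerts.append("memory")
    lookups = s.cache_hits + s.cache_misses
    if lookups and 100 * s.cache_hits / lookups < 80:
        alerts.append("hit_ratio")
    if s.server_load_5min_avg > 70:
        alerts.append("server_load")
    if s.connected_clients > 1.5 * client_baseline:  # baseline + 50%
        alerts.append("connected_clients")
    if s.evicted_keys > 0:
        alerts.append("evictions")
    return alerts


if __name__ == "__main__":
    print(triggered_alerts(RedisSnapshot(85, 700, 300, 75, 160, 3), client_baseline=100))
```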

How to Create an Alert Rule

  • In your Redis resource, go to Monitoring > Alerts.
  • Click Create > Alert rule.
  • Under Condition, select a metric signal (for example, Used Memory Percentage).
  • Set the Threshold value and Aggregation granularity (5 minutes is a good default for most metrics).
  • Under Actions, link an Action Group that defines who gets notified and how.
  • Save the rule and verify it appears in the Alerts list.
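The portal steps above can also be reproduced through Azure Resource Manager. This sketch builds a request body for a static-threshold metric alert rule with a 5-minute window, defaulting to Used Memory Percentage above 80%. It assumes you PUT the body to a Microsoft.Insights/metricAlerts resource path using api-version 2018-03-01 and an authenticated client, none of which is shown; resource and action group IDs are placeholders, so verify the schema against current Azure documentation before relying on it.

```python
import json
from typing import Any, Dict


def metric_alert_body(resource_id: str, action_group_id: str,
                      metric_name: str = "usedmemorypercentage",
                      operator: str = "GreaterThan", threshold: float = 80,
                      time_aggregation: str = "Average") -> Dict[str, Any]:
    """Build a static-threshold metric alert rule for one Redis instance."""
    return {
        "location": "global",  # metric alert rules are global resources
        "properties": {
            "description": f"{metric_name} {operator} {threshold}",
            "severity": 2,
            "enabled": True,
            "scopes": [resource_id],
            "evaluationFrequency": "PT1M",  # how often the rule is checked
            "windowSize": "PT5M",           # 5-minute aggregation window
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                "allOf": [{
                    "criterionType": "StaticThresholdCriterion",
                    "name": f"{metric_name}-threshold",
                    "metricName": metric_name,
                    "metricNamespace": "Microsoft.Cache/redis",
                    "operator": operator,
                    "threshold": threshold,
                    "timeAggregation": time_aggregation,
                }],
            },
            "actions": [{"actionGroupId": action_group_id}],
        },
    }


if __name__ == "__main__":
    rule = metric_alert_body(
        "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Cache/redis/<cache>",
        "/subscriptions/<sub>/resourceGroups/<rg>/providers/microsoft.insights/actionGroups/<group>",
    )
    print(json.dumps(rule, indent=2))
```

For the Evicted Keys rule, the same builder could be called with metric_name="evictedkeys", threshold=0, and a summing aggregation such as "Total"; check which aggregations the metric supports first.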

Baseline Your Cache: Performance Testing Before You Monitor

Alerts are most effective when you know what normal looks like. Microsoft recommends using redis-benchmark or memtier_benchmark to establish a latency and throughput baseline before you go to production.

Run your benchmark from a client virtual machine in the same Azure region as your cache instance. Ensure the VM has at least as much compute and bandwidth as the cache tier you are testing. Use the -p flag if you are connecting on a non-default port such as 6380 for TLS or 10000 for the Enterprise tier.
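A baseline run is easier to repeat if the command line is generated rather than typed. This sketch assembles redis-benchmark arguments using its standard -h, -p, -a, -c, -n, -d, and -t options, plus --tls, which requires a redis-benchmark build with TLS support. The host name and access key are placeholders, and the default concurrency, request count, and payload size are assumptions to adjust to your expected workload.

```python
import shlex
from typing import List


def redis_benchmark_argv(host: str, access_key: str, port: int = 6380,
                         tls: bool = True, clients: int = 50,
                         requests: int = 100_000, payload_bytes: int = 1024,
                         tests: str = "GET,SET") -> List[str]:
    """Build a redis-benchmark command line for a baseline run."""
    argv = [
        "redis-benchmark",
        "-h", host,
        "-p", str(port),            # 6380 for TLS, 10000 for the Enterprise tier
        "-a", access_key,
        "-c", str(clients),         # concurrent client connections
        "-n", str(requests),        # total number of requests
        "-d", str(payload_bytes),   # value size in bytes for SET/GET
        "-t", tests,                # only benchmark the commands you use
    ]
    if tls:
        argv.append("--tls")
    return argv


if __name__ == "__main__":
    cmd = redis_benchmark_argv("<cache-name>.redis.cache.windows.net", "<access-key>")
    print(shlex.join(cmd))
```

Passing the key with -a makes it visible in the process list, so run this only from a dedicated test VM.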

Key parameters to record from a baseline run: average latency, p99 latency, operations per second at your expected concurrency level, and CPU utilisation on the Redis server during the load. These numbers become the foundation for your alert thresholds.
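Turning those recorded numbers into thresholds can be as simple as summarizing the samples and adding headroom. A minimal sketch: the percentile uses the nearest-rank method, and the "alert at twice the baseline average" rule is a hypothetical choice for illustration, not Microsoft guidance.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct between 0 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def baseline_summary(latencies_ms: List[float], ops_per_sec: float) -> Dict[str, float]:
    """Record the average and p99 latency plus throughput from a benchmark run."""
    return {
        "avg_ms": mean(latencies_ms),
        "p99_ms": percentile(latencies_ms, 99),
        "ops_per_sec": ops_per_sec,
    }


def latency_alert_threshold(summary: Dict[str, float], headroom: float = 2.0) -> float:
    """Hypothetical rule: alert when average latency exceeds baseline * headroom."""
    return summary["avg_ms"] * headroom


if __name__ == "__main__":
    summary = baseline_summary([0.6, 0.7, 0.8, 0.9, 1.4], ops_per_sec=50_000)
    print(summary, latency_alert_threshold(summary))
```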

Third-Party Tools for Azure Cache for Redis Monitoring

Azure Monitor is the authoritative source for Redis metrics on Azure, but third-party observability platforms add correlation with application traces, cross-cloud comparisons, and pre-built anomaly detection. Here are the most widely used options:

CubeAPM

CubeAPM is a full-stack APM tool that monitors Azure Cache for Redis alongside your application traces and infrastructure metrics in a single platform. It collects Redis metrics including memory usage, cache hit ratio, latency, connected clients, and evicted keys, and correlates them directly with distributed traces from your application services. When a Redis latency spike causes a downstream API to slow down, CubeAPM shows you the complete cause-and-effect chain without requiring you to switch between tools. It is designed for teams that want deep Redis observability without the complexity or cost of enterprise APM platforms.

Dynatrace

Dynatrace pulls Azure Cache for Redis metrics through Azure Monitor integration and correlates them with distributed traces from its OneAgent. If a cache latency spike is causing a downstream service to slow down, Dynatrace can show you the full cause-and-effect chain automatically.

New Relic

New Relic’s Azure Redis Cache integration polls metrics every 5 minutes by default and surfaces them in infrastructure dashboards. You can query cache data using NRQL alongside application performance data in the same platform, making it straightforward to correlate a cache miss surge with an application error rate increase.

ManageEngine Applications Manager

ManageEngine Applications Manager provides dedicated Azure Cache for Redis monitoring with out-of-the-box dashboards for memory, connection, and hit/miss metrics. It also includes availability monitoring and integrates with its broader IT operations management suite for incident management workflows.

Site24x7

Site24x7 offers an Azure Cache for Redis monitoring integration that collects metrics at one-minute granularity. It provides threshold-based alerting and integrates with on-call notification workflows, making it suitable for teams that already use Site24x7 for website and infrastructure monitoring.

Netdata

Netdata’s Azure Monitor collector discovers Redis instances automatically via Azure Resource Graph queries at startup. It uses the Azure Monitor Metrics batch API to collect metrics including per-shard breakdowns for hit rate, connected clients, server load, keys, and throughput. Authentication is handled through Microsoft Entra ID using service principal or managed identity credentials.

Troubleshooting Common Azure Cache for Redis Performance Problems

Latency that rises without a corresponding rise in server load usually points to network issues, client-side timeouts, or connection pool exhaustion. Check the geographic distance between your application and the cache instance, verify you are using persistent connections, and review your application’s connection pool size.

If usedmemorypercentage is high but totalkeys is low, memory fragmentation is a likely cause, although a small number of very large values can produce the same picture. Check the memory fragmentation ratio to tell them apart: if it is significantly above 1.5, consider triggering an active defragmentation or scheduling a restart during low-traffic hours.

A deployment that invalidates cache keys intentionally or through key naming changes causes a sudden miss surge. The database absorbs all the traffic that the cache was handling. Warm your cache after deployments using a read-through or pre-population strategy, and track expiredkeys around deployment windows.
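The read-through and pre-population strategies mentioned above can be sketched in a few lines. To keep the example runnable without a server, an in-memory dictionary stands in for Redis; in a real deployment the storage would be your Redis client (for example, SET with an expiry) and the hypothetical loader would query your database. The hit and miss counters make it easy to confirm that warming actually prevents a miss wave after a deploy.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ReadThroughCache:
    """Illustrative read-through cache; a dict stands in for Redis."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # hypothetical database lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: load from the backend and cache it.
        self.misses += 1
        value = self._loader(key)
        self._data[key] = (value, now + self._ttl)
        return value

    def warm(self, keys: Iterable[str]) -> None:
        """Pre-populate hot keys after a deploy so the first requests hit."""
        now = self._clock()
        for key in keys:
            self._data[key] = (self._loader(key), now + self._ttl)


if __name__ == "__main__":
    cache = ReadThroughCache(lambda k: k.upper(), ttl_seconds=300)
    cache.warm(["home", "pricing"])
    cache.get("home")
    cache.get("about")
    print(cache.hits, cache.misses)
```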

If evictedkeys rises during ordinary traffic (not a traffic spike), your cache tier is undersized for your dataset. Either increase the tier or review whether you are caching items with unnecessarily long TTLs that prevent natural key turnover.
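The four patterns in this section can be condensed into a quick triage checklist. This is a hypothetical sketch: the Symptoms fields are assumptions about what you read off your dashboards, and it reuses this guide's 70% server load, 80% memory, and 1.5 fragmentation thresholds rather than any official decision tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """What the dashboards show during an incident (hypothetical fields)."""
    latency_rising: bool
    server_load_pct: float
    used_memory_pct: float
    fragmentation_ratio: float
    miss_surge_after_deploy: bool
    evictions_during_normal_traffic: bool


def triage(s: Symptoms) -> List[str]:
    """Map symptoms to the likely causes described in this section."""
    findings = []
    if s.latency_rising and s.server_load_pct < 70:
        findings.append("network/client: check region, persistent connections, pool size")
    if s.used_memory_pct > 80 and s.fragmentation_ratio > 1.5:
        findings.append("fragmentation: defragment or restart in a low-traffic window")
    if s.miss_surge_after_deploy:
        findings.append("cold cache: warm after deploys and watch expiredkeys")
    if s.evictions_during_normal_traffic:
        findings.append("capacity: tier undersized or TTLs unnecessarily long")
    return findings


if __name__ == "__main__":
    for finding in triage(Symptoms(True, 40, 90, 1.8, False, True)):
        print(finding)
```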

Stop Guessing. Start Monitoring.

CubeAPM gives you deep visibility into Azure Cache for Redis performance alongside your full application stack. Correlate cache latency spikes, memory pressure, and hit ratio drops directly with application traces — no context switching required.


Conclusion

Effective Azure Cache for Redis monitoring is not about tracking every available metric. It is about watching the four or five signals that tell you whether your cache is healthy, setting alert thresholds before problems happen, and having a clear path from alert to diagnosis.

Start with Used Memory Percentage, Cache Hit Ratio, Server Load, and Cache Latency. Enable Diagnostic Settings to persist data beyond 93 days. Use Azure Monitor Workbooks for a team-visible dashboard. Add alert rules for the thresholds in this guide. Then, as your use of Redis grows, layer in a third-party tool to correlate cache behaviour with the rest of your application stack.

A well-monitored cache is a cache you can trust. And a cache you can trust is one that quietly does its job while you focus on shipping features instead of firefighting outages.

Disclaimer: Metric names, thresholds, and portal navigation paths are based on Azure Monitor documentation as of 2025 and may change as Microsoft updates the Azure platform. Always refer to the official Microsoft Azure documentation for the most current guidance. Mentions of third-party tools do not constitute endorsements.

FAQs

1. What is the most important metric to monitor in Azure Cache for Redis?

Used Memory Percentage and Cache Hit Ratio are the two most critical metrics. Used Memory Percentage tells you how close the cache is to exhaustion, while Cache Hit Ratio tells you whether the cache is actually doing its job. Pair both with Cache Latency for a complete picture of cache health.

2. How do I access Azure Cache for Redis metrics?

Metrics are available natively through Azure Monitor. In the Azure portal, navigate to your Redis resource and select Monitoring > Metrics. You can also access the pre-built dashboards in Azure Monitor Insights Hub by searching for Monitor and selecting Azure Cache for Redis.

3. What causes a low cache hit ratio in Azure Cache for Redis?

A low cache hit ratio is usually caused by keys expiring before they are reused (TTL too short), cold-start conditions after a deployment, key naming inconsistencies between writers and readers, or caching data that is accessed too infrequently to benefit from caching. Use the evictedkeys and expiredkeys metrics alongside hit ratio to diagnose the root cause.

4. How do I set up alerts for Azure Cache for Redis?

In the Azure portal, go to your Redis resource, then Monitoring > Alerts > Create > Alert rule. Select a metric signal, define a threshold and aggregation window, and link an Action Group to specify who gets notified. Recommended starting thresholds: Used Memory Percentage above 80%, Server Load above 70%, and any non-zero Evicted Keys.

5. Should I use a third-party monitoring tool alongside Azure Monitor?

Azure Monitor is sufficient for basic Redis monitoring. Third-party tools like Dynatrace, New Relic, or Netdata add value when you need to correlate Redis performance with application traces, manage multiple cloud providers in a single platform, or access pre-built anomaly detection and runbook automation. If your team already uses one of these platforms, adding its Redis integration is usually straightforward.
