CubeAPM

Valkey Memory Usage Monitoring: Setup, Metrics & Best Practices

Valkey is a high-performance, in-memory key-value database built to handle millions of operations per second. But running everything in RAM creates a hard constraint: when memory fills up, Valkey either starts evicting data or refuses new writes entirely. A production cluster that hits maxmemory can cascade into service failures within seconds, especially if the eviction policy is misconfigured or if memory usage climbs faster than expected during traffic spikes.

According to the CNCF’s 2024 Annual Survey, 76% of organizations now use containers in production, and many rely on in-memory data stores like Valkey for caching, session storage, and real-time analytics. As workloads scale and cluster sizes grow, memory monitoring becomes critical to maintaining uptime, performance, and cost efficiency.

This guide covers how Valkey memory usage works, what metrics to track, how to set up effective alerts, and which tools make monitoring easier at scale.

What Is Valkey Memory Usage Monitoring

Valkey memory usage monitoring is the practice of tracking how much RAM a Valkey instance or cluster is consuming, understanding what data structures and keys are using that memory, and setting up alerts to catch memory pressure before it causes evictions, OOM errors, or service outages.

Unlike disk-based databases that can gracefully degrade as storage fills up, in-memory databases like Valkey hit a hard limit defined by the maxmemory configuration directive. Once that limit is reached, Valkey's behavior depends entirely on the eviction policy you've set. A poorly chosen policy can lead to silent data loss (evicting keys you expected to persist) or service refusal (rejecting writes when eviction is disabled).

Valkey memory monitoring answers three questions:

  1. How much memory is the instance using right now?
  2. What is consuming that memory (keys, metadata, fragmentation)?
  3. When will the instance hit maxmemory, and what will happen when it does?

Effective monitoring combines real-time metrics collection, alerting on usage thresholds, and periodic analysis of key-level memory consumption to spot inefficiencies like large keys, expired data still holding memory, or fragmentation overhead.

How Valkey Memory Usage Works

Valkey stores all data in RAM. Every key, value, and associated metadata (key name, expiration timestamp, data structure overhead) contributes to total memory usage. The reported memory footprint includes not just the raw data but also internal allocator overhead and fragmentation.

Memory allocators and fragmentation

On Linux, Valkey is built with jemalloc as its default memory allocator. This allocator optimizes for performance but can introduce fragmentation, where allocated memory exceeds the actual data size. For example, a 56-byte allocation is rounded up to jemalloc's 64-byte size class. Over time, as keys are created and deleted, fragmentation can grow, making reported memory usage higher than the sum of all key sizes.

The INFO MEMORY command shows this breakdown:

valkey-cli -a <password> INFO MEMORY

Key fields in the output:

  • used_memory: total bytes allocated by Valkey (including data and overhead)
  • used_memory_rss: resident set size (actual RAM used by the process)
  • used_memory_peak: highest memory usage since server start
  • mem_fragmentation_ratio: ratio of RSS to used memory (values above 1.5 indicate significant fragmentation)
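The INFO reply is plain field:value text, so a monitoring script needs only a small parser to use these fields. The following is a minimal sketch, not part of any Valkey client library: parse_info is a hypothetical helper, and the sample output uses made-up numbers. It also confirms that mem_fragmentation_ratio is simply used_memory_rss divided by used_memory.

```python
def parse_info(text: str) -> dict:
    """Parse raw INFO output into a {field: value} dict of strings.

    INFO replies contain '# Section' header lines and 'field:value' lines;
    headers and blank lines are skipped. Values are left as strings.
    """
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[name] = value
    return fields


# Illustrative excerpt of `valkey-cli INFO MEMORY` output (made-up numbers).
sample = (
    "# Memory\r\n"
    "used_memory:1048576\r\n"
    "used_memory_rss:1572864\r\n"
    "used_memory_peak:2097152\r\n"
    "mem_fragmentation_ratio:1.50\r\n"
)

info = parse_info(sample)
ratio = int(info["used_memory_rss"]) / int(info["used_memory"])
print(round(ratio, 2))  # 1.5
```

Client libraries that follow the redis-py interface can return INFO already parsed into a dictionary, so a hand-rolled parser like this is mainly useful when shelling out to valkey-cli.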

Maxmemory policy and eviction behavior

The maxmemory directive sets an upper limit on memory usage. When that limit is hit, Valkey applies the configured maxmemory-policy to decide what happens next. Common policies include:

  • noeviction: refuse new writes, return errors (safest for data you cannot afford to lose)
  • allkeys-lru: evict least recently used keys across the entire dataset
  • volatile-lru: evict LRU keys only among those with an expiration set
  • allkeys-lfu: evict least frequently used keys
  • volatile-ttl: evict keys with the shortest TTL remaining

Choosing the wrong policy can cause silent data loss or service degradation. For example, using volatile-lru in a cache where most keys have no expiration leaves Valkey with few or no eviction candidates once memory reaches maxmemory. It then behaves like noeviction and rejects writes, even though plenty of non-expiring keys could have been evicted to make room.
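The volatile-lru pitfall is easier to see in a toy model. The sketch below is a deliberate simplification, not Valkey's implementation: Valkey approximates LRU by sampling a few keys (maxmemory-samples) rather than scanning all of them, and the Key class and pick_victim function are hypothetical names. Only the LRU and TTL policies from the list above are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Key:
    name: str
    last_access: int            # logical clock; smaller means less recently used
    ttl: Optional[int] = None   # seconds to live; None means no expiration set


def pick_victim(keys: List[Key], policy: str) -> Optional[Key]:
    """Return the key a policy would evict, or None if the write would be rejected."""
    if policy == "noeviction":
        return None
    if policy.startswith("volatile-"):
        candidates = [k for k in keys if k.ttl is not None]  # only keys with a TTL
    else:
        candidates = list(keys)                               # allkeys-*: any key
    if not candidates:
        return None  # nothing evictable: behaves like noeviction (OOM error)
    if policy in ("allkeys-lru", "volatile-lru"):
        return min(candidates, key=lambda k: k.last_access)
    if policy == "volatile-ttl":
        return min(candidates, key=lambda k: k.ttl)
    raise ValueError(f"policy not modelled in this sketch: {policy}")


# A cache where no key has a TTL.
keys = [Key("user:1", last_access=7), Key("page:home", last_access=2)]
print(pick_victim(keys, "volatile-lru"))      # None -> write rejected
print(pick_victim(keys, "allkeys-lru").name)  # page:home
```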

Key-level memory consumption

Not all keys consume memory equally. A small string might use 60 bytes, while a hash with 10,000 fields can consume megabytes. The MEMORY USAGE command reports how many bytes a specific key and its value require in RAM, including administrative overhead:

valkey-cli -a <password> MEMORY USAGE mykey

For nested data structures like hashes or sorted sets, the result is an estimate: the optional SAMPLES parameter controls how many nested elements are inspected (5 by default; SAMPLES 0 inspects all of them, which is exact but slower on large keys):

valkey-cli -a <password> MEMORY USAGE myhash SAMPLES 100

Regularly profiling top keys by size helps identify memory hogs and opportunities for optimization.

Key Metrics to Track for Valkey Memory Monitoring

Effective Valkey memory monitoring requires tracking both absolute memory usage and the factors that influence it. The following metrics provide the clearest signal.

Used memory and maxmemory

Metric: used_memory and maxmemory
Why it matters: These two numbers define your headroom. If used_memory approaches maxmemory, evictions or write failures are imminent.
How to monitor: Poll INFO MEMORY (which reports both fields) every 10 to 60 seconds and calculate memory utilization as (used_memory / maxmemory) * 100. If maxmemory is 0, no limit is configured, so compare used_memory against the host or container memory limit instead.
Alert threshold: Alert when utilization exceeds 80% to leave time for scaling or cleanup before the limit is reached.
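The utilization calculation is simple, but it has one edge case worth handling explicitly: maxmemory is 0 when no limit is configured. The sketch below assumes both values have already been read from INFO MEMORY; memory_utilization_pct is a hypothetical helper name and the 10 GiB / 12 GiB figures are illustrative.

```python
from typing import Optional


def memory_utilization_pct(used_memory: int, maxmemory: int) -> Optional[float]:
    """Return (used_memory / maxmemory) * 100, or None when maxmemory is 0.

    maxmemory 0 means no limit is configured, so there is no ratio to compute;
    compare used_memory against the host or container memory limit instead.
    """
    if maxmemory <= 0:
        return None
    return used_memory / maxmemory * 100


ALERT_THRESHOLD_PCT = 80.0  # threshold recommended in this section

GIB = 1024 ** 3
pct = memory_utilization_pct(used_memory=10 * GIB, maxmemory=12 * GIB)
print(f"{pct:.1f}%", pct > ALERT_THRESHOLD_PCT)  # 83.3% True
```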

Memory fragmentation ratio

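Metric: mem_fragmentation_ratio
Why it matters: A high ratio means Valkey is holding more RAM than it needs for actual data. A ratio above 1.5 signals inefficiency; a ratio below 1.0 usually means part of the dataset has been swapped out to disk.
How to monitor: Read mem_fragmentation_ratio from INFO MEMORY; it equals used_memory_rss / used_memory. On very small datasets the ratio can look high simply because of fixed process overhead, so judge it alongside the absolute gap between the two values.
Alert threshold: Alert when fragmentation exceeds 1.5. Mitigation means restarting the instance or enabling active defragmentation where the build supports it.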
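A single threshold on the ratio can mislead in both directions, so a check might look at the absolute gap as well. This is a sketch under stated assumptions: classify_fragmentation is a hypothetical helper, and the 100 MiB minimum excess is an arbitrary illustrative cut-off, loosely inspired by Valkey's active-defrag-ignore-bytes setting rather than taken from any documented alerting rule.

```python
MIB = 1024 ** 2


def classify_fragmentation(used_memory_rss: int, used_memory: int,
                           high_ratio: float = 1.5,
                           min_excess_bytes: int = 100 * MIB) -> str:
    """Classify mem_fragmentation_ratio (used_memory_rss / used_memory).

    - below 1.0: RSS is smaller than what the allocator handed out, which
      usually means part of the dataset has been swapped out
    - above high_ratio AND at least min_excess_bytes of extra RSS: fragmented
    - otherwise: ok (a high ratio on a tiny dataset is mostly fixed overhead)
    """
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        return "possible-swap"
    if ratio > high_ratio and used_memory_rss - used_memory >= min_excess_bytes:
        return "fragmented"
    return "ok"


print(classify_fragmentation(3072 * MIB, 1536 * MIB))  # fragmented (ratio 2.0)
print(classify_fragmentation(30 * MIB, 10 * MIB))      # ok (ratio 3.0, only 20 MiB extra)
print(classify_fragmentation(900 * MIB, 1024 * MIB))   # possible-swap
```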

Evicted keys and OOM errors

Metric: evicted_keys (from INFO STATS)
Why it matters: A rising eviction count indicates memory pressure. In a cache, some evictions are expected. In a primary data store, any eviction is a red flag.
How to monitor: Track the evicted_keys counter over time and calculate the eviction rate per second.
Alert threshold: For LRU, LFU, and TTL-based policies, alert when the eviction rate spikes unexpectedly. Under noeviction, evicted_keys never increases because nothing is evicted; alert on rejected writes instead, which show up as OOM errors in application logs and in the errorstat_OOM line of INFO ERRORSTATS.
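evicted_keys is a cumulative counter, so an eviction rate comes from the difference between two readings divided by the time between them. The sketch below (counter_rate is a hypothetical name; the readings are illustrative) treats a drop in the counter as a reset rather than as a negative rate.

```python
def counter_rate(prev: int, curr: int, interval_s: float) -> float:
    """Per-second rate of a cumulative INFO counter such as evicted_keys.

    Cumulative INFO STATS counters start again from 0 after a restart (and after
    CONFIG RESETSTAT), so a drop is treated as a reset: the current value is
    taken as the number of events since the reset.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev if curr >= prev else curr
    return delta / interval_s


# Two evicted_keys readings taken 60 seconds apart (illustrative values).
print(counter_rate(1_000, 1_600, 60))  # 10.0 evictions per second
print(counter_rate(5_000, 30, 60))     # 0.5 (counter was reset in between)
```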

Key expiration metrics

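Metric: expired_keys and expired_stale_perc (from INFO STATS)
Why it matters: Keys whose TTL has passed still hold memory until Valkey removes them, either lazily when they are accessed or through its background active expiration cycle. A large backlog of logically expired keys can inflate memory usage.
How to monitor: expired_keys counts keys that have already been removed by expiration, so a steadily rising counter means expiration is working. To spot a backlog, watch expired_stale_perc, the estimated percentage of keys that have expired but not yet been reclaimed, and compare the expiration rate with the number of keys that carry a TTL (the expires= count in INFO KEYSPACE).
Alert threshold: Alert if expired_stale_perc stays high or if memory usage stays high despite a steady expiration rate.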

Peak memory usage

Metric: used_memory_peak
Why it matters: Tracks the highest memory usage since the instance started. Useful for capacity planning and for understanding the impact of traffic spikes.
How to monitor: Compare current used_memory to used_memory_peak to see how close you are to historical highs.

Memory overhead

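Metric: used_memory_overhead (from INFO MEMORY)
Why it matters: Shows memory consumed by internal structures (client buffers, replication buffers, AOF buffers, and the keyspace's own hash tables) rather than actual data. High overhead can indicate configuration issues.
How to monitor: Track overhead as a percentage of total used memory. INFO MEMORY also reports the remainder as used_memory_dataset.
Alert threshold: Alert if overhead exceeds 20% of used memory, then review client output buffer, replication backlog, and AOF settings. Datasets made of many tiny keys carry more per-key overhead, so calibrate this threshold against your own baseline.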
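The overhead percentage is a one-line calculation; the sketch below also derives the dataset share so the two can be graphed together. memory_breakdown is a hypothetical helper and the byte counts are illustrative.

```python
def memory_breakdown(used_memory: int, used_memory_overhead: int) -> dict:
    """Split used_memory into overhead and dataset shares, in percent.

    INFO MEMORY reports used_memory_dataset as used_memory minus
    used_memory_overhead, so the two shares add up to 100.
    """
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    overhead_pct = used_memory_overhead / used_memory * 100
    return {"overhead_pct": overhead_pct, "dataset_pct": 100 - overhead_pct}


shares = memory_breakdown(used_memory=1_000_000, used_memory_overhead=250_000)
print(shares)                         # {'overhead_pct': 25.0, 'dataset_pct': 75.0}
print(shares["overhead_pct"] > 20.0)  # True -> review buffer settings
```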

How to Set Up Valkey Memory Monitoring

Setting up effective memory monitoring for Valkey involves configuring the instance to expose metrics, collecting those metrics with a monitoring tool, and creating alerts based on the thresholds that matter for your workload.

Step 1: Enable Valkey INFO command access

The INFO MEMORY and INFO STATS commands are the primary data sources for Valkey memory metrics. Ensure your monitoring tool has credentials to execute these commands. If Valkey requires authentication, provide the password:

valkey-cli -a <password> INFO MEMORY

For production clusters, use ACLs to give the monitoring tool its own user, limited to the commands it needs (such as INFO), instead of sharing an administrative password with it.

Step 2: Collect metrics using a monitoring agent

Most infrastructure monitoring tools support Valkey via built-in exporters or plugins. Common approaches include:

Prometheus with Valkey Exporter: The Valkey Exporter for Prometheus scrapes INFO output and exposes it as Prometheus metrics. Deploy the exporter as a sidecar or standalone service, configure it with your Valkey instance endpoint and password, then point Prometheus at the exporter’s metrics endpoint.

Native monitoring tool integrations: Tools like CubeAPM, Datadog, and Dynatrace offer native Valkey integrations that collect memory metrics, parse INFO output, and surface pre-built dashboards. CubeAPM’s infrastructure monitoring connects to Valkey instances via OpenTelemetry or Prometheus exporters and correlates memory usage with application traces and logs, making it easier to link memory spikes to specific workloads or queries.

StatsD or OpenTelemetry: For custom telemetry pipelines, write a script that polls INFO MEMORY every 60 seconds and pushes the metrics to StatsD or an OpenTelemetry Collector.
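A minimal sketch of such a script, assuming a client object whose info(section) method returns INFO fields as a dictionary (redis-py and its fork valkey-py behave this way) and a StatsD daemon listening on its conventional UDP port 8125. statsd_gauges, poll_forever, and the "valkey." metric prefix are hypothetical names, and the field list is only a starting point.

```python
import socket
import time

# INFO MEMORY fields exported as StatsD gauges.
FIELDS = ("used_memory", "used_memory_rss", "used_memory_peak",
          "maxmemory", "mem_fragmentation_ratio")


def statsd_gauges(info: dict, prefix: str = "valkey") -> list:
    """Format INFO fields as StatsD gauge lines of the form 'name:value|g'."""
    return [f"{prefix}.{name}:{info[name]}|g" for name in FIELDS if name in info]


def poll_forever(client, statsd_host: str = "127.0.0.1", statsd_port: int = 8125,
                 interval_s: float = 60.0) -> None:
    """Poll INFO MEMORY every interval_s seconds and send gauges over UDP.

    `client` is assumed to have an info(section) method returning a dict,
    as redis-py and valkey-py clients do.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        while True:
            for line in statsd_gauges(client.info("memory")):
                sock.sendto(line.encode("ascii"), (statsd_host, statsd_port))
            time.sleep(interval_s)
    finally:
        sock.close()


print(statsd_gauges({"used_memory": 1024, "maxmemory": 0}))
# ['valkey.used_memory:1024|g', 'valkey.maxmemory:0|g']
```

To run it against a real instance, pass a connected client to poll_forever. Sending to an OpenTelemetry Collector instead would replace the UDP socket with an OTLP exporter; that variant is not shown here.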

Step 3: Visualize memory usage trends

Create dashboards that show:

  • Memory utilization percentage over time
  • Evicted keys and expired keys per second
  • Fragmentation ratio
  • Peak memory vs. current memory

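Visualizing these metrics together helps identify patterns. For example, used_memory_rss climbing while used_memory stays flat, with the fragmentation ratio above 1.5, points to fragmentation, which defragmentation addresses better than adding RAM. Evictions, by contrast, are driven by used_memory approaching maxmemory rather than by fragmentation.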

Step 4: Set up alerts for memory thresholds

Alerts should trigger before memory becomes critical. Recommended alert conditions:

  • Memory utilization exceeds 80% for 5 consecutive minutes
  • Fragmentation ratio exceeds 1.5
  • Evicted keys counter increases by more than 100 in 1 minute (for policies that evict, such as allkeys-lru; under noeviction nothing is evicted, so alert on OOM write errors instead)
  • used_memory_rss grows faster than used_memory (indicates runaway fragmentation)
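The first condition needs a little state, because a single high reading should not page anyone. The following sketch, with SustainedThreshold as a hypothetical class name and made-up readings, fires only after five consecutive samples above 80%, assuming one sample per minute.

```python
from collections import deque


class SustainedThreshold:
    """Fire only when the last `window` samples are all above `threshold`.

    With one utilization sample per minute, window=5 expresses
    "memory utilization exceeds 80% for 5 consecutive minutes".
    """

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


rule = SustainedThreshold(threshold=80.0, window=5)
minute_samples = [85, 90, 82, 79, 81, 83, 84, 85, 86]  # illustrative utilization %
fired = [rule.observe(v) for v in minute_samples]
print(fired.index(True))  # 8: first minute with five consecutive readings above 80%
```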

Route alerts to Slack, PagerDuty, or email depending on severity. Critical memory alerts should wake someone up. Warning alerts should create a ticket.

Step 5: Monitor key-level memory usage

Aggregate metrics show the overall picture but miss inefficiencies at the key level. Periodically run the MEMORY USAGE command on your largest keys, or use tools that scan the keyspace and report top keys by memory consumption (valkey-cli --bigkeys and --memkeys do this from the command line). This helps identify:

  • Oversized keys that should be split
  • Expired keys still holding memory
  • Data structures that could be optimized (e.g., many tiny string keys that could be grouped into small hashes, which Valkey stores in a compact encoding)
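A keyspace scan can be scripted with SCAN plus MEMORY USAGE. The sketch below assumes a redis-py/valkey-py style client with scan_iter and memory_usage methods; top_keys_by_memory and the InMemoryStandIn class are hypothetical, and the stand-in only exists so the example runs without a server.

```python
import heapq


def top_keys_by_memory(client, n: int = 10, match: str = "*", samples: int = 5) -> list:
    """Return the n largest keys as (bytes, key) tuples, largest first.

    Walks the keyspace with SCAN, which is incremental and does not block the
    server the way KEYS does, and calls MEMORY USAGE on each key. `client` is
    assumed to offer redis-py/valkey-py style scan_iter(match=..., count=...)
    and memory_usage(key, samples=...) methods.
    """
    heap = []  # min-heap holding the n largest keys seen so far
    for key in client.scan_iter(match=match, count=1000):
        size = client.memory_usage(key, samples=samples)
        if size is None:  # key expired or was deleted between SCAN and MEMORY USAGE
            continue
        if len(heap) < n:
            heapq.heappush(heap, (size, key))
        else:
            heapq.heappushpop(heap, (size, key))
    return sorted(heap, reverse=True)


class InMemoryStandIn:
    """Hypothetical stand-in client so the example runs without a server."""

    def __init__(self, sizes: dict) -> None:
        self.sizes = sizes

    def scan_iter(self, match: str = "*", count: int = 10):
        return iter(list(self.sizes))

    def memory_usage(self, key, samples=None):
        return self.sizes.get(key)


demo = InMemoryStandIn({"session:1": 56, "feed:42": 5_000_000, "cart:7": 700, "tmp": 90})
print(top_keys_by_memory(demo, n=2))  # [(5000000, 'feed:42'), (700, 'cart:7')]
```

Each MEMORY USAGE call is a separate round trip, so on large keyspaces consider running a scan like this against a replica or during off-peak hours.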

Best Practices for Valkey Memory Management

Monitoring memory is only useful if you act on the data. These practices help maintain stable memory usage in production.

Set a realistic maxmemory limit

Do not configure maxmemory equal to total system RAM. Leave headroom for OS operations, monitoring agents, and memory fragmentation. A common guideline: set maxmemory to 70-80% of available RAM. For a server with 16 GB RAM, set maxmemory to 12 GB.
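The guideline is easy to turn into a number for valkey.conf. In this sketch suggested_maxmemory is a hypothetical helper and 0.75 is simply the midpoint of the 70-80% range; the gb suffix in Valkey's config file means 1024^3 bytes.

```python
GIB = 1024 ** 3


def suggested_maxmemory(total_ram_bytes: int, fraction: float = 0.75) -> int:
    """Size maxmemory at a fraction of RAM (the 70-80% guideline above),
    leaving the rest for the OS, monitoring agents, and fragmentation."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(total_ram_bytes * fraction)


limit = suggested_maxmemory(16 * GIB)
print(limit // GIB)                   # 12
print(f"maxmemory {limit // GIB}gb")  # maxmemory 12gb
```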

Choose the right eviction policy for your use case

If Valkey is a cache where all data can be safely evicted, use allkeys-lru or allkeys-lfu. If Valkey holds a mix of data that must persist (stored without a TTL) and disposable entries with TTLs, use volatile-lru so that only keys with an expiration are eviction candidates. If you cannot afford to lose any data, use noeviction and treat memory alerts as critical failures requiring immediate intervention (scaling up or archiving data).

Monitor memory trends, not just snapshots

A one-time check of memory usage misses the bigger picture. Track memory usage over hours and days to identify trends. Is memory growing linearly? Does it spike during business hours and drop at night? Understanding patterns helps with capacity planning and prevents surprise outages.
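One way to turn a trend into an answer to "when will the instance hit maxmemory" is a straight-line projection. The sketch below fits an ordinary least-squares line to (hour, used_memory) samples; it assumes growth is roughly linear, which daily traffic cycles can violate, so treat the result as a rough estimate. hours_until_limit and the sample history are hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_limit(samples: List[Tuple[float, float]], maxmemory: float) -> Optional[float]:
    """Fit a least-squares line to (hour, used_memory) samples and estimate how many
    hours after the last sample used_memory reaches maxmemory.

    Returns None when memory is flat or shrinking. Assumes roughly linear growth.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / var_t  # units per hour
    if slope <= 0:
        return None
    fitted_now = mean_m + slope * (samples[-1][0] - mean_t)  # trend value at last sample
    return max(0.0, (maxmemory - fitted_now) / slope)


# Hourly used_memory readings in GiB (illustrative) against a 12 GiB maxmemory.
history = [(0, 6.0), (1, 7.0), (2, 8.0), (3, 9.0)]
print(hours_until_limit(history, maxmemory=12.0))  # 3.0
```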

Use active defragmentation carefully

Valkey supports active defragmentation (activedefrag yes in the config), which compacts memory in the background to reduce fragmentation. This feature uses CPU cycles and can impact performance during defragmentation runs. Enable it only if fragmentation is persistently above 1.5 and monitor CPU impact closely.
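The "persistently above 1.5" rule can be encoded before flipping the switch. This is a hedged sketch: maybe_enable_activedefrag and RecordingClient are hypothetical, the client is assumed to expose a redis-py/valkey-py style config_set method, and six readings is an arbitrary choice (for example, one hour of 10-minute samples).

```python
def maybe_enable_activedefrag(client, recent_ratios: list, threshold: float = 1.5,
                              min_samples: int = 6) -> bool:
    """Turn on active defragmentation only if the last `min_samples`
    mem_fragmentation_ratio readings are all above `threshold`.

    `client` is assumed to expose a redis-py/valkey-py style config_set(name, value).
    CONFIG SET activedefrag yes can be rejected on builds without defrag support.
    """
    window = recent_ratios[-min_samples:]
    if len(window) < min_samples or not all(r > threshold for r in window):
        return False  # not persistent enough to justify the CPU cost
    client.config_set("activedefrag", "yes")
    return True


class RecordingClient:
    """Hypothetical stand-in that records CONFIG SET calls instead of sending them."""

    def __init__(self) -> None:
        self.calls = []

    def config_set(self, name: str, value: str) -> bool:
        self.calls.append((name, value))
        return True


demo = RecordingClient()
print(maybe_enable_activedefrag(demo, [1.2, 1.6, 1.7, 1.6, 1.8, 1.7]))  # False: 1.2 in window
print(maybe_enable_activedefrag(demo, [1.6, 1.7, 1.6, 1.8, 1.7, 1.9]))  # True
print(demo.calls)  # [('activedefrag', 'yes')]
```

A change made with CONFIG SET lasts only until restart unless it is also written to valkey.conf, for example with CONFIG REWRITE.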

Regularly audit large keys

Large keys consume disproportionate memory and can cause latency spikes when accessed. Use the MEMORY USAGE command or scanning tools to identify keys larger than 1 MB. Consider splitting large hashes or lists into smaller chunks or archiving infrequently accessed data.

Plan for memory growth

If memory usage is trending upward, either add more RAM, scale horizontally (add more Valkey instances), or reduce data retention (lower TTLs, delete old keys). Waiting until memory hits 95% before acting leaves no margin for traffic spikes.

Tools for Valkey Memory Monitoring

Several tools specialize in monitoring Valkey memory usage at scale, each with different trade-offs in deployment complexity, cost, and feature depth.

CubeAPM

Deployment: Self-hosted (vendor-managed) or on-premises
OTel support: Native
Pricing: $0.15/GB ingested, unlimited users

CubeAPM’s infrastructure monitoring tracks Valkey memory metrics alongside application traces and logs in a single platform. It connects to Valkey via Prometheus exporters or OpenTelemetry and surfaces memory utilization, fragmentation, evictions, and key-level insights in pre-built dashboards. Because CubeAPM runs inside your VPC or on-premises, all telemetry stays within your infrastructure with no egress fees or third-party data transfer.

CubeAPM correlates Valkey memory spikes with the application requests that triggered them, making it easier to trace a sudden memory surge back to a specific API endpoint or batch job. Alerts are context-aware: when memory usage crosses 80%, CubeAPM surfaces related metrics like eviction rate and fragmentation ratio in the same alert payload.

Best for: Teams that need unified observability across Valkey, Kubernetes, and application layers without sending telemetry data outside their cloud.

Datadog

Deployment: SaaS
OTel support: Strong
Pricing: Infrastructure monitoring starts at $18/host/month

Datadog’s Valkey integration collects memory metrics via the Datadog Agent and the built-in Valkey check. It surfaces memory usage, evictions, fragmentation, and replication lag in a unified dashboard alongside other infrastructure metrics. Datadog’s anomaly detection can alert when memory usage deviates from historical patterns, reducing false positives during expected traffic spikes.

The main drawback is cost at scale. On a 50-node Valkey cluster, infrastructure monitoring alone costs $900/month before adding APM, logs, or custom metrics. AWS egress fees for sending telemetry to Datadog’s SaaS add roughly $0.10/GB.

Best for: Teams already using Datadog for broader infrastructure monitoring who need Valkey metrics in the same platform.

Prometheus with Grafana

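Deployment: Self-hosted
OTel support: Interoperable (Prometheus is a separate CNCF project, but the OpenTelemetry Collector can scrape Prometheus endpoints and export metrics in Prometheus format)
Pricing: Free (open source)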

Prometheus scrapes Valkey metrics via the Valkey Exporter and stores them in a time-series database. Grafana visualizes the metrics and supports alerting via Alertmanager. This stack is free and fully customizable but requires teams to manage their own Prometheus and Grafana instances, configure retention policies, and tune query performance as metric cardinality grows.

For teams already running Prometheus for Kubernetes monitoring, adding Valkey metrics is straightforward. But for teams without existing Prometheus expertise, the operational overhead can be high.

Best for: Teams with Prometheus expertise who want full control and no vendor lock-in.

New Relic

Deployment: SaaS
OTel support: Strong
Pricing: $0.40/GB beyond 100 GB free tier

New Relic’s infrastructure monitoring integrates with Valkey through the New Relic Infrastructure Agent and the Valkey on-host integration. It tracks memory usage, key metrics, and replication health in real time. Alerts can be configured to trigger when memory crosses defined thresholds or when fragmentation exceeds safe levels.

New Relic’s full-platform user model charges per user rather than per host, which can be more cost-effective for small teams but expensive as teams grow. The $0.40/GB ingestion fee applies to all telemetry (logs, metrics, traces), making high-volume Valkey monitoring costly.

Best for: Teams already using New Relic for application performance monitoring who want to consolidate Valkey metrics into the same platform.

Dynatrace

Deployment: SaaS or on-premises
OTel support: Strong
Pricing: Host-based, contact for quote

Dynatrace’s Valkey monitoring uses the OneAgent to collect memory metrics, detect anomalies, and correlate Valkey performance with application health. Its AI engine (Davis) can identify memory leaks or fragmentation patterns without manual threshold tuning. Dynatrace supports both SaaS and on-premises deployment, making it suitable for regulated industries.

The cost is high for smaller teams. Dynatrace pricing is typically enterprise-focused, with annual contracts starting in the tens of thousands of dollars.

Best for: Large enterprises that need AI-driven anomaly detection and can afford Dynatrace’s pricing.

Common Valkey Memory Issues and How to Diagnose Them

Even with monitoring in place, teams encounter recurring memory problems. These are the most common issues and how to fix them.

Memory usage grows continuously despite evictions

Symptom: Memory usage keeps rising even though evicted_keys is increasing.
Cause: Eviction keeps the memory counted against maxmemory near the limit, so continued growth usually shows up in used_memory_rss (fragmentation) or comes from expired keys not being reclaimed fast enough.
Fix: Check mem_fragmentation_ratio. If it is above 1.5, restart the instance or enable active defragmentation. If expired_stale_perc is high, make the active expiration cycle more aggressive, for example by raising active-expire-effort or hz.

Memory hits maxmemory and writes start failing

Symptom: Applications receive OOM errors, or write commands return (error) OOM.
Cause: maxmemory-policy is set to noeviction and memory is full. A volatile-* policy behaves the same way when too few keys have a TTL to evict.
Fix: Increase maxmemory, delete unused keys, or switch to an eviction policy like allkeys-lru if data loss is acceptable.

Memory usage spikes during traffic surges and does not drop

Symptom: Memory usage jumps 30% during peak traffic and stays elevated afterward.
Cause: Large keys created during the spike are not expiring or being evicted.
Fix: Audit keys created during the spike with MEMORY USAGE. Set TTLs on temporary data or manually delete keys no longer needed.

Fragmentation ratio exceeds 2.0

Symptom: used_memory_rss is twice used_memory.
Cause: High churn of keys (many creates and deletes) leading to allocator fragmentation.
Fix: Restart the instance to defragment memory. For long-term mitigation, enable active defragmentation or reduce key churn by batching operations.

Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.

Frequently Asked Questions

What is the MEMORY USAGE command in Valkey?

The MEMORY USAGE command reports how many bytes a specific key and its value consume in RAM, including allocator overhead. Use it to identify large keys that are consuming disproportionate memory.

How do I check Valkey memory usage in real time?

Run the INFO MEMORY command via valkey-cli to see current memory usage, fragmentation ratio, and peak usage. For continuous monitoring, integrate Valkey with a monitoring tool like CubeAPM, Prometheus, or Datadog.

What is a safe memory utilization threshold for Valkey?

Alert when memory usage exceeds 80% of maxmemory. This leaves headroom for traffic spikes and prevents hitting the limit unexpectedly.

What causes high memory fragmentation in Valkey?

Fragmentation occurs when the memory allocator (jemalloc) holds more memory than the data requires due to internal rounding and freed memory blocks not being reused efficiently. High key churn (frequent creates and deletes) worsens fragmentation.

Can I reduce Valkey memory usage without deleting data?

Yes. Compress large values before storing them, use more memory-efficient data structures (e.g., grouping many small related values into one small hash, which Valkey stores in a compact encoding, instead of separate string keys), and enable active defragmentation to reclaim fragmented memory.

What is the best maxmemory policy for a Valkey cache?

For a pure cache where all data can be safely evicted, use allkeys-lru or allkeys-lfu. For a cache with TTL-based expiration, use volatile-lru to evict only keys with expiration set.

How often should I monitor Valkey memory metrics?

Poll INFO MEMORY every 10 to 60 seconds depending on your traffic volume. Higher frequency helps catch memory spikes faster but increases monitoring overhead.
