Mar 29, 2026

Cache Handbook: 6 Classic “Traps” When Using Cache That Even Seniors Fall Into

This article is compiled and adapted from the book “A Cache Handbook for Software Engineers” by Quang Hoang (Software Engineer at Google). This is part 3/4 of the series.

After mastering the fundamentals and the consistency problem, this article will help you identify the 6 most common “traps” when operating a Cache system — and how to avoid them.


1. Cache Avalanche — Mass “Snowslide”

Definition: Cache Avalanche occurs when a large number of keys in the Cache expire or are deleted at the same time. The resulting flood of requests all miss the Cache and rush to the Database, overwhelming it.

Causes:

Solutions:

  1. Randomize TTLs: Instead of a fixed 60-minute TTL for everything, add a random offset (1-5 minutes). Keys will expire at staggered intervals, reducing sudden load on the DB.

  2. Build a High Availability Cache cluster: Ensure the Cache server is not a single point of failure.

  3. Cache Pre-warming: Instead of waiting for users to access data before loading it, run a background script to pre-load important data into Cache before opening the floodgates to traffic. Especially effective for Cold Start and Flash Sale scenarios.
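As a minimal sketch of the first solution (randomized TTLs), the helper below adds a random 1 to 5 minute offset to the 60-minute base TTL mentioned above. The function and constant names are illustrative assumptions, not part of any cache library.

```python
import random

BASE_TTL_SECONDS = 60 * 60    # the fixed 60-minute TTL from the text
MIN_JITTER_SECONDS = 1 * 60   # lower bound of the random offset
MAX_JITTER_SECONDS = 5 * 60   # upper bound of the random offset


def jittered_ttl(base=BASE_TTL_SECONDS, rng=random):
    """Return the base TTL plus a random offset of 1 to 5 minutes."""
    return base + rng.randint(MIN_JITTER_SECONDS, MAX_JITTER_SECONDS)
```

Passing `jittered_ttl()` instead of a constant whenever a key is written spreads the expirations over a four-minute window, so keys written together no longer expire together.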


2. Thundering Herd (Cache Stampede) — The “Stampeding Herd”

Definition: Similar to Cache Avalanche, but occurs with only a single Hot Key (e.g., “Lottery results at 6:30 PM”). When the Hot Key expires, a massive number of requests flood the DB.

The solution depends on the type of Cache:

2.1. Solutions for Inline Cache

Stale Data Serving: When a Hot Key expires, Cache returns stale data to the user. Meanwhile, a background thread fetches fresh data from DB and updates the Cache.

This solution is used in Nginx (proxy_cache_use_stale + proxy_cache_background_update) and the Caffeine library (Java) via the refreshAfterWrite feature:

LoadingCache<Key, Graph> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .refreshAfterWrite(Duration.ofMinutes(1))
    .build(key -> queryDB(key));

Drawback: Users see stale data for a brief period. This method only helps with keys that expire due to TTL; it does not help when keys are actively evicted or during a Cold Start, because there is no stale value left to serve.

Request Coalescing: When 10,000 requests all read the same Hot Key that just expired, the Cache Server groups them all into a queue and sends a single representative request to the DB. When the DB responds, the result is distributed to all 10,000 waiting requests.

This technique is widely used in Nginx (proxy_cache_lock) and CDNs. The implementation is fairly simple: when receiving a read request for key A, the API server acquires a local mutex lock for key A, so only one request at a time loads that key while the others wait and reuse its result.

See the implementation in Go’s singleflight library for reference.
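The idea can be sketched in Python as follows. This is a simplified, in-process illustration in the spirit of singleflight, not a port of the Go library; the class name `SingleFlight` and its `do` method are assumptions made for this example.

```python
import threading


class SingleFlight:
    """Coalesce concurrent calls for the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> record of the in-flight call

    def do(self, key, loader):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                # The first caller for this key becomes the leader.
                call = {"done": threading.Event(), "value": None, "error": None}
                self._calls[key] = call

        if not leader:
            # Followers wait for the leader's result instead of hitting the DB.
            call["done"].wait()
            if call["error"] is not None:
                raise call["error"]
            return call["value"]

        try:
            call["value"] = loader(key)
            return call["value"]
        except Exception as exc:
            call["error"] = exc
            raise
        finally:
            # Forget the call so that a later miss triggers a fresh load,
            # then wake up every follower.
            with self._lock:
                del self._calls[key]
            call["done"].set()
```

Because the lock is local to one process, this limits the load to one DB query per key per API server, not one per cluster.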

2.2. Solutions for Cache-Aside

Locking (Distributed Mutex): When a Hot Key is missed, each request must attempt to acquire a distributed mutex lock (e.g., using SETNX in Redis, or SET with the NX option and an expiry so that a crashed lock holder cannot block everyone else). Only the request that acquires the lock is allowed to query the DB. Other requests sleep and retry reading from Cache.
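A minimal sketch of this flow is shown below. To keep it runnable without a server, it uses `FakeRedis`, a hypothetical in-memory stand-in that mimics only the `get`, `set(..., nx=True, px=...)` and `delete` calls; the assumption is that a real client such as redis-py (created with `decode_responses=True`) could be passed in its place. The helper name `get_with_lock` and the `lock:` key prefix are also illustrative.

```python
import time
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in exposing the few calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, nx=False, px=None):
        # Mirrors SET ... NX: refuse to overwrite an existing key.
        # px is accepted for compatibility; expiry is not simulated here.
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_lock(client, key, load_from_db, ttl_ms=5000, retries=50, wait_s=0.05):
    """Cache-aside read where only the lock holder queries the database."""
    for _ in range(retries):
        value = client.get(key)
        if value is not None:
            return value
        token = str(uuid.uuid4())
        if client.set("lock:" + key, token, nx=True, px=ttl_ms):
            try:
                value = load_from_db(key)
                client.set(key, value)
                return value
            finally:
                # Release only our own lock. Against a real server this
                # check-and-delete should be made atomic (e.g. a Lua script).
                if client.get("lock:" + key) == token:
                    client.delete("lock:" + key)
        time.sleep(wait_s)  # lost the race: back off, then re-read the cache
    raise TimeoutError("gave up waiting for " + key)
```

The random token guards against one request deleting a lock that has already expired and been re-acquired by another request.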

Probabilistic Early Expiration (P.E.E): Instead of waiting for a key to actually expire (TTL = 0), each time Cache is read, the system runs a probability algorithm to decide: “Should we pretend the key has already expired and proactively fetch fresh data from DB?”

The algorithm is based on the X-Fetch formula. A key will be refreshed early if:

T_fetch + beta × gap × (-log(rand())) > T_expiry

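Where T_fetch is the time of the current read, T_expiry is the time at which the key expires, gap is the time a DB fetch takes, beta is a tuning factor (1.0 by default; larger values refresh earlier), and rand() is a uniform random number in (0, 1].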

How it works:

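Using -log(rand()) is extremely clever: the value is exponentially distributed, so most draws are small. Far from expiry almost no request refreshes early, while close to expiry the chance rises sharply. Among thousands of requests, the chance of at least one request proactively updating is very high, but the chance of all of them going at once is extremely low.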
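To make this concrete: if rand() is uniform on (0, 1], a single read arriving r seconds before expiry triggers an early refresh with probability exp(-r / (beta × gap)). The small sketch below evaluates that expression; the function names are illustrative, and it assumes the reads are independent.

```python
import math


def early_refresh_probability(remaining, gap, beta=1.0):
    """P(refresh) for one read arriving `remaining` seconds before expiry."""
    if remaining <= 0:
        return 1.0  # the key has already expired: always refresh
    return math.exp(-remaining / (beta * gap))


def chance_nobody_refreshes(remaining, gap, requests, beta=1.0):
    """Probability that none of `requests` independent reads refreshes early."""
    return (1.0 - early_refresh_probability(remaining, gap, beta)) ** requests
```

With gap = 0.5 s and beta = 1, a read arriving 5 s before expiry refreshes with probability e^-10 (roughly 0.0045%), while a read arriving 0.5 s before expiry refreshes with probability e^-1 (roughly 37%).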

import math
import random
import time

BETA = 1.0  # values above 1 refresh earlier, values below 1 refresh later

def get_data_with_pee(key):
    # cache, fetch_from_db, TTL and get_average_db_fetch_time are
    # placeholders provided by the application.
    cached_item = cache.get(key)

    # Cache Miss
    if cached_item is None:
        data = fetch_from_db(key)
        cache.set(key, data, TTL)
        return data

    # Cache Hit - check X-Fetch probability
    T_fetch = time.time()
    T_expiry = cached_item.expiry_time
    gap = get_average_db_fetch_time()
    rand_val = 1.0 - random.random()  # uniform in (0, 1], so log() is defined
    probabilistic_offset = BETA * gap * (-math.log(rand_val))

    if T_fetch + probabilistic_offset > T_expiry:
        # Early expiration triggered
        new_data = fetch_from_db(key)
        cache.set(key, new_data, TTL)
        return new_data
    return cached_item.data

Advantages: Simple, low latency, only activates when requests come in (Cold Keys expire normally).

Drawbacks: Difficult to configure gap if DB performance fluctuates. If data rarely changes, early recomputation wastes resources.


3. Cache Penetration — “Piercing Through” Cache

Definition: Occurs when users (or hackers) continuously query data that doesn’t exist in Cache or Database. Example: a hacker sends millions of requests for user_id = -1 or random UUIDs -> DB gets DDoS’d.

3.1. Cache Null Values

If data doesn’t exist in DB, still store a null value in Cache with a short TTL:

NORMAL_TTL = 3600           # Normal TTL for valid data
SHORT_TTL = 60              # Short TTL for null values
NULL_MARKER = "EMPTY_DATA"

def get_data_with_null_caching(key):
    cached_value = cache.get(key)

    if cached_value is not None:
        # Cache Hit
        if cached_value == NULL_MARKER:
            return None
        return cached_value

    db_data = fetch_from_db(key)

    if db_data is None:
        # Doesn't exist in DB
        cache.set(key, NULL_MARKER, SHORT_TTL)
        return None

    cache.set(key, db_data, NORMAL_TTL)
    return db_data

3.2. Bloom Filter

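Bloom Filter is a probabilistic data structure that is extremely memory-efficient. It answers a membership query in one of two ways: “definitely not in the set” or “possibly in the set” (false positives can occur, false negatives cannot).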

We use a Bloom Filter to quickly check whether a key might exist before allowing a Cache/DB query. If the Bloom Filter says “No” -> block immediately.
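The check described above can be sketched with a small hand-rolled filter. This is an illustration only: the class `BloomFilter`, the helper `lookup`, and the sizing defaults are assumptions for this example, and a production system would more likely use an existing Bloom Filter implementation sized for its key count.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def lookup(key, bloom, fetch):
    """Block keys the filter rules out before they reach Cache or DB."""
    if not bloom.might_contain(key):
        return None
    return fetch(key)
```

Every key that exists in the DB must be added to the filter when it is created, otherwise valid keys would be blocked.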

Drawbacks:


4. Cache Thrashing — The “Evict Then Reload” Loop

Definition: Data is continuously written to Cache, forcing old data to be evicted, but the evicted data is immediately requested again and must be reloaded from DB.

Symptoms: Hit Rate drops sharply, Eviction Rate spikes, CPU/Disk IO on DB skyrockets.

Causes:

  1. Cache capacity too small: Hot data size exceeds Cache capacity.
  2. Unsuitable Eviction Policy: Default LRU is not always optimal.
  3. Resource Contention: Multiple applications/threads sharing a small Cache memory pool, continuously evicting each other’s data.

Solutions:

  1. Increase Cache size.
  2. Change the Eviction Policy (e.g., from LRU to LFU).
  3. Review application logic: there may be a logic flow continuously scanning all data in the DB, pushing hot data out of Cache.
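The effect of a Cache that is slightly too small can be reproduced with a tiny LRU simulation. The sketch below is illustrative (the function name `lru_hit_count` is made up for this example): it replays an access sequence against an LRU cache of a given capacity and counts the hits.

```python
from collections import OrderedDict


def lru_hit_count(capacity, accesses):
    """Replay `accesses` against an LRU cache and return the number of hits."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits


# A loop that repeatedly scans 4 keys, 25 times over.
scan = [0, 1, 2, 3] * 25
hits_small = lru_hit_count(3, scan)  # capacity one short of the working set
hits_large = lru_hit_count(4, scan)  # capacity equal to the working set
```

With capacity 3, every key is evicted just before it is requested again, so `hits_small` is 0 out of 100 accesses. With capacity 4, only the first four accesses miss and `hits_large` is 96.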

5. Memory Fragmentation

Definition: RAM becomes divided into many scattered free regions during continuous allocation and deallocation of data. Although the total free space may still be substantial, the system cannot find a contiguous memory region large enough -> wasted resources or OOM.

Two types of fragmentation:

External Fragmentation: Data is continuously created, updated, and randomly deleted. Free memory regions get chopped up and interspersed. For example in Redis: jemalloc cannot fully return small scattered memory pages to the OS -> the OS sees Redis “consuming” a lot of RAM but in reality only a portion is used.
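One way to see this gap is to compare two fields that Redis reports in its INFO memory output: used_memory (bytes allocated for data) and used_memory_rss (bytes the OS has assigned to the process). The sketch below parses such output and computes their ratio; the function name and the sample numbers are made up for illustration.

```python
def fragmentation_ratio(info_text):
    """Return used_memory_rss / used_memory from INFO memory style text."""
    fields = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# Illustrative numbers, not taken from a real server.
SAMPLE_INFO = """# Memory
used_memory:1000000
used_memory_rss:1500000
"""
ratio = fragmentation_ratio(SAMPLE_INFO)
```

A ratio well above 1 means the process holds noticeably more RAM than the data it actually stores.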

Internal Fragmentation: The allocator hands out memory in fixed-size blocks, so an item rarely fills its block exactly. For example: Memcached divides memory into Slab Classes (96 bytes, 120 bytes…). Storing a 97-byte value -> goes into the 120-byte class -> wastes 23 bytes (~19% of the chunk).
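The arithmetic in this example can be sketched as a lookup of the smallest slab class that fits an item. The class sizes below extend the two values from the text and are illustrative only; real sizes depend on the Memcached version and its growth-factor setting.

```python
import bisect

# Illustrative slab class sizes in bytes.
SLAB_CLASSES = [96, 120, 152, 192]


def slab_waste(item_size, classes=SLAB_CLASSES):
    """Return (chunk_size, wasted_bytes) for an item of `item_size` bytes."""
    idx = bisect.bisect_left(classes, item_size)  # first class >= item_size
    if idx == len(classes):
        raise ValueError("item larger than the largest slab class")
    chunk = classes[idx]
    return chunk, chunk - item_size
```

For a 97-byte item, `slab_waste(97)` returns `(120, 23)`, matching the 23 wasted bytes above, while a 96-byte item fits its class exactly.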

Solutions for Redis:


6. Connection Churn — Connection “Spinning”

Definition: Clients continuously open new TCP connections to the Cache server, send a few requests, then close the connection, repeating at high frequency.

Consequences:

Solutions:

  1. Connection Pooling on the App Server to maintain a stable number of connections.
  2. Proxy: Place a proxy between App Server and Cache Server (Twemproxy, Envoy). The proxy is responsible for maintaining stable TCP connections.
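A minimal sketch of the pooling idea is shown below. The class `ConnectionPool` and its `connect` factory argument are assumptions made for this example; most cache client libraries already ship their own pool, which would normally be the better choice.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed set of open connections and lend them out on demand."""

    def __init__(self, connect, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every connection once, up front

    @contextmanager
    def connection(self, timeout=1.0):
        # Block until a connection is free (raises queue.Empty on timeout).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it
```

Each request borrows a connection with `with pool.connection() as conn:` and returns it afterwards, so the number of TCP connections opened stays at the pool size no matter how many requests are served.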

Series: Cache Handbook

  1. Core Foundations of Caching
  2. Decoding the Cache Consistency Problem
  3. 6 Classic “Traps” When Using Cache ← You are here
  4. From Monitoring to Scaling
