Mar 29, 2026

Cache Handbook: 6 Classic “Traps” When Using Cache That Even Seniors Fall Into

This article is compiled and adapted from the book “A Cache Handbook for Software Engineers” by Quang Hoang (Software Engineer at Google). This is part 3/4 of the series.

After mastering the fundamentals and the consistency problem, this article will help you identify the 6 most common “traps” when operating a Cache system — and how to avoid them.

1. Cache Avalanche — Mass “Snowslide”

Definition: Cache Avalanche occurs when a large number of keys in the Cache expire or are deleted at the same time. The resulting flood of requests all miss the Cache and hit the Database at once, overwhelming it.

Causes:

Identical TTLs: Many keys are set with the same expiration time -> they all expire simultaneously.
Cache server crash/restart: All data is lost.
Traffic spike right when keys expire (flash sales, major events).

Solutions:

Randomize TTLs: Instead of a fixed 60-minute TTL for everything, add a random offset (1-5 minutes). Keys will expire at staggered intervals, reducing sudden load on the DB.
Build a High Availability Cache cluster: Ensure the Cache server is not a single point of failure.
Cache Pre-warming: Instead of waiting for users to access data before loading it, run a background script to pre-load important data into Cache before opening the floodgates to traffic. Especially effective for Cold Start and Flash Sale scenarios.
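
The TTL randomization above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name ttl_with_jitter and its parameters are hypothetical, and the returned value would be passed to whatever set call your cache client exposes.

```python
import random

BASE_TTL_SECONDS = 60 * 60     # the fixed 60-minute TTL from the example
MIN_JITTER_SECONDS = 1 * 60    # random offset between 1 and 5 minutes
MAX_JITTER_SECONDS = 5 * 60


def ttl_with_jitter(base=BASE_TTL_SECONDS, lo=MIN_JITTER_SECONDS,
                    hi=MAX_JITTER_SECONDS, rng=random):
    """Return the base TTL plus a uniformly random offset in [lo, hi] seconds."""
    return base + rng.randint(lo, hi)
```

Keys written in the same batch then expire somewhere between 61 and 65 minutes later instead of all at the same instant.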

2. Thundering Herd (Cache Stampede) — The “Stampeding Herd”

Definition: Similar to Cache Avalanche, but occurs with only a single Hot Key (e.g., “Lottery results at 6:30 PM”). When the Hot Key expires, a massive number of requests flood the DB.

The solution depends on the type of Cache:

2.1. Solutions for Inline Cache

Stale Data Serving: When a Hot Key expires, Cache returns stale data to the user. Meanwhile, a background thread fetches fresh data from DB and updates the Cache.

This solution is used in Nginx (proxy_cache_use_stale + proxy_cache_background_update) and the Caffeine library (Java) via the refreshAfterWrite feature:


LoadingCache<Key, Graph> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .refreshAfterWrite(Duration.ofMinutes(1))
    .build(key -> queryDB(key));

Drawback: Users see stale data for a brief period. This method is only effective for keys that expire due to TTL, and does not handle cases where keys are actively evicted or during Cold Start.

Request Coalescing: When 10,000 requests all read the same Hot Key that just expired, the Cache Server groups them all into a queue and sends a single representative request to the DB. When the DB responds, the result is distributed to all 10,000 waiting requests.

This technique is widely used in Nginx (proxy_cache_lock) and CDNs. The implementation is fairly simple: when a read request for key A arrives, the server handling it tries to acquire a local (in-process) mutex for key A:

Lock acquired successfully -> send request to Cache/DB, wait for response, unlock.
Lock not acquired -> wait until a response for key A is available.

See the implementation in Go’s singleflight library for reference.
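
For readers who do not use Go, the same idea can be sketched in Python with the standard threading module. This is an illustrative sketch rather than a port of the Go library: the class name SingleFlight, its do method, and the waiters counter are assumptions made for this example.

```python
import threading


class _Call:
    """State shared by every caller waiting on the same key."""

    def __init__(self):
        self.done = threading.Event()
        self.waiters = 0      # how many followers joined this call
        self.result = None
        self.error = None


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}      # key -> in-flight _Call

    def do(self, key, fn):
        """Run fn() once per key at a time; concurrent callers share the result."""
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = self._calls[key] = _Call()
            else:
                call.waiters += 1

        if not leader:
            call.done.wait()              # follower: wait for the leader's answer
        else:
            try:
                call.result = fn()        # leader: the only one that hits the DB
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()           # wake up all followers

        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap its loader, for example flight.do("hot-key", lambda: fetch_from_db("hot-key")), where fetch_from_db stands for your own DB query function.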

2.2. Solutions for Cache-Aside

Locking (Distributed Mutex): When a Hot Key is missed, each request first attempts to acquire a distributed mutex lock (e.g., using SETNX in Redis, or SET with the NX option plus an expiry so that a crashed lock holder cannot block everyone else forever). Only the request that acquires the lock is allowed to query the DB. Other requests sleep briefly and retry reading from Cache.
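
The locking flow can be sketched as below. It is a simplified illustration under several assumptions: client is any object with redis-py style get, set (with the nx, px and ex keyword arguments) and delete methods, created with decode_responses=True so values come back as strings; the function name get_with_lock, the "lock:" key prefix, and the retry parameters are hypothetical.

```python
import time
import uuid


def get_with_lock(client, key, load, ttl=300, lock_ttl_ms=5000,
                  retries=50, pause=0.05):
    """Cache-aside read where only the lock holder queries the DB."""
    lock_key = "lock:" + key
    for _ in range(retries):
        value = client.get(key)
        if value is not None:
            return value                       # cache hit (possibly after waiting)

        token = uuid.uuid4().hex               # identifies this lock holder
        # SET lock_key token NX PX lock_ttl_ms: succeeds only if the lock is free
        if client.set(lock_key, token, nx=True, px=lock_ttl_ms):
            try:
                value = load(key)              # the single DB query
                client.set(key, value, ex=ttl)
                return value
            finally:
                # Simplified, non-atomic release: only delete our own lock.
                if client.get(lock_key) == token:
                    client.delete(lock_key)

        time.sleep(pause)                      # someone else is loading; retry

    return load(key)                           # gave up waiting: fall back to the DB
```

The release step here is deliberately simplified. Because the check and the delete are two separate commands, a production version would typically perform them atomically, for example in a server-side script.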

Probabilistic Early Expiration (P.E.E): Instead of waiting for a key to actually expire (TTL = 0), each time Cache is read, the system runs a probability algorithm to decide: “Should we pretend the key has already expired and proactively fetch fresh data from DB?”

The algorithm is based on the X-Fetch formula. A key will be refreshed early if:


T_fetch + beta * gap * (-log(rand())) > T_expiry

Where:

T_fetch: Current time.
T_expiry: Actual key expiration time.
gap: Average time needed to recompute the value (here, the DB query time).
beta: Adjustment factor (default = 1). beta > 1 -> refreshes earlier; beta < 1 -> refreshes later.
rand(): Random number in (0, 1], so that log() never receives 0.

How it works:

Cache is fresh: T_expiry is far away -> almost always a Cache Hit.
Cache is about to expire: The time remaining until T_expiry shrinks -> probability of “volunteering” to update increases.

Using -log(rand()) is extremely clever: it ensures that among thousands of requests, the chance of at least one request proactively updating is very high, but the chance of all of them going at once is extremely low.
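
To make that claim concrete: if log is the natural logarithm and rand() is uniform on (0, 1], then -log(rand()) follows an exponential distribution with mean 1, so a single request refreshes early with probability exp(-remaining / (beta * gap)), where remaining = T_expiry - T_fetch. The helper names below are hypothetical and exist only to illustrate the arithmetic.

```python
import math


def early_refresh_probability(remaining, gap, beta=1.0):
    """Chance that one request refreshes early, `remaining` seconds before expiry."""
    if remaining <= 0:
        return 1.0
    return math.exp(-remaining / (beta * gap))


def probability_any_refresh(remaining, gap, requests, beta=1.0):
    """Chance that at least one of `requests` independent requests refreshes early."""
    p = early_refresh_probability(remaining, gap, beta)
    return 1.0 - (1.0 - p) ** requests
```

With gap = 0.1 s and beta = 1, a single request arriving 1 s before expiry refreshes with probability exp(-10), roughly 0.005%, while one arriving 0.1 s before expiry refreshes with probability exp(-1), about 37%. Early refreshes therefore cluster within a few multiples of gap before the real expiry.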


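import math
import random
import time

TTL = 3600  # seconds; example value

# Assumed to be provided by the application: `cache` (whose get() returns an
# item with .data and .expiry_time, or None), fetch_from_db(key) and
# get_average_db_fetch_time().

def get_data_with_pee(key):
    cached_item = cache.get(key)

    # Cache Miss
    if cached_item is None:
        data = fetch_from_db(key)
        cache.set(key, data, TTL)
        return data

    # Cache Hit - check X-Fetch probability
    T_fetch = time.time()
    T_expiry = cached_item.expiry_time
    gap = get_average_db_fetch_time()
    beta = 1.0
    rand_val = 1.0 - random.random()  # uniform on (0, 1], so log() never sees 0

    probabilistic_offset = beta * gap * (-math.log(rand_val))

    if T_fetch + probabilistic_offset > T_expiry:
        # Early expiration triggered: refresh before the real expiry
        new_data = fetch_from_db(key)
        cache.set(key, new_data, TTL)
        return new_data

    return cached_item.data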

Advantages: Simple, low latency, only activates when requests come in (Cold Keys expire normally).

Drawbacks: Difficult to configure gap if DB performance fluctuates. If data rarely changes, early recomputation wastes resources.

3. Cache Penetration — “Piercing Through” Cache

Definition: Cache Penetration occurs when users (or hackers) continuously query data that exists in neither the Cache nor the Database, so every request misses the Cache and falls through to the DB. Example: a hacker sends millions of requests for user_id = -1 or random UUIDs -> DB gets DDoS’d.

3.1. Cache Null Values

If data doesn’t exist in DB, still store a null value in Cache with a short TTL:


NORMAL_TTL = 3600    # Normal TTL for valid data
SHORT_TTL = 60       # Short TTL for null values
NULL_MARKER = "EMPTY_DATA"
 
def get_data_with_null_caching(key):
    cached_value = cache.get(key)
 
    if cached_value is not None:   # Cache Hit
        if cached_value == NULL_MARKER:
            return None
        return cached_value
 
    db_data = fetch_from_db(key)
 
    if db_data is None:            # Doesn't exist in DB
        cache.set(key, NULL_MARKER, SHORT_TTL)
        return None
 
    cache.set(key, db_data, NORMAL_TTL)
    return db_data

3.2. Bloom Filter

A Bloom Filter is an extremely memory-efficient probabilistic data structure. It answers a membership query with one of two results:

“Definitely not present”
“Possibly present”

We use a Bloom Filter to quickly check whether a key might exist before allowing a Cache/DB query. If the Bloom Filter says “No” -> block immediately.

Drawbacks:

Pure Bloom Filters do not support deletion. You need to use Counting Bloom Filter or Cuckoo Filter for deletion support.
False Positive rate increases as the bit array fills up.
Complicates the flow: When adding a new entry to DB, you must simultaneously add it to the Bloom Filter.
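
A Bloom Filter is small enough to sketch in full. The version below is a minimal illustration, not a tuned implementation: the class and method names, the default sizes, and the choice of deriving the bit positions from a SHA-256 digest via double hashing are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)   # all bits start at 0

    def _positions(self, item):
        """Derive num_hashes bit positions from one digest (double hashing)."""
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the step is never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely not present; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

In the read path, a request whose key makes might_contain return False can be rejected before it touches the Cache or the DB. Every insert into the DB must also call add, which is the extra bookkeeping mentioned in the drawbacks.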

4. Cache Thrashing — The “Evict Then Reload” Loop

Definition: Data is continuously written to Cache, forcing old data to be evicted, but the evicted data is immediately requested again and must be reloaded from DB.

Symptoms: Hit Rate drops sharply, Eviction Rate spikes, CPU/Disk IO on DB skyrockets.

Causes:

Cache capacity too small: Hot data size exceeds Cache capacity.
Unsuitable Eviction Policy: Default LRU is not always optimal.
Resource Contention: Multiple applications/threads sharing a small Cache memory pool, continuously evicting each other’s data.

Solutions:

Increase Cache size.
Change the Eviction Policy (e.g., from LRU to LFU).
Review application logic: there may be a logic flow continuously scanning all data in the DB, pushing hot data out of Cache.
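
The effect of an undersized Cache is easy to reproduce with a toy simulation. The sketch below assumes a plain LRU policy and a synthetic access pattern that cycles through four keys; the function name lru_hit_rate is hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, accesses):
    """Simulate an LRU cache of the given capacity and return its hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / len(accesses)


# Four hot keys requested in a loop, ten rounds.
accesses = ["A", "B", "C", "D"] * 10
```

Here lru_hit_rate(3, accesses) returns 0.0, because each key is evicted just before it is requested again, while lru_hit_rate(4, accesses) returns 0.9. One extra slot turns a cache that never hits into one that misses only on the first round.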

5. Memory Fragmentation

Definition: RAM becomes divided into many scattered free regions during continuous allocation and deallocation of data. Although the total free space may still be substantial, the system cannot find a contiguous memory region large enough -> wasted resources or OOM.

Two types of fragmentation:

External Fragmentation: Data is continuously created, updated, and randomly deleted. Free memory regions get chopped up and interspersed. For example in Redis: jemalloc cannot fully return small scattered memory pages to the OS -> the OS sees Redis “consuming” a lot of RAM but in reality only a portion is used.

Internal Fragmentation: The allocator hands out memory in fixed-size blocks, so an item rarely fills its block exactly. For example: Memcached divides memory into Slab Classes (96 bytes, 120 bytes…). Storing a 97-byte item -> goes into the 120-byte class -> wastes 23 bytes (~19% of the chunk).
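
The slab arithmetic can be checked with a short helper. The class sizes beyond 96 and 120 bytes are illustrative values chosen for this sketch, and the function name slab_waste is hypothetical; real class sizes depend on the server's configuration.

```python
import bisect

# Illustrative slab class sizes in bytes (only 96 and 120 come from the text).
SLAB_CLASSES = [96, 120, 152, 192]


def slab_waste(item_size, classes=SLAB_CLASSES):
    """Return (chunk_size, wasted_bytes, wasted_fraction) for an item."""
    index = bisect.bisect_left(classes, item_size)  # smallest class that fits
    if index == len(classes):
        raise ValueError("item larger than the largest slab class")
    chunk = classes[index]
    wasted = chunk - item_size
    return chunk, wasted, wasted / chunk
```

For a 97-byte item this returns a 120-byte chunk with 23 wasted bytes, while a 96-byte item fits its class exactly and wastes nothing.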

Solutions for Redis:

Enable Active Defragmentation (Redis 4.0+): CONFIG SET activedefrag yes
Restart: Save data to disk (RDB/AOF) then restart.
Separate servers: If data sizes vary greatly (sessions of a few dozen bytes vs. cached HTML pages of several MB), split them into 2 separate Cache servers.

6. Connection Churn — Connection “Spinning”

Definition: Clients continuously open new TCP connections to the Cache server, send a few requests, then close the connection, repeating at high frequency.

Consequences:

CPU/RAM Overhead: TCP requires a 3-way handshake to open and a 4-way teardown to close. Each connection occupies a file descriptor. CPU spends most of its time on system calls instead of GET/SET operations.
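Port Exhaustion: When a connection is closed, the OS keeps the socket of the side that closed first in the TIME_WAIT state (60s on Linux), and its port cannot be reused for the same destination during that time. Opening/closing thousands of connections per second -> port depletion.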
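
A back-of-the-envelope calculation shows how quickly ports run out. The sketch assumes the client closes each connection first, that all connections go to a single Cache server address, and that the ephemeral port range is 32768 to 60999 (a common Linux default, worth confirming on your own hosts); the function name is hypothetical.

```python
def max_sustained_connections_per_second(port_low, port_high, time_wait_seconds):
    """Upper bound on new connections/s to one destination before ports run out."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


limit = max_sustained_connections_per_second(32768, 60999, 60)
```

Under these assumptions the limit is about 470 new connections per second, far below the thousands per second that a churn-heavy client can attempt.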

Solutions:

Connection Pooling on the App Server to maintain a stable number of connections.
Proxy: Place a proxy between App Server and Cache Server (Twemproxy, Envoy). The proxy is responsible for maintaining stable TCP connections.
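
The pooling idea can be sketched with the standard library alone. This is a minimal illustration, not a production pool: the ConnectionPool class and the connect factory are hypothetical, and the sketch omits health checks and reconnection of broken connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, connect, size):
        """Open `size` connections up front using the `connect` factory."""
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection and always return it to the pool afterwards."""
        conn = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

Application code then wraps each Cache call in a with pool.connection() as conn: block, so the number of TCP connections stays fixed at size regardless of request volume.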