How Did Justin Bieber Break Instagram?

The Post That Melted Instagram

It is 2014. Justin Bieber is one of Instagram's most-followed accounts, with tens of millions of followers. He posts a photo. Within seconds, millions of users simultaneously request the same photo, the same comments, the same like count. The cache entry for that post does not exist yet. Every single request hits the database.

This is the thundering herd problem — also called a cache stampede. A single popular cache key expires or gets invalidated, and thousands (or millions) of requests simultaneously discover the cache miss. All of them race to regenerate the value by hitting the database. The database, which was happily serving 100 queries/second behind the cache, suddenly eats 100,000 queries in a second. It falls over. And now every request fails, not just the ones for that one key.

Instagram's early engineering team reported that celebrity posts could generate 10x the normal database load within seconds. The thundering herd was their most feared failure mode.

Why Does Cache Expiry Cause Coordinated Failure?

Caches use TTLs (time-to-live) to keep data fresh. When a TTL expires, the next request sees a cache miss and regenerates the value. For most keys, this works fine — maybe 1-2 requests arrive during the regeneration window.

But for hot keys — celebrity profiles, trending topics, viral posts — thousands of requests arrive every second. If the cache entry disappears for even 100 milliseconds, hundreds of requests will see the miss simultaneously and all try to regenerate. This is the stampede.

[Figure: The Thundering Herd: Cache Miss Stampede — a timeline from cache hit (DB idle), through TTL expiry, to a stampede of simultaneous cache misses flooding the database with thousands of queries, causing overload, latency spikes, timeouts, and cascading failure.]

Figure 1: When a hot cache key expires, thousands of requests simultaneously discover the miss and flood the database. The database, designed to handle a fraction of that load, becomes overloaded and fails.

Solution 1: Lock-Based Cache Refresh

The simplest fix: when a cache miss occurs, only one request acquires a lock and regenerates the value. All other requests wait for the lock to be released, then read the freshly cached value. In Redis, this is implemented with SET key lock_value NX EX 5 — set only if not exists, with a 5-second expiry.

import time

value = cache.get("bieber_post:123")
if value is None:
    # Only one request wins the lock (NX = set if absent, EX = 5-second TTL).
    if cache.set("lock:bieber_post:123", "1", nx=True, ex=5):
        value = db.query("SELECT * FROM posts WHERE id = 123")
        cache.set("bieber_post:123", value, ex=300)
        cache.delete("lock:bieber_post:123")
    else:
        # Lost the race: poll until the winner repopulates the key,
        # giving up after ~5 s (the lock's own TTL).
        for _ in range(100):
            time.sleep(0.05)
            value = cache.get("bieber_post:123")
            if value is not None:
                break

This works but has issues: the waiting requests consume connections and threads. If the lock holder crashes, you need the TTL on the lock to eventually release it. And the retry loop adds latency.

Solution 2: Probabilistic Early Expiration

Instead of letting the cache key expire and causing a stampede, regenerate it before it expires. Each request that hits the cache checks: "Am I close to the TTL? If so, should I be the one to refresh it?" The probability increases as the TTL approaches zero.

# XFetch algorithm: recompute early with probability rising as expiry nears.
# delta = how long recomputation takes; expiry = absolute expiry timestamp.
import math, random, time

def should_recompute(expiry, delta, beta=1.0):
    return time.time() - delta * beta * math.log(random.random()) >= expiry

This is the XFetch algorithm. The parameter beta controls how early regeneration starts: because -log(random()) is exponentially distributed, the chance that any given request refreshes the key rises sharply as the expiry approaches. With beta=1.0, roughly one request will trigger regeneration slightly before expiry, keeping the cache warm. No locks needed. No waiting. The hot key never actually expires.
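To make the moving parts concrete, here is a minimal sketch of a read path built on should_recompute. The cache and recompute handles are assumptions (as in the lock example above), and each entry is assumed to store its measured recompute cost and expiry alongside the value:

def cached_read(key, ttl=300, beta=1.0):
    entry = cache.get(key)  # assumed to hold (value, delta, expiry) or None
    if entry is None or should_recompute(entry[2], entry[1], beta):
        start = time.time()
        value = recompute(key)       # hypothetical expensive DB query
        delta = time.time() - start  # measure how long regeneration took
        cache.set(key, (value, delta, start + ttl), ex=ttl)
        return value
    return entry[0]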

Solution 3: Request Coalescing

At the load balancer or CDN layer, collapse duplicate requests into a single upstream request. If 1,000 requests for the same URL arrive within 50ms, the proxy sends one request to the backend and serves the response to all 1,000 clients. Nginx implements this with proxy_cache_lock. Varnish does it by default via its waiting list (request coalescing); its related grace mode goes further and serves stale content while revalidating.
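The same idea can be applied inside an application process. Below is a hedged sketch of in-process coalescing in the spirit of Go's singleflight package; the Coalescer name and fetch signature are illustrative, not a standard API:

import threading

class Coalescer:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (Event, result box)

    def fetch(self, key, fn):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, box = entry
        if leader:
            try:
                box["value"] = fn()  # the single upstream call
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake every waiting duplicate request
        else:
            event.wait()
        # Note: this sketch returns None to waiters if the leader failed.
        return box.get("value")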

Solution 4: Stale-While-Revalidate

Serve the stale cached value while a background process refreshes it. The user sees slightly old data for a moment, but there is zero latency spike and zero stampede. This is the stale-while-revalidate pattern, standardized in HTTP via Cache-Control headers. CDNs like Cloudflare and Fastly support it natively.
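In HTTP, the pattern is standardized by RFC 5861. A response header like

Cache-Control: max-age=300, stale-while-revalidate=60

tells caches the value is fresh for 300 seconds, and that for a further 60 seconds they may serve the stale copy immediately while refetching from the origin in the background.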

How Facebook Solved It: Memcache Leases

Facebook's Memcache deployment handles billions of requests per second. Their solution to the thundering herd is leases. When a cache miss occurs, Memcache issues a lease token to the first requesting client. Other clients that request the same key are told to wait (they get a "hot miss" response). Only the lease holder can write the new value. If the lease holder crashes, the lease expires after a few seconds and another client gets a new lease.
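A rough sketch of the client-side flow is below. The get_with_lease and set_with_lease calls are hypothetical stand-ins (Facebook's client API is internal); they just illustrate the protocol the paper describes:

import time

def read_through(key, compute, retries=20):
    for _ in range(retries):
        value, lease_token = cache.get_with_lease(key)  # hypothetical API
        if value is not None:
            return value
        if lease_token is not None:
            # We hold the lease: only we may regenerate and write back.
            value = compute()
            cache.set_with_lease(key, value, lease_token)
            return value
        time.sleep(0.05)  # "hot miss": another client holds the lease
    return compute()  # lease likely expired with its holder; go to source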

Facebook's memcache paper reports that leases reduced the rate of peak database queries from hot keys by 94%, turning thundering herds into orderly single-file lines.

The key insight: the cache layer itself should coordinate regeneration, not the application code. By pushing this logic into the caching infrastructure, every service benefits without each team reimplementing lock-based cache refresh.

The Hot Key Problem in Distributed Caches

Even without stampedes, hot keys cause problems in distributed caches. In a Redis Cluster, all requests for bieber_post:123 go to one node. That node becomes a bottleneck while others sit idle. Solutions:

  • Key replication: Store the hot key on multiple nodes under suffixed copies and read from a random one, spreading load evenly (see the sketch after this list).
  • Client-side caching: Cache the hot value in the application process itself. Redis 6.0 added server-assisted client caching with invalidation notifications.
  • Local caching with short TTL: Keep a 1-second in-process cache for extremely hot keys. Accept 1 second of staleness to avoid hammering the cache cluster.
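As a minimal sketch of the key-replication idea, assuming a Redis-style client and an illustrative replica count (the suffix scheme and names are assumptions):

import random

N_REPLICAS = 8  # illustrative; tune to the key's hotness

def write_hot(key, value, ttl=300):
    # Write every suffixed copy so any replica can serve a read.
    for i in range(N_REPLICAS):
        cache.set(f"{key}#{i}", value, ex=ttl)

def read_hot(key):
    # Suffixes hash to different cluster slots, so random reads
    # spread the hot key's load across nodes.
    return cache.get(f"{key}#{random.randrange(N_REPLICAS)}")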

The Bottom Line

The thundering herd is not a theoretical edge case. If you have hot keys and TTL-based cache expiry, you will hit it. The fix depends on your scale: mutex locks work for small systems, probabilistic early expiration works for medium scale, and infrastructure-level solutions like Facebook's leases or CDN request coalescing are needed at massive scale.

References and Further Reading