The Problem With Simple Hashing
You have 4 cache servers. To decide which server stores a key, you compute hash(key) % 4. Server 0, 1, 2, or 3. Simple, fast, works great — until you add a 5th server. Now it is hash(key) % 5. For most keys, the server assignment changes. 80% of your cache is invalidated instantly. Every key needs to be redistributed.
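You can see the churn directly. A minimal sketch of the modulo scheme (the key names and the choice of MD5 are illustrative assumptions, not from any particular system):

```python
import hashlib

def server_for(key: str, num_servers: int) -> int:
    # Classic modulo placement: hash the key, take it mod the server count.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_servers

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if server_for(k, 4) != server_for(k, 5))
print(f"{moved / len(keys):.0%} of keys changed servers")  # roughly 80%
```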
For a CDN with 100 servers and billions of cached objects, adding one server means re-fetching most content from the origin: a thundering herd. For a distributed database, it means migrating terabytes of data. Consistent hashing solves this: when a node is added or removed, only about 1/N of the keys move.
Consistent hashing was invented by Karger and his colleagues at MIT, the same group Akamai grew out of, and it was designed specifically for their CDN. When a server joins or leaves, only the keys that map to that server's segment of the ring need to move; everything else stays put.
The Hash Ring
Imagine a circle (ring) of positions from 0 to 2^32 - 1. Hash each server's name to a position on the ring. Hash each key to a position on the ring. To find which server owns a key, walk clockwise from the key's position until you hit a server. That server owns the key.
When a new server joins, it takes a position on the ring and only claims the keys between it and the previous server. When a server leaves, its keys shift clockwise to the next server. Only K/N keys move (K = total keys, N = number of servers).
Figure 1: Keys and nodes are hashed onto the same ring. Each key belongs to the nearest node clockwise. Adding/removing a node only affects keys in its segment.
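A basic ring fits in a few dozen lines. The sketch below is illustrative, assuming MD5 for the hash and Python's bisect for the clockwise lookup; the server names are made up:

```python
import bisect
import hashlib

RING_SIZE = 2**32

def ring_hash(value: str) -> int:
    # Map any string to a position on the ring [0, 2^32).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes):
        # Sorted list of (position, node) pairs; this is the ring.
        self._positions = sorted((ring_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._positions, (ring_hash(node), node))

    def remove(self, node):
        self._positions.remove((ring_hash(node), node))

    def owner(self, key: str) -> str:
        # Walk clockwise: the first node at or after the key's position,
        # wrapping around to the start of the sorted list if needed.
        i = bisect.bisect_left(self._positions, (ring_hash(key), ""))
        return self._positions[i % len(self._positions)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.owner(k) for k in keys}
ring.add("cache-e")
moved = sum(1 for k in keys if ring.owner(k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved")  # a small fraction, not 80%
```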
Virtual Nodes: Fixing the Balance
With 4 physical nodes, the ring segments are often unequal: one node might own 40% of the keys while another owns 10%. Virtual nodes fix this: each physical node gets 100-200 positions on the ring, so the key space divides into many small segments distributed evenly. Amazon's Dynamo design (the basis for DynamoDB) uses virtual nodes for exactly this reason, and Cassandra (which calls them vnodes) has historically defaulted to 256 tokens per physical node.
Virtual nodes also make rebalancing smoother. When a new physical node joins, it takes virtual nodes from every existing node (small chunks from many nodes) instead of one big chunk from one node.
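The only change from the ring sketch above is where nodes are placed: hash "name#0", "name#1", and so on instead of the name once. The replica count of 150 below is an arbitrary illustrative choice, not a value taken from any of the systems mentioned:

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % 2**32

class VNodeRing:
    def __init__(self, nodes, replicas=150):
        self.replicas = replicas
        self._positions = []
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place each physical node at `replicas` positions on the ring.
        for i in range(self.replicas):
            bisect.insort(self._positions, (ring_hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        i = bisect.bisect_left(self._positions, (ring_hash(key), ""))
        return self._positions[i % len(self._positions)][1]

ring = VNodeRing(["cache-a", "cache-b", "cache-c", "cache-d"])
load = Counter(ring.owner(f"user:{i}") for i in range(100_000))
print(load)  # per-node counts land much closer to an even 25,000 split
```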
Consistent Hashing in the Wild
- DynamoDB: the partition key is hashed onto a ring, and each partition owns a range of it. Adding storage nodes moves whole partitions rather than reshuffling individual items.
- Cassandra: token ring with Murmur3 hash. Each node owns token ranges. Adding a node splits ranges.
- Memcached: client-side consistent hashing (ketama algorithm). Adding a cache server invalidates only 1/N of keys.
- Nginx: hash $request_uri consistent for upstream load balancing. Adding a backend only remaps the URLs that hash to its segment.
- Discord: consistent hashing for routing messages to the correct guild process across thousands of servers.
When Discord adds new servers during traffic spikes, consistent hashing ensures that most guild processes stay on their current server. Only the guilds that hash to the new server's ring segment are migrated — minimizing disruption during scaling events.
Rendezvous Hashing: The Alternative
Rendezvous hashing (highest random weight) is simpler: for each key, compute a weight for every server using hash(key, server). Pick the server with the highest weight. When a server is removed, each of its keys falls to its next-highest server; no other key moves, and no ring is needed. It is used in Microsoft's CARP (Cache Array Routing Protocol) proxy scheme and some CDNs. The downside: O(N) computation per lookup (the key must be hashed against every server), vs. O(log N) for ring-based consistent hashing with binary search.
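A minimal sketch of the idea, again assuming MD5 and made-up server names:

```python
import hashlib

def weight(key: str, server: str) -> int:
    # Combined hash of key and server name; the highest weight wins.
    return int(hashlib.md5(f"{key}:{server}".encode()).hexdigest(), 16)

def owner(key: str, servers: list[str]) -> str:
    # O(N) per lookup: score every server, keep the highest.
    return max(servers, key=lambda s: weight(key, s))

servers = ["cache-a", "cache-b", "cache-c", "cache-d"]
print(owner("user:42", servers))
# If a server is removed, only the keys for which it held the highest
# weight get reassigned; every other key's winner is unchanged.
```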