Why Can't Your Database Have It All?

The Impossible Triangle Every Engineer Hits

You are building a distributed database. You want every read to return the latest write (consistency). You want every request to get a response, even if some nodes are down (availability). And you want the system to keep working when network links between nodes fail (partition tolerance). Sounds reasonable, right?

In 2000, Eric Brewer stood at a podium at the ACM Symposium on Principles of Distributed Computing and dropped a conjecture: you can only have two of those three. Two years later, Seth Gilbert and Nancy Lynch proved it formally. The CAP theorem was born, and it has been confusing engineers ever since.

The confusion comes from thinking CAP is a menu where you pick your favorite two options. It is not. The real insight is that partition tolerance is not optional. Networks fail. Switches die. Cables get cut. Data center links drop. You do not get to choose whether partitions happen. The only real choice is: when a partition occurs, do you sacrifice consistency or availability?

Google's Spanner team reported that in their global network, they observe network partitions roughly once every few weeks. At scale, partition tolerance is not a feature — it is a requirement.

What Do C, A, and P Actually Mean?

The terms are deceptively simple, and most explanations get them subtly wrong. Let us be precise:

  • Consistency (linearizability): Every read receives the most recent write or an error. Not "eventually consistent." Not "read-your-writes." Full linearizability — the system behaves as if there is a single copy of the data.
  • Availability: Every request to a non-failing node receives a response — no timeouts, no errors. The response does not have to be the most recent write, but you do get something back.
  • Partition tolerance: The system continues operating despite arbitrary message loss or delay between nodes. A partition means node A cannot talk to node B, but both are still running.
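The linearizability requirement can be made concrete with a toy checker. For a strictly sequential history (no concurrent operations, which is a simplifying assumption here; real linearizability checking must account for overlapping operations), the rule degenerates to: every read must return the value of the most recent completed write. All names below are hypothetical.

```python
# Toy linearizability check for a strictly sequential history.
# Each entry is (operation, key, value).
history = [
    ("write", "x", 1),
    ("read",  "x", 1),   # OK: sees the latest write
    ("write", "x", 2),
    ("read",  "x", 1),   # violation: stale read after x was set to 2
]

def linearizable(history):
    latest = {}
    for op, key, value in history:
        if op == "write":
            latest[key] = value          # this is now the "most recent write"
        elif value != latest.get(key):   # a read returned something older
            return False
    return True

print(linearizable(history))  # False: the final read is stale
```

An eventually consistent system would call the fourth operation acceptable; a linearizable one rejects the whole history because of it.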
[Figure 1 diagram: the CAP triangle. C = Consistency, A = Availability, P = Partition Tolerance. CP systems: Spanner, HBase, MongoDB, etcd. AP systems: Cassandra, DynamoDB, CouchDB, Riak. CA = single node.]

Figure 1: The CAP triangle. In practice, partition tolerance is mandatory in any distributed system, so the real choice is between CP (sacrifice availability during partitions) and AP (sacrifice consistency during partitions). CA systems only exist as single-node databases.

What Happens When the Network Splits?

Imagine a two-node database. Node A is in US-East, Node B is in EU-West. A fiber cut between data centers means they cannot communicate. A client in Europe writes UPDATE users SET balance = 500 WHERE id = 42 to Node B. Simultaneously, a client in the US reads user 42's balance from Node A.

Now you have a choice. A CP system says: "Node A cannot confirm that its data is up to date, so it refuses to serve the read. It returns an error or times out." The system is consistent (no stale reads) but unavailable (a working node rejected a request from a working client). An AP system says: "Node A serves whatever data it has. The read returns the old balance." The system is available (every node responds) but inconsistent (the US client sees stale data).

[Figure 2 diagram: during the partition, Node A (US-East) holds the stale value balance = 200 while Node B (EU-West) holds the latest write, balance = 500. The CP response refuses the read ("I can't confirm this data is current"): error 503, consistent but unavailable. The AP response serves what it has ("Here's balance = 200. It might be old."): 200 OK, available but inconsistent.]

Figure 2: When nodes A and B cannot communicate, a CP system refuses to serve potentially stale data (returns an error), while an AP system serves whatever it has (returns stale data). Neither is "wrong" — it depends on your application's requirements.
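The two behaviors in the scenario above can be sketched in a few lines. This is a minimal model, not a real database client; all names (`Node`, `read_cp`, `read_ap`) are hypothetical.

```python
class Node:
    def __init__(self, balance, partitioned):
        self.balance = balance          # local (possibly stale) copy
        self.partitioned = partitioned  # True if cut off from the other replica

def read_cp(node):
    """CP behavior: refuse to answer unless freshness can be confirmed."""
    if node.partitioned:
        raise TimeoutError("503: cannot confirm this data is current")
    return node.balance

def read_ap(node):
    """AP behavior: always answer, even if the value may be stale."""
    return node.balance

# Node A in US-East missed the update that set the balance to 500.
node_a = Node(balance=200, partitioned=True)

try:
    read_cp(node_a)                 # consistent but unavailable
except TimeoutError as e:
    print(e)

print(read_ap(node_a))              # 200: available but stale
```

Both functions are "correct" by their own definition; the choice between them is the whole content of the theorem.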

How Does Google Spanner Cheat the CAP Theorem?

Google Spanner is often described as a "CA system" — both consistent and available. How? Did Google break the laws of distributed computing? Not exactly. Spanner is technically a CP system that makes partitions so rare it almost never has to sacrifice availability.

The secret is TrueTime. Google built a time API backed by GPS receivers and atomic clocks in every data center. TrueTime does not return a single timestamp. It returns a time interval: [earliest, latest]. Spanner waits out the uncertainty interval before committing a transaction, which guarantees that transactions are globally ordered by real time. The commit-wait time is typically 7 milliseconds or less.
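The commit-wait idea can be sketched with ordinary wall-clock time standing in for TrueTime. Everything here is a simplified model with hypothetical names (`tt_now`, `EPSILON`, `commit`), not Spanner's actual API.

```python
import time

EPSILON = 0.007  # stand-in for the ~7 ms clock-uncertainty bound

def tt_now():
    """TrueTime-style clock: an interval [earliest, latest], not a point."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit(txn_data):
    """Pick a commit timestamp, then wait out the uncertainty interval.

    After the wait, every clock's 'earliest' has passed commit_ts, so any
    transaction that starts later is guaranteed a strictly larger
    timestamp. That is the global real-time ordering property.
    """
    _, latest = tt_now()
    commit_ts = latest
    while tt_now()[0] < commit_ts:   # wait until earliest > commit_ts
        time.sleep(0.001)
    return commit_ts

ts1 = commit({"id": 42, "balance": 500})
ts2 = commit({"id": 42, "balance": 600})
assert ts2 > ts1   # later commits always receive later timestamps
```

The cost is visible in the loop: every commit pays roughly twice the uncertainty bound in latency, which is exactly the "tiny amount of latency" Spanner trades for linearizability.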

Because Spanner's network infrastructure is heavily redundant (multiple paths between data centers, rapid failover), partitions are extremely short-lived. When a partition does happen, Spanner chooses consistency — affected reads and writes will block or fail. But since partitions last milliseconds instead of minutes, the availability impact is negligible.

In effect, Spanner trades a few milliseconds of commit-wait latency for linearizability across continents.

DynamoDB: Betting on Availability

Amazon DynamoDB makes the opposite bet. The original Dynamo paper (2007) laid out a philosophy: for a shopping cart, it is better to accept a write and reconcile later than to reject a write and lose a sale. AP all the way.

DynamoDB uses eventual consistency by default. When you write to a DynamoDB table, the write goes to a primary replica and is acknowledged immediately. Replication to other replicas happens asynchronously. A read might hit a replica that has not received the latest write yet. Eventually (usually within milliseconds), all replicas converge to the same state.

DynamoDB also offers strongly consistent reads as an option. When you request one, DynamoDB routes the read to the leader replica that holds the most recent write. This gives you CP behavior for that specific read, at the cost of higher latency and lower throughput. You get to choose per-request, which is more nuanced than the CAP theorem's binary framing suggests.
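The per-request choice can be modeled with a toy table: eventual reads may land on a replica that has not caught up, while strongly consistent reads go to the leader. This is a simulation with hypothetical names (`Table`, `put`, `get`, `replicate`), not the DynamoDB API; in particular, replication is triggered explicitly here to make the stale window visible.

```python
import random

class Table:
    def __init__(self):
        self.leader = {}              # always holds the latest write
        self.replicas = [{}, {}]      # catch up asynchronously

    def put(self, key, value):
        self.leader[key] = value      # acknowledged immediately

    def replicate(self):
        """Async replication, modeled as an explicit step."""
        for r in self.replicas:
            r.update(self.leader)

    def get(self, key, consistent=False):
        if consistent:
            return self.leader.get(key)        # strong: leader only
        source = random.choice([self.leader] + self.replicas)
        return source.get(key)                 # eventual: any replica

t = Table()
t.put("user:42", 500)
# Before replication, an eventual read may return None (a stale miss),
# but a strongly consistent read always sees the write:
print(t.get("user:42", consistent=True))   # 500
t.replicate()
print(t.get("user:42"))                    # 500 once replicas converge
```

The real service makes the same tradeoff: the consistent read costs more latency and throughput because it cannot be served by whichever replica happens to be closest.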

[Figure 3 diagram: the consistency spectrum, from strong (CP) to eventual (AP): Spanner (linearizable), CockroachDB (serializable), MongoDB (majority writes), DynamoDB (tunable per read), Cassandra (tunable quorum), Riak (eventually consistent). Most modern databases offer tunable consistency rather than a one-time choice: DynamoDB lets you choose strong or eventual per read, and Cassandra lets you set quorum levels.]

Figure 3: Real databases exist on a spectrum rather than in a binary CP/AP bucket. Many modern systems let you tune consistency per operation, making the CAP tradeoff more granular than the original theorem suggests.

PACELC: The Extension That Actually Matters

In 2012, Daniel Abadi proposed the PACELC theorem as a more useful framework. The idea: if there is a Partition, do you choose Availability or Consistency? Else (when the system is running normally, no partition), do you choose Latency or Consistency?

This captures something CAP misses entirely. Even when your network is healthy, there is a tradeoff between consistency and latency. A system that waits for all replicas to acknowledge a write before responding is consistent but slow. A system that acknowledges after writing to one replica is fast but may serve stale reads.
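The latency side of the tradeoff is easy to quantify with a toy acknowledgment model. The policy names and latency figures below are illustrative assumptions, not measurements from any real system.

```python
# One slow cross-region replica dominates the "wait for all" policy.
replica_latencies_ms = [2, 5, 40]

def ack_latency(policy):
    """Write-acknowledgment latency under three replication policies."""
    if policy == "all":      # EC: respond only after every replica has the write
        return max(replica_latencies_ms)
    if policy == "one":      # EL: respond after the first durable copy
        return min(replica_latencies_ms)
    if policy == "quorum":   # middle ground: majority acknowledgment
        return sorted(replica_latencies_ms)[len(replica_latencies_ms) // 2]
    raise ValueError(policy)

print(ack_latency("all"))     # 40 ms: consistent, but as slow as the slowest replica
print(ack_latency("one"))     # 2 ms:  fast, but reads can be stale meanwhile
print(ack_latency("quorum"))  # 5 ms:  the usual compromise
```

This is why the EL choice exists at all: even with a perfectly healthy network, waiting for the slowest replica puts a far-away data center on every request's critical path.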

  • DynamoDB: PA/EL — during partitions, chooses availability. During normal operation, chooses low latency (eventual consistency by default).
  • Spanner: PC/EC — during partitions, chooses consistency. During normal operation, still chooses consistency (at the cost of higher latency from TrueTime commit-wait).
  • Cassandra: PA/EL — favors availability and low latency, but you can configure quorum reads/writes to get PC/EC behavior per query.
  • MongoDB: PA/EC — with majority write concern and read concern, it favors consistency even during normal operation.

When Do You Pick CP vs AP?

The answer depends on your domain. Pick CP when incorrect data is worse than no data. Financial transactions, inventory counts, user authentication, leader election — these need consistency. If your bank shows the wrong balance, that is worse than showing a temporary error page.

Pick AP when stale data is acceptable and downtime is not. Social media feeds, product catalogs, recommendation engines, analytics dashboards — these can tolerate slightly stale data. If you see a tweet two seconds late, nobody gets hurt. If Twitter goes completely down during a major event, that is a disaster.

The most pragmatic approach: use different consistency levels for different operations within the same system. Your checkout flow can use strong consistency while your product listings use eventual consistency. DynamoDB, Cassandra, and MongoDB all support this pattern.

Amazon's Werner Vogels put it best: "The reality is that for many services, the availability and latency requirements are such that they simply cannot afford the cost of strong consistency."

The Bottom Line

CAP is not a buffet where you pick your two favorites. Partitions are a fact of life in distributed systems. The real question is always: when (not if) a partition occurs, does your application need to stay consistent or stay available? And PACELC adds the second question: during normal operation, do you prioritize low latency or strong consistency?

The teams that build resilient systems understand that this is not a one-time architectural decision. It is a per-operation, per-use-case tradeoff that should be made deliberately, with full understanding of the consequences.

References and Further Reading