How Do Microservices Handle Transactions?

The Problem: No Distributed ACID

In a monolith, booking a trip is one database transaction: charge the card, reserve the hotel, book the flight. If anything fails, the whole transaction rolls back, and ACID guarantees that no partial booking survives.

In microservices, each step is a different service with its own database. You cannot wrap a transaction around three different databases owned by three different teams. Two-phase commit (2PC) exists, but it is slow and blocking: participants hold their locks while they wait for the coordinator's decision, so a coordinator failure can leave all of them stuck until it recovers. For these reasons, 2PC is rarely used across microservices at scale.

Enter the Saga pattern: a sequence of local transactions where each step has a compensating action that undoes it if a later step fails. Instead of rollback, you run compensations backward.

Uber's trip lifecycle can be viewed as a saga: match driver → start trip → calculate fare → charge rider → pay driver. If charging the rider fails, the saga compensates the steps already completed, for example by reversing the fare calculation and notifying the driver.

Two Flavors: Choreography vs. Orchestration

Choreography: each service listens for events and decides what to do next. No central coordinator. Order Service publishes "OrderCreated," Payment Service hears it and charges the card, publishes "PaymentCompleted," Shipping Service hears that and ships. Simple for 3-4 steps. Becomes spaghetti at 10+ steps because the flow is spread across every service.
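The choreography flow above can be sketched in a few lines. This is a minimal illustration, not a production design: EventBus is a hypothetical synchronous in-memory stand-in for a real message broker, and the PaymentFailed, OrderShipped, and OrderCancelled events are assumed names added to show how a failure would travel back without a coordinator.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical synchronous, in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.log: List[str] = []  # every event type published, in order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


def wire_services(bus: EventBus, card_ok: bool) -> None:
    # Payment Service: reacts to OrderCreated and reports the outcome.
    def on_order_created(order: dict) -> None:
        bus.publish("PaymentCompleted" if card_ok else "PaymentFailed", order)

    # Shipping Service: reacts to PaymentCompleted.
    def on_payment_completed(order: dict) -> None:
        bus.publish("OrderShipped", order)

    # Order Service: compensates its own local transaction (assumed event name).
    def on_payment_failed(order: dict) -> None:
        bus.publish("OrderCancelled", order)

    bus.subscribe("OrderCreated", on_order_created)
    bus.subscribe("PaymentCompleted", on_payment_completed)
    bus.subscribe("PaymentFailed", on_payment_failed)


def run_order(card_ok: bool) -> List[str]:
    """Publish one OrderCreated event and return the resulting event trail."""
    bus = EventBus()
    wire_services(bus, card_ok)
    bus.publish("OrderCreated", {"order_id": "o-1"})
    return bus.log
```

Note that no single function describes the whole flow: it only emerges from the subscriptions, which is why longer choreographed sagas become hard to follow.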

Orchestration: a central saga orchestrator tells each service what to do and handles the compensation logic. The orchestrator is a state machine: "step 1 succeeded → call step 2 → step 2 failed → compensate step 1." Easier to understand, debug, and monitor. This is what most production systems use.

Saga: Forward Steps + Compensations

Forward steps: 1. Reserve Hotel → 2. Book Flight → 3. Charge Card → 4. Confirm (fails)

Step 4 fails → run compensations backward: C3. Refund Card → C2. Cancel Flight → C1. Release Hotel

Each forward step has a compensating action. Compensations undo completed steps in reverse order.

Figure 1: A travel booking saga. If step 4 fails, compensating actions run in reverse: refund the card, cancel the flight, release the hotel reservation.
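The travel booking saga can be expressed as a small orchestrator. This is a sketch under simplifying assumptions: Step, run_saga, and book_trip are hypothetical names, the service calls are replaced by entries in a trace list, and everything runs in one process. A real orchestrator would also persist its state so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]        # forward local transaction
    compensation: Callable[[], None]  # semantic undo for that transaction


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


def book_trip(confirm_fails: bool) -> List[str]:
    """Run the booking saga and return the trace of actions taken."""
    trace: List[str] = []

    def record(message: str) -> Callable[[], None]:
        return lambda: trace.append(message)

    def confirm() -> None:
        if confirm_fails:
            raise RuntimeError("confirmation failed")
        trace.append("Confirm")

    steps = [
        Step("hotel", record("Reserve Hotel"), record("Release Hotel")),
        Step("flight", record("Book Flight"), record("Cancel Flight")),
        Step("card", record("Charge Card"), record("Refund Card")),
        Step("confirm", confirm, lambda: None),  # last step: nothing to undo
    ]
    run_saga(steps)
    return trace
```

Calling book_trip(True) yields the three forward steps followed by Refund Card, Cancel Flight, Release Hotel, matching the order in Figure 1.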

The Hardest Part: Designing Compensations

Not everything is easily reversible. You can refund a payment, but you cannot un-send an email or un-ship a package. Compensations must be semantically correct, not necessarily "undo." A compensation for "shipped package" might be "create return label + notify customer," not magically retrieving the package.

Compensations must also be idempotent. Network failures mean a compensation might run twice. Refunding a card twice would be a disaster. Every compensation needs a unique idempotency key.
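One way to make a refund safe to retry is to deduplicate on the idempotency key. The sketch below is illustrative only: RefundService and refund_key are hypothetical, and the in-memory dictionary stands in for a durable store. In a real system the key check and the refund would need to happen atomically, for example behind a unique constraint.

```python
from typing import Dict


class RefundService:
    """Hypothetical payment-side handler that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._processed: Dict[str, int] = {}  # idempotency key -> amount refunded
        self.total_refunded = 0

    def refund(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._processed:
            return "duplicate"  # already handled: do not move money again
        self._processed[idempotency_key] = amount_cents
        self.total_refunded += amount_cents
        return "refunded"


def refund_key(saga_id: str, step: str) -> str:
    # Derived from the saga and step, so every retry of the same
    # compensation reuses the same key.
    return f"{saga_id}:{step}:refund"


service = RefundService()
key = refund_key("saga-42", "charge-card")
first = service.refund(key, 5000)
retry = service.refund(key, 5000)  # e.g. a redelivery after a network timeout
```

Deriving the key from the saga ID and step name, rather than generating a random one per attempt, is what lets the retry be recognized as the same operation.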

Saga Isolation: The Dirty Secret

Sagas do not provide isolation. Between step 1 and step 3, other transactions can see the intermediate state. A hotel room is "reserved" but the payment hasn't been charged yet. This means:

  • Dirty reads: other services see data written by an in-progress saga that may still be compensated. Each local transaction has committed, but the saga as a whole has not.
  • Lost updates: concurrent sagas might overwrite each other's changes.

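Mitigation strategies: semantic locks (mark the hotel room as "pending" until the saga finishes), commutative updates (counter increments and decrements instead of absolute sets, so the order of application does not matter), pessimistic views (order the steps so that a dirty read exposes the least business risk), and putting reversible operations before the ones that are hard to undo.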
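Two of these strategies are easy to show in code. The sketch below is a simplified, single-process illustration with hypothetical Room and Inventory classes. The status values are assumptions, and a real implementation would enforce the checks inside the owning service's database transaction.

```python
from typing import Optional


class Room:
    """Semantic lock: PENDING tells other sagas a booking is still in flight."""

    def __init__(self) -> None:
        self.status = "AVAILABLE"
        self.saga_id: Optional[str] = None

    def reserve(self, saga_id: str) -> bool:
        if self.status != "AVAILABLE":
            return False  # another saga holds the lock, or the room is booked
        self.status, self.saga_id = "PENDING", saga_id
        return True

    def confirm(self, saga_id: str) -> None:
        if self.status == "PENDING" and self.saga_id == saga_id:
            self.status = "BOOKED"

    def release(self, saga_id: str) -> None:
        # Compensation: only the saga that owns the lock may release it.
        if self.status == "PENDING" and self.saga_id == saga_id:
            self.status, self.saga_id = "AVAILABLE", None


class Inventory:
    """Commutative update: apply deltas instead of overwriting the count."""

    def __init__(self, available: int) -> None:
        self.available = available

    def adjust(self, delta: int) -> None:
        self.available += delta
```

Because adjust only adds a delta, two sagas can apply their changes in either order and arrive at the same count, whereas two absolute writes would leave whichever value landed last.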

Tools and Frameworks

  • Temporal: workflow engine that removes much of the boilerplate from saga orchestration. Define steps as activities and compensations as cleanup logic in the workflow code. Temporal handles retries, timeouts, and state persistence. Used by Stripe, Netflix, Snap.
  • AWS Step Functions: serverless state machine. Define saga steps as tasks (commonly Lambda functions) and use Catch handlers to route failures to compensation states.
  • Camunda: BPMN-based workflow engine with saga support for Java/Spring ecosystems.

Conductor, the workflow orchestrator originally built and open-sourced by Netflix (and now maintained by Orkes), orchestrates millions of saga workflows daily for content encoding, recommendation pipelines, and subscription management. Each workflow can have 50+ steps with full compensation support.
