Event-Driven Inventory Management for Stadium POS: Consistency Across Partitioned Nodes
During the third quarter of a playoff game, our monitoring lit up with 47 lanes reporting “inventory sync delayed > 5 min.” Network logs showed a fiber cut between the main data closet and the east concourse. Those 47 lanes kept selling—hard—but the central stock counters were lying. When the link came back 14 minutes later, we had oversold three popular IPA taps by 112 units across 18 stands. The GM’s exact words over the radio were not suitable for this post. That incident forced us to throw away the old centralized-inventory-with-locking model entirely.
Why Classical Locking Dies in Stadiums
Traditional inventory (SELECT FOR UPDATE → decrement → COMMIT) assumes you can reach the database quickly and reliably. In a 70,000-seat venue you get:
- 60–120 ms baseline latency even when everything is happy
- 800–2,000 ms spikes during rushes
- Multi-minute partitions affecting whole sections
- Hundreds of concurrent modifiers (lanes, kitchen displays, restock tablets)
Lock contention + network partitions = either throughput collapse or phantom negative stock. Neither is acceptable when a line of 200 people is waiting and the stand has physically run out.
Moving to Event-Sourced + Local Soft Reserves
We rebuilt around four core ideas:
- Every stock change is an immutable event (Sale, Restock, Void, ManualAdjust)
- Each lane maintains its own local projected inventory (SQLite)
- Global truth is a stream of events processed into materialized views
- During partitions lanes use local projection + advisory low-stock thresholds
Event schema (simplified):
```typescript
interface StockEvent {
  eventId: string;         // UUID v7
  aggregateId: string;     // "product:uuid" or "tap:uuid"
  type: "Sale" | "Restock" | "Void" | "Adjust";
  quantity: number;        // positive for restock/adjust-in, negative for sale/void-out
  laneId: string;
  timestamp: string;
  idempotencyKey: string;
  causalityToken?: string; // for composite adjustments
}
```
Lanes append events locally first, update their projection immediately (optimistic), and queue for upstream delivery.
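Concretely, the append path can be sketched like this. This is a sketch, not our production code: in-memory structures stand in for the SQLite log and projection, and the names (`LaneInventory`, `drainOutbox`, etc.) are illustrative.

```typescript
// Lane-side append path: durable local append, optimistic projection
// update, then queue for upstream delivery. Maps/arrays stand in for
// the SQLite tables; all names here are illustrative.
type StockEvent = {
  eventId: string;
  aggregateId: string;
  type: "Sale" | "Restock" | "Void" | "Adjust";
  quantity: number; // negative for sale/void-out
  laneId: string;
  timestamp: string;
  idempotencyKey: string;
};

class LaneInventory {
  private log: StockEvent[] = [];                 // stand-in for the SQLite event table
  private projection = new Map<string, number>(); // local projected stock per aggregate
  private outbox: StockEvent[] = [];              // events queued for upstream
  private seenKeys = new Set<string>();           // idempotency guard against double-taps

  append(ev: StockEvent): boolean {
    if (this.seenKeys.has(ev.idempotencyKey)) return false; // duplicate, ignore
    this.seenKeys.add(ev.idempotencyKey);
    this.log.push(ev);                                      // 1. append locally first
    const cur = this.projection.get(ev.aggregateId) ?? 0;
    this.projection.set(ev.aggregateId, cur + ev.quantity); // 2. optimistic projection
    this.outbox.push(ev);                                   // 3. queue for delivery
    return true;
  }

  stock(aggregateId: string): number {
    return this.projection.get(aggregateId) ?? 0;
  }

  drainOutbox(): StockEvent[] {
    const batch = this.outbox;
    this.outbox = [];
    return batch;
  }
}
```

The key property is that step 1 never waits on the network: the lane can keep ringing sales with the uplink down, and the outbox drains whenever connectivity returns.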
Conflict-Free Reconciliation with Last-Write-Wins + Manual Overrides
Because strict total-ordering across partitions is impossible without freezing sales, we use:
- Per-product last-write-wins on restock/adjust events (timestamp + lane priority for ties)
- Sale events are never reordered or dropped; they always subtract
- Voids only succeed if the original sale event is still visible in the stream
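The tie-break on supply events reduces to a small pure function. A minimal sketch, assuming ISO-8601 timestamps (which compare correctly as strings) and a hypothetical `lanePriority` callback where a lower number wins the tie:

```typescript
// Last-write-wins for Restock/Adjust events on the same product:
// later timestamp wins; equal timestamps fall back to lane priority.
// `lanePriority` (lower = higher priority) is an assumed convention.
function pickSupplyWinner<T extends { timestamp: string; laneId: string }>(
  a: T,
  b: T,
  lanePriority: (laneId: string) => number
): T {
  if (a.timestamp !== b.timestamp) {
    // ISO-8601 strings sort chronologically under plain string comparison
    return a.timestamp > b.timestamp ? a : b;
  }
  return lanePriority(a.laneId) <= lanePriority(b.laneId) ? a : b;
}
```

Sales deliberately never pass through this function: they are commutative subtracts, so every sale applies regardless of order.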
This means during long partitions you can temporarily go negative globally while local views stay non-negative. When partitions heal:
- Upstream replays all queued events in arrival order
- Materialized view recomputes current stock
- If negative stock is detected, an “oversold” compensation event is emitted (usually manual credit to venue)
In practice oversold events happen 2–5 times per large event—mostly on very popular limited items—and are far cheaper than frozen tills.
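The heal-time recompute is just a fold over the replayed events plus a scan for negatives. A sketch under those assumptions; `compensations` is a hypothetical name for the oversold-adjustment step, not our actual module:

```typescript
// Replay: fold all events (in arrival order) into current stock per aggregate.
type Ev = { aggregateId: string; quantity: number };

function recompute(events: Ev[]): Map<string, number> {
  const stock = new Map<string, number>();
  for (const ev of events) {
    stock.set(ev.aggregateId, (stock.get(ev.aggregateId) ?? 0) + ev.quantity);
  }
  return stock;
}

// Emit an "oversold" compensation event for every aggregate that went
// negative; applying it brings the materialized view back to zero,
// while the human side (comps, venue credit) is handled out of band.
function compensations(stock: Map<string, number>): Ev[] {
  const out: Ev[] = [];
  for (const [aggregateId, qty] of stock) {
    if (qty < 0) out.push({ aggregateId, quantity: -qty });
  }
  return out;
}
```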
Real Incident: The Nacho Cheese Partition
Midway through a triple-overtime basketball game, the nacho cheese dispenser stand (high-margin hero item) lost connectivity for 19 minutes. Local projection allowed 240 more sales than physically possible. When the link returned:
- 187 events queued and delivered
- Global view went -47 units
- System auto-flagged the product as critically oversold
- Concession manager comped 47 orders via tablet in < 4 minutes
- Revenue protected; customer impact contained to “free upgrade” level
Had we blocked sales at zero locally, the stand would have looked closed for 19 minutes during OT. The GM later said he’d take the comps over the optics of a dark stand any day.
Observability We Wish We’d Built Sooner
- Per-product event lag histogram (how far behind is this aggregate?)
- Partition duration per-lane + per-section
- Oversold event rate + magnitude dashboard
- “Last seen upstream” timestamp per lane
Without these we were flying blind on whether a negative balance was 8 units deep or 80, or how long it had been that way.
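The lag metric at the top of that list is simple timestamp arithmetic. A sketch, assuming ISO-8601 timestamps and a hypothetical map of the newest processed event per aggregate:

```typescript
// Per-aggregate event lag: how far behind "now" is the newest event we
// have processed upstream for each aggregate? Feeds the lag histogram.
function eventLagSeconds(
  lastProcessed: Map<string, string>, // aggregateId -> newest processed timestamp
  now: Date
): Map<string, number> {
  const lag = new Map<string, number>();
  for (const [aggregateId, ts] of lastProcessed) {
    lag.set(aggregateId, (now.getTime() - new Date(ts).getTime()) / 1000);
  }
  return lag;
}
```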
Hard-Won Rules
- Never let central consistency block the point of sale.
- Prefer visible oversell + fast human compensation over invisible stock-lies + surprise end-of-night reconciliation bombs.
- Timestamp-based last-write-wins on supply events works surprisingly well when sales are monotonic subtracts.
- Keep local projections simple and fast—complex CRDTs sounded sexy but added latency we couldn’t afford on low-power handhelds.
The system isn’t perfectly consistent. It’s predictably inconsistent in ways that match the physics of concrete, steel, and 60,000 thirsty people. And that turns out to be good enough for revenue protection and customer experience.