Race condition
Description
Two concurrent paths interact through a shared resource without coordinating their order, so the observable outcome depends on relative timing rather than program logic. The classic example: thread A reads a counter, computes counter+1, and writes it back; thread B does the same. If both reads happen before either write, one increment is lost. The bug is not in either path’s code; it is in the assumption that the two could be sequenced safely without explicit coordination. The structural shape (two paths plus a shared container) is the right diagnostic. Many concurrency bugs that look like “weird timing” are races, and many distributed-system bugs are races writ large (multi-node, eventually-consistent state). The fix surface is well-developed: synchronization primitives (mutexes, atomics), ordering protocols (consensus, vector clocks), or structural prevention (immutability, single-owner state).
Triggers
- User-initiated: User describes “weird timing bugs,” “flaky tests,” “only reproduces under load,” “race condition.” Vocabulary cues: “race,” “concurrent,” “lock,” “atomicity,” “deadlock,” “thread safety,” “interleaving.”
- Agent-initiated: Agent notices that two paths touch the same state without explicit ordering, OR that observed behavior depends on which of two operations completes first. Candidate inference: “what’s the shared container; what ordering assumption is implicit?”
- Situation-shape signals: Bugs that don’t reproduce deterministically. “Works in dev, fails in prod.” Test failures dismissed with “rerun and they pass.” Performance improvements that cause new bugs to surface.
Exclusions
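- Strictly sequential execution: with no threads, processes, async callbacks, or interrupt handlers that can interleave, there is no concurrency and no race. The concept doesn’t fire even if the code has shared mutable state. A single thread alone is not sufficient, because an event loop can still interleave callbacks between suspension points.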
- Commutative operations on shared state: if both paths perform operations whose outcome doesn’t depend on order (e.g., set union, max), and each operation is applied atomically, the race exists but is not observable as a bug.
- Eventually-consistent semantics by design — some systems explicitly accept races as part of their consistency model (CRDTs, gossip protocols); naming “race” then is a label for behavior the design chose, not a bug.
- Sequential causality: operations with happens-before relationships established by the program don’t race; the ordering is enforced by causation, not just timing.
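The commutativity exclusion can be checked mechanically on small cases. The following is a minimal sketch, assuming each operation is applied atomically; `final_states` is a hypothetical helper introduced only for illustration. It applies a set of updates in every possible order and collects the distinct end states: one end state means order cannot be observed, more than one means the race is observable.

```python
from itertools import permutations

def final_states(initial, operations):
    """Apply the operations in every possible order; collect distinct results."""
    results = set()
    for order in permutations(operations):
        state = initial
        for op in order:
            state = op(state)
        results.add(state)
    return results

# Commutative updates: max() reaches the same state in any order.
max_ops = [lambda s: max(s, 7), lambda s: max(s, 3)]
commutative_results = final_states(0, max_ops)

# Order-dependent updates: "overwrite with 7" and "double" do not commute.
mixed_ops = [lambda s: 7, lambda s: s * 2]
order_dependent_results = final_states(1, mixed_ops)
```

Enumerating permutations only scales to a handful of operations; it is meant as a reasoning aid, not a test strategy for real systems.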
Structure
Relationships
- seam — races live at concurrency-context seams; the boundary between assumed-atomic and actually-atomic.
- container — the shared container is one of race-condition’s constitutive primitives.
- make-wrong-unrepresentable — the strongest race fix is structural absence of the dangerous interleaving (immutable data, single-writer rules, transactional boundaries).
- asymmetric-gate — synchronization primitives are asymmetric gates: cheap to acquire-then-release in the common case, expensive when contention is high.
- load-bearing — diagnostic: is the race load-bearing (must be fixed) or benign (rare + low-stakes)?
Examples
Lost-update counters · computer-science
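A minimal sketch of the lost update and one possible fix, using Python's standard `threading` module. The names `Counter`, `simulate_lost_update`, and `run_locked` are illustrative assumptions. Because the real race is timing-dependent, the bad interleaving is replayed by hand so it is deterministic; the lock-protected version then runs on real threads.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_locked(self):
        # The read, add, and write form one critical section.
        with self._lock:
            self.value += 1

def simulate_lost_update():
    """Replay the bad interleaving by hand: both reads precede both writes."""
    shared = 0
    read_a = shared        # thread A reads 0
    read_b = shared        # thread B reads 0
    shared = read_a + 1    # A writes 1
    shared = read_b + 1    # B writes 1, overwriting A's increment
    return shared

def run_locked(threads=4, increments=1000):
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment_locked()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

lost = simulate_lost_update()   # two increments, but the result is 1
locked_total = run_locked()     # every increment is counted
```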
Check-then-act (TOCTOU) · computer-science
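One common instance is file creation. The sketch below contrasts a check-then-act version with a version that folds the check into the act by opening in exclusive-creation mode (`"x"`), which raises `FileExistsError` if the file already exists. The function names and the `claim.txt` filename are hypothetical.

```python
import os
import tempfile

def create_check_then_act(path, data):
    # Racy: another process can create `path` between the check and the open,
    # and this call would then silently overwrite it.
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(data)
    return True

def create_atomic(path, data):
    # Mode "x" creates the file only if it does not already exist,
    # so the check and the act are a single operation.
    try:
        with open(path, "x") as f:
            f.write(data)
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "claim.txt")
    first = create_atomic(target, "owner-1")
    second = create_atomic(target, "owner-2")
    with open(target) as f:
        contents = f.read()
```

The second claim fails instead of overwriting, so the first owner's data survives.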
Async UI state-update conflicts · computer-science
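A sketch of this shape with `asyncio`, assuming a hypothetical `SearchBox` whose `_fetch` stands in for a network call. Two requests are in flight and the older one is slower, so a naive handler lets the stale response overwrite the fresh one. One possible guard is a request counter: each handler applies its response only if it is still the latest request.

```python
import asyncio

class SearchBox:
    def __init__(self):
        self.results = None
        self._latest_request = 0

    async def _fetch(self, query, delay):
        await asyncio.sleep(delay)  # stands in for a network call
        return f"results for {query!r}"

    async def search_naive(self, query, delay):
        # Whichever response arrives last wins, even if it is older.
        self.results = await self._fetch(query, delay)

    async def search_guarded(self, query, delay):
        self._latest_request += 1
        my_request = self._latest_request
        response = await self._fetch(query, delay)
        if my_request == self._latest_request:  # drop stale responses
            self.results = response

async def demo(method_name):
    box = SearchBox()
    search = getattr(box, method_name)
    # The user types "ca" then "cat"; the older request is slower.
    await asyncio.gather(search("ca", 0.1), search("cat", 0.01))
    return box.results

naive_result = asyncio.run(demo("search_naive"))      # stale "ca" wins
guarded_result = asyncio.run(demo("search_guarded"))  # latest "cat" wins
```

Note that this race occurs on a single thread: the interleaving happens at the `await` points.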
Cache invalidation races · computer-science
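A frequent form of this race: a reader misses the cache and reads the old value from the database, a writer then updates the database and invalidates the cache, and finally the reader fills the cache with the value it read earlier. The sketch below replays that interleaving by hand and shows one possible guard, a version floor set at invalidation time. `VersionedCache`, the `database` dict, and the key names are illustrative assumptions, not a real cache API.

```python
class VersionedCache:
    """Cache that refuses fills older than the latest invalidation."""

    def __init__(self):
        self._entries = {}       # key -> (version, value)
        self._min_version = {}   # key -> lowest version still acceptable

    def invalidate(self, key, new_version):
        self._entries.pop(key, None)
        floor = self._min_version.get(key, 0)
        self._min_version[key] = max(floor, new_version)

    def put(self, key, version, value):
        if version < self._min_version.get(key, 0):
            return False  # stale fill from a reader that lost the race
        self._entries[key] = (version, value)
        self._min_version[key] = version
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

# Replay the race by hand.
database = {"user:1": (1, "old-name")}
cache = VersionedCache()

stale_version, stale_value = database["user:1"]  # reader: cache miss, reads v1
database["user:1"] = (2, "new-name")             # writer: updates the row
cache.invalidate("user:1", new_version=2)        # writer: invalidates the key
accepted = cache.put("user:1", stale_version, stale_value)  # reader: late fill
after_race = cache.get("user:1")                 # still empty, not stale
```

The guard assumes every stored value carries a monotonically increasing version, which the data store would need to supply.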
Coffman, Elphick, Shoshani (1971), "System Deadlocks" — foundational ordering conditions. · computer-science
Deadlocks · computer-science
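One standard way to rule out deadlock between two locks is a global acquisition order, which removes the circular-wait condition. The sketch below assumes a hypothetical `Account` class with a per-account lock and a unique `id`; transfers always lock the lower id first, so two opposite-direction transfers cannot each hold the lock the other needs.

```python
import threading

class Account:
    _next_id = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.id = Account._next_id
        Account._next_id += 1

def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do; also avoids locking the same lock twice
    # Always lock the lower id first, so no two transfers can hold
    # the two locks in opposite orders.
    first, second = sorted((src, dst), key=lambda account: account.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

a, b = Account(100), Account(100)

def repeat(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)

# Opposite-direction transfers are the case that deadlocks
# when each thread locks its own source account first.
t1 = threading.Thread(target=repeat, args=(a, b))
t2 = threading.Thread(target=repeat, args=(b, a))
t1.start()
t2.start()
t1.join()
t2.join()
```

Both threads finish, and the balances return to their starting values because the transfers cancel out.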
Lamport (1978), "Time, Clocks, and the Ordering of Events in a Distributed System" — distributed-races at scale. · computer-science
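The logical clock from that paper can be sketched in a few lines. The class and method names below are illustrative assumptions: each node keeps a counter, increments it on a local event or send, and on receipt jumps past the sender's timestamp. The guarantee is one-directional: if one event happens-before another, its timestamp is smaller, but a smaller timestamp alone does not prove a causal relationship.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or send: advance the clock and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """On receipt, move past both the local time and the sender's stamp."""
        self.time = max(self.time, message_time) + 1
        return self.time

node_a, node_b = LamportClock(), LamportClock()
node_b.tick()
node_b.tick()                           # B does unrelated local work: B = 2
sent_at = node_a.tick()                 # A sends a message stamped 1
received_at = node_b.receive(sent_at)   # B = max(2, 1) + 1 = 3
```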
Coffman, Elphick, & Shoshani (1971), "System Deadlocks," ACM Computing Surveys — the four deadlock conditions; Lamport (1978), "Time, Clocks, and the Ordering of Events in a Distributed System," Communications of the ACM 21(7) — happens-before and the lift to distributed settings. · computer-science
Standard operating-systems textbooks — Silberschatz, Galvin & Gagne, *Operating System Concepts*; Tanenbaum & Bos, *Modern Operating Systems* — canonical treatment of race conditions and the critical-section problem. · computer-science
*Operating System Concepts* introduces the race through a shared counter variable whose machine-level load/increment/store steps can interleave between processes so that an increment is silently lost. Tanenbaum’s *Modern Operating Systems* gives the same shape through the print-spooler example: two processes each read the next free slot in the spooler directory, both see the same slot, and both write, so one job’s filename overwrites the other’s. In both texts the diagnosis is identical: the failure is not in either process’s code but in the unguarded interleaving of two paths through shared state.
The textbooks then promote the diagnosis into the critical-section problem: the segment of each process that touches the shared resource is its critical section, and a correct solution must enforce mutual exclusion (only one process in its critical section at a time), progress, and bounded waiting. This is the standard pedagogical move from “here is a timing bug” to “here is the structural property that prevents the whole class of timing bugs.”
Inference: The two-paths-plus-shared-container shape is exactly the textbook framing, and the textbook fix surface is the one to reach for: identify the critical section (the span where both paths touch the contended resource), then enforce mutual exclusion over it with a mutex, semaphore, or monitor, or, better, restructure so the contended state has a single owner and the dangerous interleaving cannot be represented at all.
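The single-owner restructuring can be sketched with a message queue. This is a minimal illustration, with `owner_loop` and the `None` shutdown sentinel as assumptions: only one thread ever touches the running total, and every other thread sends it messages instead of mutating shared state. The mutual exclusion has not disappeared; it has moved inside `queue.Queue`, which is thread-safe, so application code has no critical section of its own to get wrong.

```python
import queue
import threading

def owner_loop(inbox, result):
    """The only code that ever touches `total`; no lock is needed here."""
    total = 0
    while True:
        message = inbox.get()
        if message is None:   # sentinel: shut down
            break
        total += message
    result.append(total)

inbox = queue.Queue()
result = []
owner = threading.Thread(target=owner_loop, args=(inbox, result))
owner.start()

def producer():
    # Producers never see `total`; they only send increments.
    for _ in range(1000):
        inbox.put(1)

producers = [threading.Thread(target=producer) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()

inbox.put(None)   # all producers are done; tell the owner to stop
owner.join()
final_total = result[0]
```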
Stale-read in distributed systems · computer-science
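A small in-process sketch of a stale read caused by replication lag, and one possible session-level guard. `Replica` and `Client` are hypothetical stand-ins for real nodes, and replication is simulated by calling `apply` by hand. The client remembers the highest version it has written and falls back to the primary whenever the replica has not yet caught up to that version.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are not newer than what is already stored.
        if version > self.version:
            self.version, self.value = version, value

class Client:
    """Tracks the highest version it has written and refuses older reads."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.last_written = 0

    def write(self, value):
        self.last_written = self.primary.version + 1
        self.primary.apply(self.last_written, value)

    def read(self):
        source = self.replica
        if source.version < self.last_written:
            source = self.primary   # replica is behind; fall back
        return source.value

primary, replica = Replica(), Replica()
client = Client(primary, replica)

client.write("v1")
replica.apply(1, "v1")   # replication catches up
client.write("v2")       # replication of v2 has not arrived yet

stale = replica.value    # a naive replica read still returns "v1"
fresh = client.read()    # the session check falls back to the primary
```

This protects only the client's own writes; writes from other clients can still be read stale, which may be acceptable under an eventually-consistent design.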