Skip to main content
computer-science engineering-and-technology law political-science sociology

Consensus

Description

Consensus is the structural primitive for “multiple nodes agree on a shared value despite the network being unreliable and some nodes possibly failing.” The protocol must satisfy three properties — agreement (all correct nodes decide the same value), validity (the decided value was proposed by some node), and termination (correct nodes eventually decide). FLP impossibility tells us that no purely-asynchronous protocol can satisfy all three with even one failure; production protocols (Paxos, Raft, PBFT) work around this by adding partial synchrony or randomization. The diagnostic shape: a value needs to be agreed-upon by a set of nodes, none of which has special authority by construction, in the presence of network failures or node failures. If a single node can authoritatively decide, the concept doesn’t fire (use a leader instead). If the nodes don’t need to converge on the same value, the concept doesn’t fire (use eventual consistency).

Triggers

User-initiated: User describes a system where multiple nodes must agree on a value (leader, transaction outcome, config), or expresses concern about split-brain. Vocabulary cues: “consensus,” “Paxos,” “Raft,” “quorum,” “leader election,” “split-brain,” “voting.” Agent-initiated: Engine notices a distributed system where shared mutable state crosses node boundaries with no obvious leader. Candidate inference: “this needs consensus — what’s the quorum rule, the failure model (crash vs Byzantine), and the liveness guarantee?” Situation-shape signals: Distributed state with no natural single source of truth; need for fault tolerance against node failures; correctness depends on all replicas converging on the same history.

Exclusions

  • Single-leader systems — a designated leader makes decisions; consensus only needed for leader election (a degenerate instance).
  • CRDT / eventual-consistency designs — convergence happens via merge functions, not agreement protocols; consensus isn’t needed.
  • Read-only or write-once data — no decisions to make; consensus is overkill.
  • Networks with strong synchrony assumptions and trusted nodes — the simpler “ask everyone, take any answer” works; consensus’s complexity isn’t earning its keep.

Structure

Internal structure of consensus: a table of its component slots and the concepts that fill them. = N participants + a proposal + a protocol that converges on agreement + a quorum rule. The protocol is the most varied piece — Paxos, Raft, PBFT, two-phase commit, Roberts Rules of Order, jury deliberation all instantiate consensus with different assumptions about failure modes (crash-stop vs Byzantine) and different cost profiles.

Relationships

Relationship neighborhood of consensus: a graph of the concepts it connects to and the concepts it is a part of.
  • replication — consensus is the substrate that lets multi-leader and leaderless replication converge.
  • quorumcatalog stub — quorum is the parameter; consensus is the protocol.
  • load-bearing — consensus is structurally load-bearing for any shared mutable state across nodes.
  • bookends — the propose / commit bookends frame each consensus instance.

Examples

Jury verdicts · law

12 jurors agree on a verdict; unanimity-quorum (criminal) or supermajority-quorum (some civil) defines binding agreement.

Blockchain · computer-science

Bitcoin’s Nakamoto consensus and Ethereum’s PoS are Byzantine-tolerant consensus protocols at planetary scale.
Commercial aircraft fly-by-wire systems (Airbus A320 family, Boeing 777/787, F-16) use triplex or quadruplex redundant flight-control computers that vote on control-surface commands at each control-loop tick. Three (or four) independent computers — typically running different hardware, different software written to the same specification, sometimes by different teams to reduce common-mode failure — each compute the appropriate control output for the current sensor inputs, and a voter circuit takes the majority value as the command actually sent to the actuators. When one computer’s output disagrees with the other two, the disagreeing computer is suspected of failure, flagged for ground inspection, and removed from the voting pool until verified.The structural pattern is a synchronous, low-latency consensus protocol with the agreement-validity-termination triple specialized to a hard-real-time domain. Termination is trivial (the vote happens every few milliseconds whether or not nodes agree); agreement is enforced by the majority-voting hardware; validity is the property that the agreed value is one of the proposed values. The system tolerates one Byzantine failure (in a triplex configuration) or two (in quadruplex), per the standard f < N/3 fault-tolerance bound. The reliability case for fly-by-wire rests on these properties holding — the catastrophic-failure rate for control-surface commands must be below ~10⁻⁹ per flight hour, which the triplex configuration achieves only because the consensus protocol prevents single-computer failures from propagating.Inference: The aviation case shows consensus operating under constraints software engineers rarely face — millisecond latency budgets, no possibility of retry, every cycle binding. The design choices it forces (synchronous voting, hardware voters, separate-source software diversity) clarify what consensus can and cannot do: it can handle isolated failures, but it cannot handle correlated failures (the structural argument for diverse software stacks). The Boeing 737 MAX MCAS incidents are a counterexample — single-sensor inputs without voting eliminated the consensus’s fault-tolerance property entirely.
Miguel Castro and Barbara Liskov’s 1999 OSDI paper Practical Byzantine Fault Tolerance (PBFT) was the first consensus protocol to achieve Byzantine fault tolerance with practical performance in production-style asynchronous network conditions. Earlier Byzantine-tolerant protocols (Lamport-Shostak-Pease 1982, the “Byzantine Generals” formulation) demonstrated that consensus was possible in the presence of arbitrarily-misbehaving nodes (not just crashes), but at costs (exponential message complexity, strong synchrony assumptions) that made them impractical for real systems. PBFT used a three-phase message protocol (pre-prepare / prepare / commit) that achieved O(N²) message complexity under normal operation and provided safety under fully-asynchronous network conditions while requiring partial synchrony only for liveness.The protocol tolerates up to f Byzantine failures with N = 3f+1 total nodes — the standard one-third lower bound from Lamport-Shostak-Pease. A primary node leads each consensus round; if the primary is suspected of misbehaving, a view-change protocol elects a new primary. The crucial property is safety: even when the network is fully partitioned and the primary is Byzantine, no two correct nodes will ever commit conflicting values. Liveness can be temporarily lost during partition; safety is unconditional.Inference: PBFT became the template for modern Byzantine-tolerant protocols, including Tendermint, Hyperledger Fabric, and the consensus engines underlying many blockchain platforms. Its lasting contribution is the engineering demonstration that the theoretical impossibility results (FLP) and the Byzantine-fault-tolerance bound (N ≥ 3f+1) can be navigated with carefully-designed protocols that exploit partial synchrony for progress while preserving safety unconditionally. The structural lesson is that “consensus is impossible” results characterize specific impossibilities, not blanket impossibility — practical protocols work around them by trading off some assumption (synchrony, fairness, crash-only failures) the impossibility-proof relied on.
etcd, Consul, ZooKeeper all use Raft or Paxos for replicated state.
The Fischer-Lynch-Paterson result (1985) — “Impossibility of Distributed Consensus with One Faulty Process” — proved that in an asynchronous distributed system with even a single crash-failure, no deterministic protocol can guarantee consensus. The proof constructs a scheduling adversary who can always extend any execution to delay decision indefinitely.This is the foundational impossibility theorem for consensus, and it shapes the entire field: any practical consensus protocol must either weaken one of the FLP assumptions (allow randomization, assume partial synchrony, accept probabilistic termination) or violate one of FLP’s guarantees (give up termination or give up agreement). Paxos, Raft, and PBFT all assume partial synchrony in practice and accept that during prolonged asynchrony, they may stall rather than violate safety.Inference: when evaluating a consensus protocol or a system that depends on one, identify which FLP assumption is being weakened. The choice of weakening determines the failure modes: a randomization-based protocol fails with low probability but may not terminate; a partial-synchrony protocol terminates when the network behaves well but may stall during partitions.
Kleppmann’s chapter on consistency and consensus is the modern textbook treatment of the distributed-systems consensus problem: multiple nodes, each of which may fail or be slow, must agree on a single value despite asynchronous message delivery and the possibility of node failures. The chapter walks through total order broadcast, the FLP impossibility result, and the practical protocols (Paxos, Raft, Zab) that work around impossibility under realistic failure models.Its value for the catalog is the integrative framing: Kleppmann shows that distributed-systems consensus is structurally the same operation that parliamentary voting, jury verdicts, two-phase commit, and Byzantine-fault-tolerant aviation systems perform — a set of independent agents converging on a shared decision under uncertainty about each other’s states. The book made the distributed-systems vocabulary accessible to working engineers and is the most-cited reference in industry discussion of consensus protocols since its 2017 publication.
Leslie Lamport’s The Part-Time Parliament (1998) introduces Paxos via an allegory about an imagined Greek parliament whose members come and go. The paper was famously rejected on its first submission for being too unusual, then republished after Paxos’s influence on real distributed systems became apparent. Paxos was the first consensus protocol that worked correctly under realistic asynchronous-with-crash-faults assumptions while still terminating during periods of network synchrony.Paxos established the foundational vocabulary still in use: proposers, acceptors, learners, ballots, quorums, prepare/promise/accept/accepted phases. Every subsequent consensus protocol (Raft, PBFT, Zab, Multi-Paxos, Fast Paxos) is either a variant, a simplification, or an extension of Paxos’s basic structure. Paxos and Raft assume non-malicious failures (crashes and message loss) and tolerate up to f failures with 2f+1 nodes; PBFT extends the framework to tolerate Byzantine (arbitrary, including malicious) failures with 3f+1 nodes. These protocols have direct analogues in voting theory (quorum rules, majority decisions) and social choice theory (Arrow’s theorem and its consequences for aggregating preferences). The consensus literature also shaped blockchain protocol design — Nakamoto’s proof-of-work consensus is structurally distinct (probabilistic, open-membership) but addresses the same underlying problem under different assumptions.Inference: foundational protocols often appear in unusual or hard-to-read forms because the inventor is still working out the language. The structural contribution survives the exposition. When a paper is hard to read but practitioners report that the underlying idea is sound, invest the reading effort — the structure may not be available in any cleaner form yet.
Diego Ongaro and John Ousterhout’s 2014 USENIX ATC paper introduced Raft, a consensus algorithm explicitly designed to be more understandable than Paxos. The paper opens by arguing that Paxos’s reputation for being hard to teach and implement was not an unavoidable cost of the consensus problem — it was a design choice. Raft decomposes consensus into three sub-problems (leader election, log replication, safety) and uses strong leadership and a contiguous-log structure to simplify each.The contribution to the catalog: Raft demonstrates that the same structural requirements (single agreed-upon value across a quorum, fault-tolerance, linearizability) can be met by markedly different expositions. Raft has displaced Paxos in many production systems (etcd, Consul, TiKV, CockroachDB) not because it’s more correct but because it’s more teachable.Inference: when you observe that a foundational primitive has earned a reputation for being notoriously hard to understand, ask whether the difficulty is intrinsic to the problem or accidental to a particular formulation. A re-decomposition may produce the same guarantees with vastly less cognitive load on implementers.
Roberts Rules, parliamentary majority votes, supermajority thresholds for constitutional changes.
peer review + independent replication is a slow-tempo Byzantine-tolerant consensus protocol; the “quorum” is loose and the protocol takes decades.
the simplest distributed-transactions protocol; coordinator-driven consensus with a known blocking failure mode (coordinator-down).