Load balancing
Description
Load-balancing is the move of distributing incoming work across a pool of workers via a routing function, so that no single worker becomes the bottleneck and the aggregate system can absorb load that exceeds any single worker’s capacity. The diagnostic shape: incoming work arrives at a single dispatch point (the load balancer); the dispatch function chooses a worker; the chosen worker processes the request; the response (if any) returns directly or via the balancer. The router is the always-traversed hub; the workers are interchangeable (or specialized via affinity).
The structural payoff is horizontal scale and graceful failure handling: adding a worker increases capacity; removing one (planned or unplanned) costs 1/N of capacity, for N equally sized workers, but doesn’t break the system. The structural cost is the router itself: it is a single point of failure unless replicated, it adds a routing-decision cost to every request, and it forces choices (round-robin? least-connections? consistent-hash?) whose tradeoffs depend on the workload.
Load-balancing is structurally adjacent to sharding. Both partition work, but sharding partitions persistently by data key (one shard owns one slice of the data), while load-balancing partitions transiently by routing decision (any worker can handle any request, except under affinity policies). When the routing function is “hash the key persistently,” load-balancing is sharding. When the routing function is “round-robin across stateless workers,” load-balancing is the more general case.
Triggers
User-initiated: The user describes a capacity ceiling on a single worker, wants horizontal scale, or proposes adding a load balancer. Vocabulary cues: “load balancing,” “load balancer,” “round robin,” “consistent hashing,” “health check,” “reverse proxy.”
Agent-initiated: The engine notices a single worker hitting capacity, or a system that wants to scale by adding replicas without redistributing data. Candidate inference: “this wants load-balancing: what’s the routing function (uniform / sticky / hash), and what’s the health-check policy?”
Situation-shape signals: a single-worker capacity ceiling; a need for horizontal scale; a need for failover with continued service; workers that are mostly interchangeable (stateless or near-stateless).
Exclusions
- Stateful workers with workload-specific affinity — if every request must hit the same worker (in-memory session state, GPU model loaded in worker memory), load-balancing reduces to single-worker pinning, and the routing layer adds cost without earning benefit.
- Heterogeneous workers where one is much faster — uniform load-balancing wastes the fast worker’s headroom; you need weighted routing instead, but that adds complexity.
- Workloads that aren’t actually load-bound — if you’re latency-bound (downstream dependency is the bottleneck) and not capacity-bound, adding more workers just adds idle workers.
- Tiny scale where a single worker suffices — premature load-balancing adds infrastructure for no payoff.
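The weighted routing mentioned in the heterogeneous-worker exclusion can be sketched as a smooth weighted round-robin. This is a minimal illustration, not a reference implementation: the class name `WeightedRoundRobin`, the worker names, and the weights are all hypothetical, and the weights are assumed to be positive integers proportional to each worker's capacity.

```python
class WeightedRoundRobin:
    """Smooth weighted round-robin over a fixed set of workers (illustrative sketch)."""

    def __init__(self, weights):
        # weights: dict of worker name -> positive integer weight (assumed input shape)
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty dict of positive integers")
        self._weights = dict(weights)
        self._current = {name: 0 for name in weights}
        self._total = sum(weights.values())

    def pick(self):
        # Every worker earns credit equal to its weight on each round.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # The worker with the most accumulated credit is chosen
        # (ties go to the worker listed first)...
        chosen = max(self._current, key=self._current.get)
        # ...and pays back the total weight, which spreads its turns out
        # instead of bunching them together.
        self._current[chosen] -= self._total
        return chosen


if __name__ == "__main__":
    balancer = WeightedRoundRobin({"fast": 3, "slow": 1})
    print([balancer.pick() for _ in range(8)])
```

Over any window of `sum(weights)` picks, each worker is chosen exactly as many times as its weight, so the fast worker's headroom is used without starving the slow one. The added bookkeeping is the complexity cost the exclusion refers to.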
Structure
Relationships
- uniformity-dividend — even distribution earns the dividend; uneven distribution loses it.
- sharding — sharding is load-balancing with a persistent key-to-worker mapping.
- multi-hop-routing — load balancer is a routing hop.
- bulkhead — per-pool load-balancing is bulkheading.
- health-check (not yet a concept; implicit) — load-balancing requires worker-health visibility.
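The dependence on worker-health visibility noted in the last relationship can be sketched as a round-robin dispatcher that skips workers marked unhealthy. This is a rough illustration under assumptions: the class `RoundRobinBalancer` and its `mark_down` / `mark_up` methods are hypothetical names, and how health is actually detected (probes, timeouts, error rates) is left outside the sketch.

```python
class RoundRobinBalancer:
    """Round-robin dispatch that skips workers marked unhealthy (illustrative sketch)."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("at least one worker is required")
        self._workers = list(workers)
        self._healthy = set(self._workers)
        self._next = 0  # index of the next worker to consider

    def mark_down(self, worker):
        # Called by a (hypothetical) health checker when a probe fails.
        self._healthy.discard(worker)

    def mark_up(self, worker):
        # Called when a previously failed worker passes its probe again.
        if worker in self._workers:
            self._healthy.add(worker)

    def pick(self):
        # Consider each worker at most once per call, in rotation order.
        for _ in range(len(self._workers)):
            worker = self._workers[self._next]
            self._next = (self._next + 1) % len(self._workers)
            if worker in self._healthy:
                return worker
        raise RuntimeError("no healthy workers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["a", "b", "c"])
    print([lb.pick() for _ in range(4)])
    lb.mark_down("b")
    print([lb.pick() for _ in range(3)])
```

Marking a worker down removes its share of capacity while the remaining workers keep serving, which is the graceful-failure property the pattern is meant to provide.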
Examples
DNS round-robin · computer-science
Traffic-cop lane assignments · transportation
Biological cell cycle · biology
Consistent hashing (Karger 1997) · computer-science
CPU thread scheduler across cores · computer-science
Karger et al., *Consistent Hashing and Random Trees* (STOC 1997) — the foundational paper for consistent-hash load-balan · computer-science
Leonard Kleinrock, *Queueing Systems, Volume 2: Computer Applications* (John Wiley & Sons, 1976) — queueing-theoretic analysis of computer and communication systems. · mathematics
Kleppmann (2017), Designing Data-Intensive Applications, Chapter 5; Karger et al. (1997) "Consistent Hashing and Random Trees"; Hello Interview primer on load balancers · computer-science
Library checkout desk routing · business
MapReduce shuffle · computer-science
queueing theory (Kleinrock); networking literature (HAProxy, nginx, F5, AWS ALB documentation); operations research · computer-science
Shift scheduling · business
Verma et al., *Large-scale cluster management at Google with Borg* (EuroSys 2015) — load-balancing at planetary engineer · computer-science
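The consistent hashing example listed above can be illustrated with a small hash ring. This is a sketch under stated assumptions rather than the construction from the Karger et al. paper: the `HashRing` class, the use of MD5 as the hash, and the default of 100 virtual nodes per worker are all illustrative choices.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable, well-mixed hash works here; MD5 is an assumption of this sketch.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, workers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (hash, worker) positions on the ring
        for worker in workers:
            self.add(worker)

    def add(self, worker):
        # Each worker is placed at several positions to even out its share.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{worker}#{i}"), worker))

    def remove(self, worker):
        self._points = [p for p in self._points if p[1] != worker]

    def route(self, key):
        if not self._points:
            raise LookupError("no workers in the ring")
        # First ring position at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing(["w1", "w2", "w3"])
    keys = [f"user-{i}" for i in range(10)]
    before = {k: ring.route(k) for k in keys}
    ring.remove("w2")
    moved = [k for k in keys if ring.route(k) != before[k]]
    print("keys remapped after removing w2:", moved)
```

Removing a worker remaps only the keys that worker owned; every other key keeps its assignment. That persistence is what moves this routing function toward the sharding end of the spectrum.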