computer-science economics mathematics philosophy psychology statistics

Wisdom of crowds

Description

Aggregating many independent estimates yields an estimate more accurate than that of the typical individual, and often more accurate than nearly all of them. The classical case: Francis Galton (1907) analyzed 787 visitors’ guesses of an ox’s weight at a livestock fair. The median guess was 1,207 lbs, within 1% of the actual 1,198 lbs, and the mean was 1,197 lbs, within one pound, even though many individual guesses missed by far more. The structural shape: many estimators + independence constraint + aggregation + emergent accuracy. The diagnostic property (independence is constitutive, not optional) distinguishes wisdom-of-crowds from any random-aggregation move. Once estimators see each other’s values, they anchor; the errors no longer average out because they are now correlated. This is why prediction markets in which traders react mainly to visible prices can work less reliably than sealed independent estimates; why juries are sequestered; why scientific peer review at scale requires multiple independent reviewers who don’t see each other’s drafts.
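The contrast between independent and anchored estimates can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not a model of Galton's data: the noise level, the anchoring weight, and the function names (`independent_crowd`, `anchored_crowd`, `rmse`) are all hypothetical choices made for the example.

```python
import random
import statistics

TRUE_WEIGHT = 1198.0   # lb, the ox weight quoted above
NOISE_SD = 100.0       # assumed spread of private guesses (illustrative)
N_GUESSERS = 787
ANCHOR_WEIGHT = 0.8    # assumed pull toward the visible running mean


def independent_crowd(rng):
    """Mean of guesses made without seeing anyone else's value."""
    guesses = [rng.gauss(TRUE_WEIGHT, NOISE_SD) for _ in range(N_GUESSERS)]
    return statistics.fmean(guesses)


def anchored_crowd(rng):
    """Mean of guesses where each guesser blends a private signal
    with the running mean of all earlier, visible guesses."""
    total = rng.gauss(TRUE_WEIGHT, NOISE_SD)  # first guesser has no anchor
    count = 1
    for _ in range(N_GUESSERS - 1):
        private = rng.gauss(TRUE_WEIGHT, NOISE_SD)
        guess = ANCHOR_WEIGHT * (total / count) + (1 - ANCHOR_WEIGHT) * private
        total += guess
        count += 1
    return total / count


def rmse(crowd_fn, trials=200, seed=0):
    """Root-mean-square error of the crowd mean over repeated fairs."""
    rng = random.Random(seed)
    squared = [(crowd_fn(rng) - TRUE_WEIGHT) ** 2 for _ in range(trials)]
    return statistics.fmean(squared) ** 0.5


independent_rmse = rmse(independent_crowd)
anchored_rmse = rmse(anchored_crowd)
print(f"independent: {independent_rmse:.1f} lb, anchored: {anchored_rmse:.1f} lb")
```

In this toy setup the anchored crowd's error is dominated by the first few guesses, because every later guess inherits part of their noise. The independent crowd's error instead shrinks roughly with the square root of the number of guessers.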

Triggers

  • User-initiated: User describes aggregating independent estimates, ensemble methods, prediction markets, “polling,” or a “crowd-sourced answer.” Vocabulary cues: “wisdom of crowds,” “independent estimates,” “aggregation,” “Galton,” “ensemble,” “prediction market,” “polling.”
  • Agent-initiated: Agent notices a system with multiple independent estimators producing better aggregate accuracy than any individual. Candidate inference: “is the independence really preserved here, or are estimators implicitly anchoring on each other?”
  • Situation-shape signals: Discussions of prediction-market design. Jury-sizing decisions. Ensemble ML architectures. Survey methodology. Anywhere “many independent estimates” + “single aggregate answer” is the structural shape.

Exclusions

  • Correlated estimators: when estimators share training, information sources, or visible past estimates, independence breaks. Averaged forecasts from analysts who all read the same source are not wisdom-of-crowds; they’re consensus-by-anchoring.
  • Systematic bias across the population: if every estimator is biased the same way (e.g., political-affiliation effects on factual estimation), aggregation cancels the noise but leaves the shared bias intact, so the aggregate is precisely wrong. The concept requires individual errors to be uncorrelated.
  • Tasks requiring expertise rather than aggregation (surgical decisions, deep-domain analysis): the median of non-expert guesses is not better than the expert’s single estimate. Wisdom-of-crowds works when individual error is high but uncorrelated, not when most estimators are uniformly worse than the best.
  • Adversarial / strategic estimation: if estimators have incentives to misreport (e.g., wash trading in prediction markets), the independence and truth-telling assumptions break.

Structure

Internal structure of wisdom-of-crowds: a table of its component slots and the concepts that fill them.

Relationships

Relationship neighborhood of wisdom-of-crowds: a graph of the concepts it connects to and the concepts it is a part of.
  • group-mind — structural opposite on the member-to-member relation axis; wisdom-of-crowds requires independence, group-mind requires coordination. The pair illuminates that “leveraging many people” works via two distinct mechanisms that have opposite constitutive constraints.
  • emergence — both produce something better than individuals via collective dynamics; wisdom-of-crowds is emergence specifically via aggregation of independent inputs.
  • redundancy — independent estimates ARE redundancy at the cognition layer; same noise-reduction mechanism as ECC + DNA repair.
  • anchoring — anchoring is the canonical failure mode for wisdom-of-crowds; once estimators see each other’s values, independence breaks and the effect collapses.
  • network-effect — contrast: network-effect’s value grows with participation count and coordination (more users see each other’s actions); wisdom-of-crowds’ value grows with participation count and independence. Different growth mechanisms with opposite constitutive requirements on observability.

Examples

Galton's ox-weight estimation (1907) · statistics

the canonical case; the mean of 787 independent guesses (1,197 lb) landed within 1 lb of the true 1,198 lb weight, and the median (1,207 lb) within 1%.

Prediction markets · economics

when participants trade based on private information without coordination, market prices converge to accurate probability estimates (Iowa Electronic Markets; sports betting markets; corporate prediction markets).
Audience-poll lifelines (Who Wants to Be a Millionaire): audience members answering independently picked the right answer about 91% of the time on factual questions, far more often than a typical individual member would.
many independent security researchers find vulnerabilities that one focused team misses; the independence is what produces coverage breadth.
The Marquis de Condorcet’s 1785 Essai contains what is now called the Condorcet Jury Theorem: if each member of a jury independently has a probability p > 0.5 of arriving at the correct verdict on a binary question, then the probability that the majority arrives at the correct verdict approaches 1 as the jury size grows. Conversely, if p < 0.5, the majority’s accuracy approaches 0. The theorem is the formal foundation of the wisdom-of-crowds phenomenon for binary questions: aggregating independent above-chance estimators converges to the truth; aggregating independent below-chance estimators converges to the falsehood.
The theorem’s assumptions are exactly the requirements that the wisdom-of-crowds concept treats as constitutive: independence (each juror reaches a judgment without influence from the others) and better-than-chance individual accuracy. Neither assumption is trivial. When jurors deliberate jointly, independence breaks and the theorem’s prediction no longer holds; when each juror is below chance, the theorem predicts that accuracy worsens as the crowd grows. The theorem cleanly separates the conditions under which aggregation helps from those under which it is actively harmful.
Inference: The structural prediction “more estimators improves accuracy” is conditional on two specific properties: independence and above-chance individual accuracy. When designing systems that rely on aggregation (juries, ensemble methods, prediction markets, peer review), the question is not whether to aggregate but whether the constitutive assumptions actually hold. If either fails, aggregation magnifies the failure mode rather than averaging it out.
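The theorem's two regimes can be checked numerically with the exact binomial sum. The sketch below assumes an odd jury size (so ties cannot occur) and identical, independent jurors; the function name `majority_correct_probability` is a hypothetical label for this example.

```python
from math import comb


def majority_correct_probability(n_jurors: int, p: float) -> float:
    """Probability that a strict majority of n independent jurors,
    each correct with probability p, reaches the correct verdict.
    Assumes n_jurors is odd so that ties cannot occur."""
    if n_jurors % 2 == 0:
        raise ValueError("use an odd jury size to avoid ties")
    majority = n_jurors // 2 + 1
    # Sum the binomial probabilities of every outcome with >= majority correct votes.
    return sum(
        comb(n_jurors, k) * p**k * (1 - p) ** (n_jurors - k)
        for k in range(majority, n_jurors + 1)
    )


# Above-chance jurors (p = 0.55) versus below-chance jurors (p = 0.45).
for n in (1, 11, 101, 501):
    above = majority_correct_probability(n, 0.55)
    below = majority_correct_probability(n, 0.45)
    print(f"n={n:4d}  p=0.55 -> {above:.4f}   p=0.45 -> {below:.4f}")
```

As the jury grows, the first column climbs toward 1 and the second falls toward 0. The same aggregation step helps or harms depending only on which side of 0.5 the individual accuracy sits.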
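random forests, bagging, model averaging: each model is trained on a different data subset or random seed, largely independently of the others; the aggregate prediction typically beats any individual model. Gradient-boosted trees also aggregate many weak models, but train them sequentially rather than independently.
Condorcet jury theorem: under independence and individual competence above 50%, larger juries converge to correct verdicts.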
Scott Page’s The Difference (2007) supplied a formal apparatus behind the wisdom-of-crowds claim, sharpening it past the popular intuition. The book’s central result, the Diversity Prediction Theorem, decomposes the squared error of a crowd’s mean prediction into exactly two terms: the average individual error (the mean squared error of the individual predictors) and the prediction diversity (the mean squared deviation of the individual predictions from the crowd’s mean). The identity is exact, not approximate: crowd error = average individual error − prediction diversity. Because prediction diversity subtracts from crowd error directly, a crowd of more-diverse-but-individually-less-accurate predictors can outperform a crowd of less-diverse-but-individually-more-accurate predictors, provided the diversity gain exceeds the accuracy gap. Page extended the analysis to problem-solving (heterogeneous teams outperform homogeneous teams of higher-skilled individuals on hard problems, under specifiable conditions) and connected the formalism to organizational diversity, collective intelligence, and ensemble methods in machine learning.
Inference: The theorem makes the wisdom-of-crowds claim operational and falsifiable: it states exactly how much the aggregate beats the average individual, and when that margin vanishes. A small crowd error requires (1) reasonable individual accuracy on average, and (2) genuine cognitive diversity: different models, different information, different heuristics. Without diversity, the aggregate is no better than the average individual; without individual accuracy, diversity cannot bring the crowd error down far enough. The failure mode that masquerades as wisdom-of-crowds (averaging predictions from analysts who all read the same source and arrived at similar conclusions) has near-zero diversity, so the diversity term contributes almost nothing and the crowd does no better than its typical member.
The design corollary is that building a wise crowd is the act of cultivating diversity, not the act of finding more or smarter individuals; ensemble ML, prediction markets that protect independence, and deliberately heterogeneous review panels all encode this insight structurally.
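The identity can be verified on a small worked example. The four predictions below are made-up numbers chosen for easy arithmetic, and `diversity_decomposition` is a hypothetical helper name; only the identity itself comes from the text above.

```python
from statistics import fmean


def diversity_decomposition(predictions, truth):
    """Return (crowd_error, avg_individual_error, diversity), all as
    squared errors, for a crowd whose collective guess is the mean."""
    crowd = fmean(predictions)
    crowd_error = (crowd - truth) ** 2
    avg_individual_error = fmean([(x - truth) ** 2 for x in predictions])
    diversity = fmean([(x - crowd) ** 2 for x in predictions])
    return crowd_error, avg_individual_error, diversity


# Hypothetical guesses (lb) around a true value of 1,198 lb.
predictions = [1100.0, 1250.0, 1300.0, 1150.0]
truth = 1198.0

crowd_error, avg_error, diversity = diversity_decomposition(predictions, truth)
print(crowd_error, avg_error, diversity)   # 4.0 6254.0 6250.0
print(avg_error - diversity)               # 4.0, equal to crowd_error
```

Here every individual misses by roughly 50 to 100 lb, yet the crowd mean misses by 2 lb, because almost all of the individual error shows up as diversity and is subtracted out.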
Ensemble methods in machine learning are wisdom-of-crowds at the level of model predictions. A random forest (Breiman, 2001) trains many decision trees, each on a bootstrap sample of the data and with a random subset of features considered at each split, then averages (or majority-votes) their predictions. No individual tree is particularly accurate, since each is fit on a partial, noisy view of the data, but their aggregate is typically markedly better than any single tree’s, because the partly independent errors partially cancel. Gradient-boosted trees (Friedman, 2001) achieve a related effect by a different route: weak learners are added sequentially, each fit to the residual of the ensemble so far, so the members are complementary rather than independent.
The wisdom-of-crowds structure shows up explicitly: aggregation of many partially correlated estimators outperforms any single estimator, and the design choices that decorrelate the estimators (bootstrap sampling, random feature subsets, row subsampling in stochastic gradient boosting) are constitutive of the gain. Practitioners who fail to randomize across trees lose most of the ensemble’s benefit; the trees become correlated, their errors stop cancelling, and the forest collapses toward the accuracy of a single tree.
Inference: The lesson Galton’s ox-weight crowd offers for ML practitioners is operational: when an ensemble underperforms, audit whether the ensemble members are actually independent. Correlated weak learners are not a crowd; they are one learner replicated.
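Why correlation caps the benefit can be seen from the standard formula for the variance of an average of correlated estimators. The sketch below assumes an idealized ensemble in which every member has the same variance and every pair has the same correlation. That simplification is not from the text above, and `ensemble_variance` is a hypothetical name.

```python
def ensemble_variance(sigma2: float, rho: float, n: int) -> float:
    """Variance of the average of n estimators, each with variance sigma2
    and equal pairwise correlation rho (idealized assumption).

    The rho * sigma2 term is a floor that no amount of averaging removes;
    only the (1 - rho) * sigma2 / n term shrinks as members are added."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n


for rho in (0.0, 0.3, 0.9):
    row = [round(ensemble_variance(1.0, rho, n), 4) for n in (1, 10, 100, 1000)]
    print(f"rho={rho}: {row}")
```

With uncorrelated members the variance falls as 1/n. With correlation 0.9 it barely moves below the single-member value, which is the "one learner replicated" case in numerical form.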
independent replication is what makes consensus epistemically load-bearing; correlated labs (same training, same equipment) defeat the independence.
when votes aren’t visible until cast, aggregation converges to high-quality answers; voting after seeing existing votes degrades accuracy via anchoring + herding.
James Surowiecki’s 2004 The Wisdom of Crowds is the modern popularization that consolidated the wisdom-of-crowds concept across prediction markets, polling, juries, ensemble forecasting, and corporate decision aggregation. The book traces examples from Galton’s 1907 ox-weight contest through twentieth-century experimental work to contemporary prediction markets (Iowa Electronic Markets, corporate internal markets) and demonstrates the same structural pattern across the cases: aggregation of many independent estimates outperforms the typical individual estimator, often dramatically.
Surowiecki’s contribution is the explicit cataloguing of what wisdom of crowds requires to operate. He identifies four constitutive conditions: diversity of opinion (each estimator has private information or a distinct interpretation); independence (estimators are not influenced by each other’s views before reporting); decentralization (estimators draw on local knowledge); and a mechanism to aggregate the private judgments into a collective answer. Failure of any of the four typically breaks the effect; the most common failure mode is loss of independence as estimators begin to coordinate on each other’s signals, producing herd behavior rather than aggregation.
Inference: When implementing a wisdom-of-crowds mechanism in a real institutional setting, the design work is largely about protecting the constitutive conditions against the natural tendency of human deliberation to erode them. Sealed ballots, parallel committees, blind reviews, and prediction-market designs that obscure other participants’ positions are all institutional moves to preserve independence and diversity against the social pressures that destroy them.
Philip Tetlock’s 2005 Expert Political Judgment reports the results of a two-decade longitudinal study of forecasting accuracy among professional political and economic experts. Across roughly 28,000 predictions from 284 experts, Tetlock found that average expert accuracy on geopolitical and economic forecasts was barely above chance, and that experts who confidently applied one big idea (“hedgehogs”) were systematically less accurate than more eclectic, self-critical ones (“foxes”). Crucially, simple statistical-aggregation models and even uninformed extrapolation often outperformed the individual experts.
The result is structurally significant for wisdom-of-crowds in two ways. First, it confirms that aggregated forecasts can outperform individual expert judgment under realistic conditions, even when the aggregators are statistically simple and the experts are domain-credentialed. Second, it identifies the conditions under which the aggregation advantage is largest: high uncertainty, long time horizons, and complex causal structures with many degrees of freedom, which are exactly the cases where individual cognition is most prone to overconfident pattern-imposition. The wisdom-of-crowds effect is largest where individual expertise is least reliable, which is a structurally consistent prediction.
Inference: When considering whether to rely on a credentialed individual expert versus an aggregated forecast (a prediction market, a survey of forecasters, an ensemble model), the diagnostic is the base-rate accuracy of individual experts in the relevant domain. In domains where individual expert track records are poor (most geopolitical, economic, and long-horizon forecasting domains), the aggregation advantage is substantial, even when each individual contributor is well-informed.