Collider-bias
Description
Collider-bias is the causal-inference failure mode in which conditioning on a variable that two others jointly cause induces a spurious association between those two causes. The diagnostic shape is A → B ← C. A and C both cause B (the collider). Restricting to, stratifying by, or statistically controlling for B opens a path between A and C that does not exist in the full population. The association is not in the world; it is manufactured by the conditioning.

Four roles compose the shape:
- The causes A and C: independent, or only weakly related, in the full population.
- The collider B: the common effect into which both causal arrows point.
- The conditioning: the analytic or selection act of fixing B, which is the move that opens the spurious path.
- The induced association: the resulting A–C relationship. It is present only within the conditioned stratum and is very often negative. This is the “explaining-away” pattern: given that the effect occurred, evidence that one cause is present lowers the probability that the other cause is present.

The concept is best understood as the exact mirror of confounding. Confounding is the common-cause structure A ← C → B. There, an association that is genuinely present in the data, but is not causal, is removed by conditioning on the common cause. Collider-bias is the common-effect structure A → B ← C. There, a spurious association is created by conditioning on the common effect. Together the two are the foundation of causal-inference literacy. They are dangerous precisely because the same instruction, “control for that variable,” is the cure for one and the cause of the other. The only way to know which you face is to know the variable’s role in the causal DAG, and the data alone generally cannot tell you that.
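A minimal simulation makes the mechanism concrete. It assumes a linear-Gaussian model that is not part of the description above. A and C are independent standard normals, B = A + C + noise, and “conditioning” means selecting units with B > 1. The threshold, sample size, and seed are arbitrary illustrative choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_collider(n=20_000, threshold=1.0, seed=0):
    """Assumed model: A, C independent N(0, 1); collider B = A + C + N(0, 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0, 1)          # cause A
        c = rng.gauss(0, 1)          # cause C, generated independently of A
        b = a + c + rng.gauss(0, 1)  # collider B: common effect of A and C
        rows.append((a, b, c))
    full = pearson([r[0] for r in rows], [r[2] for r in rows])
    # Conditioning by selection: keep only the units with a high value of B.
    kept = [r for r in rows if r[1] > threshold]
    within = pearson([r[0] for r in kept], [r[2] for r in kept])
    return full, within, len(kept)

full, within, k = simulate_collider()
print(f"corr(A, C), full population: {full:+.3f}")
print(f"corr(A, C), B > 1 (n={k}):   {within:+.3f}")
```

The full-population correlation should sit near zero. The correlation within the selected stratum should be clearly negative, which is the explaining-away signature: among units with a high B, a low A has to be compensated by a high C.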
The canonical instance is Berkson’s paradox. Among hospitalized patients, two diseases that are unrelated in the general population can appear negatively associated. Hospitalization (the collider) is caused by either disease, so the hospitalized group under-represents people who have neither. Within that group, a patient who lacks one disease was probably admitted for the other. The same shape recurs far outside epidemiology:
- the apparent negative correlation between talent and likability among already-famous people (fame is a collider on both);
- the “why are the attractive people I date so rude?” observation (willingness to date someone is a collider on attractiveness and personality);
- the artifactual distortion, or even reversal, of a real effect when an analyst “controls for” a downstream consequence of the treatment that also shares a cause with the outcome.
Triggers
- User-initiated: The user reports a surprising association, often a negative one, that appears within a selected or stratified group, or asks whether “controlling for” some variable could have created rather than removed an effect. Vocabulary cues: “Berkson,” “collider,” “explaining away,” “selecting on the outcome,” “we controlled for it and the effect vanished/appeared.”
- Agent-initiated: The agent notices an analysis conditioning on (selecting, stratifying, or regressing on) a variable that is plausibly a common effect of the two variables of interest. Candidate inference: “Is this variable a collider? Conditioning on it may have manufactured the association; check whether it’s a common effect rather than a common cause or a mediator.”
- Situation-shape signals: any case-control or sample-restricted analysis; “controlling for” a variable that sits downstream of the treatment; surprising anticorrelations within elite or selected groups; the “explaining-away” reasoning pattern in diagnosis or attribution.
Exclusions
- Common-cause structure (A ← C → B) — when a third variable causes both observed variables, the spurious association exists in the full population and is removed by conditioning. That is confounding, the exact mirror.
- Genuine causal association (A → B) — when the two variables really are causally linked, the association is not an artifact and conditioning is not the culprit. Collider-bias is specifically a manufactured association.
- Sampling that does not condition on a common effect — survivorship, non-response, and self-selection distort who is in the sample but need not run through a collider. The broader selection-bias family includes collider-conditioning as one mechanism; collider-bias is specifically the common-effect case.
- Conditioning on a mediator or confounder, not a collider — controlling for a mediator blocks part of a real effect; controlling for a common cause closes a backdoor. Only controlling for a common effect opens a spurious path. Misclassifying the node’s DAG role is the central error.
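The whole distinction turns on the node’s role in the DAG, so a toy classifier can make the three-way split explicit. This is a hypothetical helper: `ancestors`, `role`, and the dict-of-parents encoding are assumptions made for illustration. It recognizes only the simple roles relative to one pair of variables. In real graphs a single node can play several roles at once, and that case needs a full d-separation check.

```python
from typing import Dict, Set

# Assumed encoding: {node: set of its direct causes}. Root nodes may be omitted.
Dag = Dict[str, Set[str]]

def ancestors(dag: Dag, node: str) -> Set[str]:
    """Return every node that has a directed path into `node`."""
    found: Set[str] = set()
    stack = list(dag.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(dag.get(parent, ()))
    return found

def role(dag: Dag, z: str, a: str, c: str) -> str:
    """Hypothetical, simplified classification of z relative to the pair (a, c)."""
    anc_z = ancestors(dag, z)
    if a in anc_z and c in anc_z:
        return "collider"    # (possibly indirect) common effect: conditioning opens a path
    if z in ancestors(dag, a) and z in ancestors(dag, c):
        return "confounder"  # common cause: conditioning closes a backdoor path
    if a in anc_z and z in ancestors(dag, c):
        return "mediator"    # on the path a -> z -> c: conditioning blocks part of a real effect
    return "other"

print(role({"B": {"A", "C"}}, "B", "A", "C"))         # collider
print(role({"A": {"C"}, "B": {"C"}}, "C", "A", "B"))  # confounder
print(role({"M": {"A"}, "Y": {"M"}}, "M", "A", "Y"))  # mediator
# Birth-weight-style graph with a hypothetical unmeasured cause U:
bw = {"BW": {"Smoking", "U"}, "Death": {"BW", "U"}}
print(role(bw, "BW", "Smoking", "Death"))             # mediator, yet BW is also a collider via U
```

The last call shows the limitation. The helper reports only the first role it matches. It misses that birth weight also collides on the path Smoking → BW ← U → Death, and that is exactly the misclassification the exclusions warn about.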
Structure
Relationships
- confounding — the exact causal-graph mirror: common cause (A ← C → B, association removed by conditioning) vs common effect (A → B ← C, association created by conditioning). The foundational causal-inference pair; “control for it” is the cure in one and the disease in the other.
- selection-bias — collider-bias is the causal-DAG mechanism behind a large class of selection-bias cases (Berkson’s bias, high-performer selection, hospital case-control). Selection-bias is the symptom family; collider-bias is the specific common-effect mechanism.
- simpsons-paradox — collider-conditioning is one of the structures that produces a within-stratum sign flip; unlike the confounding case, the corrective is to not condition.
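The confounding/collider mirror can be checked numerically, with regression adjustment standing in for “control for that variable.” The sketch assumes two hypothetical linear-Gaussian worlds. One has a common cause Z of A and C; the other has a common effect B of independent A and C. The adjusted coefficient is computed through the Frisch–Waugh–Lovell theorem, so only the standard library is needed.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs, with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(vs, zs):
    """Residuals of vs after an OLS fit on zs, with an intercept."""
    b = slope(zs, vs)
    mv, mz = sum(vs) / len(vs), sum(zs) / len(zs)
    return [v - mv - b * (z - mz) for v, z in zip(vs, zs)]

def adjusted_slope(xs, ys, zs):
    """Coefficient on x in OLS of y on (x, z), via Frisch-Waugh-Lovell."""
    return slope(residuals(xs, zs), residuals(ys, zs))

rng = random.Random(1)
n = 20_000
# Confounding world (assumed): Z -> A and Z -> C, with no A -> C effect.
z = [rng.gauss(0, 1) for _ in range(n)]
a1 = [zi + rng.gauss(0, 1) for zi in z]
c1 = [zi + rng.gauss(0, 1) for zi in z]
# Collider world (assumed): A -> B <- C, with A and C independent.
a2 = [rng.gauss(0, 1) for _ in range(n)]
c2 = [rng.gauss(0, 1) for _ in range(n)]
b = [ai + ci + rng.gauss(0, 1) for ai, ci in zip(a2, c2)]

print(f"confounder: unadjusted {slope(a1, c1):+.2f}, adjusted for Z {adjusted_slope(a1, c1, z):+.2f}")
print(f"collider:   unadjusted {slope(a2, c2):+.2f}, adjusted for B {adjusted_slope(a2, c2, b):+.2f}")
```

In the confounding world the unadjusted slope should be clearly positive, and adjusting for Z should drive it toward zero. In the collider world the pattern reverses: the unadjusted slope is near zero, and adjusting for B manufactures a clearly negative one. There, the corrective is simply not to adjust.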
Examples
Berkson, J., "Limitations of the Application of Fourfold Table Analysis to Hospital Data" (Biometrics Bulletin, 1946, vol. 2, no. 3, pp. 47–53) · statistics
Hernández-Díaz, S., Schisterman, E. F., & Hernán, M. A., "The Birth Weight Paradox Uncovered?" (American Journal of Epidemiology, 2006, vol. 164, no. 11, pp. 1115–1120) · medicine-and-health
Pearl, J., "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference" (Morgan Kaufmann, 1988) · computer-science
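Berkson’s setting is a fourfold (2×2) table of two diseases tabulated from hospital records. It can be imitated with expected counts rather than random draws. Everything numeric here is an illustrative assumption, not a figure from Berkson’s paper. That includes the population size, the prevalences, and the admission probabilities. The “noisy-OR” admission rule, under which each present condition independently leads to admission, is also an assumption.

```python
from itertools import product

def fourfold(pop, p1, p2, h0, h1, h2):
    """Expected (disease1, disease2) counts: whole population and admitted patients only."""
    full, admitted = {}, {}
    for d1, d2 in product((0, 1), repeat=2):
        # Assumption: the two diseases are independent in the population.
        n = pop * (p1 if d1 else 1 - p1) * (p2 if d2 else 1 - p2)
        # Assumed noisy-OR admission: baseline h0, plus independent chances h1, h2 per disease.
        p_admit = 1 - (1 - h0) * (1 - h1) ** d1 * (1 - h2) ** d2
        full[d1, d2] = n
        admitted[d1, d2] = n * p_admit
    return full, admitted

def odds_ratio(t):
    """Odds ratio of a 2x2 table keyed by (d1, d2)."""
    return (t[1, 1] * t[0, 0]) / (t[1, 0] * t[0, 1])

full, admitted = fourfold(100_000, p1=0.10, p2=0.20, h0=0.05, h1=0.20, h2=0.15)
print(f"odds ratio, whole population: {odds_ratio(full):.3f}")
print(f"odds ratio, admitted only:    {odds_ratio(admitted):.3f}")
```

The whole-population odds ratio is 1 by construction. Restricting the table to admitted patients pushes the ratio below 1. The two diseases look protective against each other, purely because admission is their common effect.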