computer-science psychology

Cost cascade

Description

A conditional cascade where a cheap-default path handles the common case and an expensive-fallback path handles the remainder — with the switch between them conditioned on the outcome of the cheap path (failure, low confidence, threshold not met). The cascade’s value comes from the cost differential: if the expensive path were always used, it would either be rate-limited, budget-exhausted, or simply too slow; the cheap path acts as a filter that directs only the hard cases to the expensive path. Cost-cascade differs from rivals-into-router in that the routing condition is defensive rather than proactive: the cheap path runs first and the expensive path is invoked because the cheap path fell short, not because a routing signal predicted the expensive path was appropriate. The cascade is sequential (cheap, then conditionally expensive); rivals-into-router is parallel dispatch based on predicted fit. The concept also differs from plain asymmetric-gate: asymmetric-gate is a boundary with differential cost in each direction; cost-cascade is a pipeline where stages have increasing cost and each stage gates entry to the next.
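The sequential, defensive shape described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `run_cascade`, `cheap_lookup`, `expensive_solver`, and the 0.8 threshold are hypothetical names and values chosen for the example, and the cheap path is assumed to report a confidence in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CascadeResult:
    answer: str
    path: str  # "cheap" or "expensive"


def run_cascade(
    query: str,
    cheap: Callable[[str], Tuple[str, float]],  # returns (answer, confidence)
    expensive: Callable[[str], str],
    threshold: float = 0.8,  # assumed escalation threshold
) -> CascadeResult:
    """Run the cheap path first; escalate only if its confidence falls short."""
    answer, confidence = cheap(query)
    if confidence >= threshold:
        return CascadeResult(answer, "cheap")
    # Defensive escalation: the cheap path already ran and was insufficient.
    return CascadeResult(expensive(query), "expensive")


# Hypothetical stand-ins for the two paths.
def cheap_lookup(query: str) -> Tuple[str, float]:
    known = {"2+2": "4"}
    if query in known:
        return known[query], 1.0
    return "", 0.0


def expensive_solver(query: str) -> str:
    return f"solved({query})"


easy = run_cascade("2+2", cheap_lookup, expensive_solver)
hard = run_cascade("integrate x^2", cheap_lookup, expensive_solver)
```

Here `easy` is answered on the cheap path and `hard` is escalated. A router, by contrast, would inspect the query and pick one path before running either.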

Triggers

User-initiated: The user is designing a multi-tier system or expressing a cost/budget concern: “this is too slow/expensive to run every time,” “we want a fast path for the common case,” “can we avoid the expensive call unless necessary?”
Agent-initiated: The engine detects a decision context where a single expensive operation is the proposed solution to a problem that a cheap operation could solve in most cases.
Candidate inference: “this is a cost-cascade candidate: can we put a cheap filter in front of the expensive operation and escalate only when the cheap path falls short?”
Vocabulary cues: “fast path,” “fallback,” “escalate,” “try X first,” “if cheap fails,” “tiered,” “cascade,” “common case,” “edge case,” “rate limit,” “budget,” “expensive only when necessary,” “two-stage.”
Situation-shape signals: A proposed operation with high per-call cost that handles a mix of easy and hard cases. The easy cases could be handled cheaply; the hard cases require the expensive call. The concept is indicated when the proportion of easy cases is meaningfully large (otherwise the cascade overhead exceeds the savings).

Exclusions

When the cheap path’s overhead exceeds its savings — if the cheap path is expensive to run relative to what it saves (e.g., the classifier is nearly as slow as the expensive call), the cascade doesn’t pay off. This is a calibration check, not a failure of the concept.
When there’s no meaningful cost differential — if the cheap and expensive paths have similar latency and cost, cost-cascade adds complexity without benefit.
When the common case requires the expensive path — if most cases are “hard” (the cheap path rarely handles them), the cascade reduces to “always use the expensive path with an extra round-trip overhead.”
When the quality differential matters more than the cost — sometimes you want the expensive path’s quality even on easy cases (e.g., for consistency or auditability). Cost-cascade trades quality uniformity for cost savings.
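The first three exclusions can be checked with a back-of-the-envelope cost model. The sketch below assumes a simplification not stated in the text: costs are additive, the cheap path runs on every request, and an escalated request pays for both paths. Under that assumption the cascade beats "always expensive" only when the fraction of easy cases exceeds the ratio of cheap cost to expensive cost. The function names and the numbers are hypothetical.

```python
def cascade_cost(p_easy: float, cheap_cost: float, expensive_cost: float) -> float:
    """Expected per-request cost when the cheap path always runs first."""
    # Every request pays for the cheap path; only the hard fraction pays again.
    return cheap_cost + (1.0 - p_easy) * expensive_cost


def break_even_fraction(cheap_cost: float, expensive_cost: float) -> float:
    """Fraction of easy cases above which the cascade beats always-expensive."""
    return cheap_cost / expensive_cost


# Hypothetical costs in arbitrary units; 75% of requests are easy in both cases.
good = cascade_cost(p_easy=0.75, cheap_cost=1.0, expensive_cost=20.0)
bad = cascade_cost(p_easy=0.75, cheap_cost=18.0, expensive_cost=20.0)
```

With a cheap path at 1 unit against 20, the expected cost is 6 units per request. With a classifier nearly as slow as the expensive call (18 against 20), it is 23 units, worse than skipping the cascade, because the break-even fraction has risen to about 0.9.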

Structure

The gradient identifies the quality-vs-cost dimension along which the cascade operates. The asymmetric-gate sets the escalation condition (cheap path below threshold → escalate). Stack-layer captures the layered structure: cheap layer handles the common case; expensive layer is invoked only when the cheap layer’s output is insufficient.

Relationships

Relationship neighborhood of cost-cascade: a graph of the concepts it connects to and the concepts it is a part of.

rivals-into-router — specialization relationship — cost-cascade is a specialization of rivals-into-router where the routing condition is defensive (cheap path failed/insufficient) rather than proactive (signal predicts which branch to use).
asymmetric-gate — composition relationship — the escalation condition is an asymmetric gate: if the cheap path’s result meets the threshold, stay on the cheap path; if it falls below the threshold, pay the expensive path’s cost.
gradient — composition relationship — the quality-vs-cost tradeoff is a gradient; the cascade’s threshold is a choice of where to draw the line on that gradient.
stack-layer — composition relationship — the cascade forms a stack: cheap layer below, expensive layer above. The cheap layer handles what it can; the expensive layer handles what the cheap layer can’t.
multi-channel-ingest — composition relationship — when multiple input channels feed the same store, cost-cascade determines which channel’s data gets the expensive processing treatment (high-volume cheap channel vs. low-volume expensive channel).
uniformity-dividend — composition relationship — a cost-cascade that applies uniformly across all query types earns less dividend than one calibrated to the actual proportion of easy vs. hard cases.

Examples

Gemini Flash → Pro · computer-science

Flash handles common-path queries (fast, cheap); Pro handles escalated queries (slower, expensive). The cascade condition: Flash confidence below threshold or query tagged as complex → escalate to Pro.

MAC/FAC model (Forbus, Gentner, and Law): the two-stage retrieval architecture is the canonical cog-sci instance. · psychology

The MAC/FAC model (“Many Are Called / Few Are Chosen,” from Forbus, Gentner, and Law in cognitive-science work on analogical retrieval) proposes that human analogical memory works in two stages. The first stage (MAC) is a fast, cheap, surface-feature-based filter that pulls a wide candidate pool from long-term memory; the second stage (FAC) is a slow, expensive, structure-mapping-based selector that picks the best analog from the MAC-surfaced candidates.

This is the canonical cog-sci instance of cost-cascade: a cheap-default stage handles the common case (rapidly narrowing from “everything in memory” to “plausibly relevant”), and an expensive-fallback stage runs only on the small post-filter set. The cost differential between the stages is the architectural value: surface-feature matching is cheap enough to run over thousands of candidates, but structure-mapping is expensive enough that running it over thousands would be infeasible. The same cost asymmetry recurs whenever cheap retrieval (e.g., embedding similarity) and expensive judging (e.g., LLM-based alignment + inference) sit in the same pipeline. The architecture that pays is “cheap-first, narrow the pool, then spend the expensive operations only on what survived.”
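The "cheap-first, narrow the pool" shape can be sketched as a two-stage retrieval function. This is an illustration only: the two scorers below (word overlap and shared bigrams) are toy stand-ins chosen for the example, not the matchers used in the MAC/FAC model, and `two_stage_retrieve`, `pool_size`, and the sample memory are hypothetical.

```python
from typing import Callable, Sequence


def two_stage_retrieve(
    probe: str,
    memory: Sequence[str],
    cheap_score: Callable[[str, str], float],
    expensive_score: Callable[[str, str], float],
    pool_size: int = 3,
) -> str:
    # Stage 1 (MAC-like): cheap score over everything, keep a small pool.
    ranked = sorted(memory, key=lambda item: cheap_score(probe, item), reverse=True)
    pool = ranked[:pool_size]
    # Stage 2 (FAC-like): expensive score only over the surviving pool.
    return max(pool, key=lambda item: expensive_score(probe, item))


def word_overlap(a: str, b: str) -> float:
    """Toy cheap scorer: number of shared words."""
    return float(len(set(a.split()) & set(b.split())))


expensive_calls = 0  # counts how often the costly scorer actually runs


def shared_bigrams(a: str, b: str) -> float:
    """Toy stand-in for an expensive structural comparison."""
    global expensive_calls
    expensive_calls += 1

    def bigrams(s: str) -> set:
        words = s.split()
        return set(zip(words, words[1:]))

    return float(len(bigrams(a) & bigrams(b)))


memory = [
    "my cat sleeps all day",
    "the sun attracts the planets",
    "a moon made of cheese",
    "attracts attention",
    "electrons are small",
]
best = two_stage_retrieve("the earth attracts the moon", memory,
                          word_overlap, shared_bigrams)
```

In this run the expensive scorer is called three times for a memory of five items. The gap widens as memory grows, because the pool size stays fixed.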

API tiering patterns in cloud services: free tier → paid tier escalation on quota. · computer-science

Most consumer-facing cloud APIs implement the cost-cascade pattern explicitly as tiering: a free tier handles the common case (small request volumes, basic features, generous-but-bounded usage) and a paid tier handles the remainder (high volumes, premium features, guaranteed throughput). The cascade is exposed to users, who see the boundary between tiers, but the structural shape is the same as internal cost-cascade architectures: a cheap-default path that succeeds for most use cases, and an expensive-fallback that absorbs the overflow when the cheap path’s budget is exceeded.

The design pays off because the demand distribution for most services is long-tailed: a large majority of users stay comfortably within free-tier limits, and only a small minority of high-volume users need the expensive tier. The tiered pricing extracts revenue proportional to usage while keeping the friction low at the entry point. The same shape recurs internally: services maintain a cheap in-process cache (free-tier equivalent), fall back to a more expensive database query when the cache misses, and finally escalate to an off-cluster external lookup when the database itself doesn’t have the answer. Each stage handles its share, with the cascade gating entry to the next.

Inference: When designing pricing tiers or internal request-routing, the parameters to set are the threshold (when does the cheap path declare insufficient and escalate?) and the cost ratio (how much more expensive is the next tier?). A well-tuned cascade extracts most of the value from the cheap path and reserves the expensive path for cases that genuinely need it; a poorly-tuned cascade either over-escalates (wasted expensive capacity) or under-escalates (cheap-path failures bleed into user experience).
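The internal cache, database, and external-lookup chain described above generalizes to any ordered list of stages. The sketch below is a minimal illustration: the in-memory dictionaries stand in for a real cache and database, and `lookup` and `external_lookup` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage is (name, fetch function); fetch returns None when it cannot answer.
Stage = Tuple[str, Callable[[str], Optional[str]]]


def lookup(key: str, stages: List[Stage]) -> Tuple[Optional[str], str]:
    """Try stages in order of increasing cost; stop at the first that answers."""
    for name, fetch in stages:
        value = fetch(key)
        if value is not None:
            return value, name
    return None, "miss"


# Hypothetical stand-ins for the three tiers.
cache: Dict[str, str] = {"a": "1"}
database: Dict[str, str] = {"a": "1", "b": "2"}


def external_lookup(key: str) -> Optional[str]:
    return "3" if key == "c" else None


stages: List[Stage] = [
    ("cache", cache.get),
    ("database", database.get),
    ("external", external_lookup),
]

results = {key: lookup(key, stages) for key in ["a", "b", "c", "d"]}
```

Each key is served by the cheapest stage that can answer it: `a` from the cache, `b` from the database, `c` from the external lookup, and `d` by none of them.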

Embedding search → lexical search · computer-science

Dense retrieval handles semantic queries; BM25 is the fallback for exact-match or high-precision queries where dense retrieval underperforms. The cascade condition is retrieval confidence.

Viola, P., & Jones, M. (2001). "Rapid Object Detection using a Boosted Cascade of Simple Features." *Proceedings of CVPR 2001*; §4, "The Attentional Cascade." · computer-science

The Viola-Jones face detector is the textbook cost-cascade in machine learning. Object detection is a rare-event problem: nearly every sub-window of an image is background, and only a tiny fraction contains a face. So rather than run one expensive classifier on every window, Viola and Jones chain stages of increasing cost. The first stage is a “cheap default”: a classifier built from just a handful of Haar-like features, tuned to almost never reject a true face while cheaply discarding roughly half of the non-faces. Any window it rejects is dropped immediately; only survivors pass to the next, slightly more expensive stage, and so on through progressively costlier classifiers.

The mapping to the concept is exact. The cheap default is the early, few-feature stage that runs on every window; the expensive fallback is the deep, many-feature stages reserved for windows that survive; the escalation gate is the per-stage rejection threshold, and it is asymmetric in precisely the concept’s sense: escalation (passing a window deeper) is costly, while de-escalation (rejecting it) is free and terminal. The value is entirely in the cost differential: because the cheap stages eliminate the overwhelming majority of windows, the average number of features evaluated per window drops to a handful even though the full detector contains thousands, which is what made real-time detection possible on 2001 hardware.

Inference: When the common case is “no” and “no” is cheap to establish, front-load a cheap rejector tuned for near-zero false negatives; the win is not better accuracy but the expensive path almost never running.
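The early-exit accounting described above can be illustrated with a toy cascade. This is not the Viola-Jones detector: each window is reduced to a single hypothetical integer score from 0 to 999, and the stage sizes (2, 10, and 50 features) and thresholds are invented for the example. The point is only how rejection at an early stage caps the work spent on a window.

```python
from typing import Callable, Sequence, Tuple

# A stage is (number of features it evaluates, accept predicate).
Stage = Tuple[int, Callable[[int], bool]]


def classify_window(window: int, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (is_positive, features_evaluated) for one window."""
    evaluated = 0
    for n_features, accept in stages:
        evaluated += n_features
        if not accept(window):
            return False, evaluated  # rejection is terminal; no further cost
    return True, evaluated


# Hypothetical stages of increasing cost and strictness.
stages: Sequence[Stage] = [
    (2, lambda score: score >= 500),
    (10, lambda score: score >= 900),
    (50, lambda score: score >= 990),
]

windows = range(1000)  # toy scores; most windows are clear negatives
outcomes = [classify_window(w, stages) for w in windows]

detections = sum(1 for positive, _ in outcomes if positive)
total_features = sum(evaluated for _, evaluated in outcomes)
average_features = total_features / len(outcomes)
full_cost = sum(n for n, _ in stages)  # cost of running every stage
```

In this toy setup the average is 12 features per window against 62 for running every stage on every window, while the 10 highest-scoring windows still pass all three stages.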

Rate-limited APIs · computer-science

A cheap API tier handles most requests; a premium tier handles overflow or high-priority traffic. The cascade is keyed on a rate-limit signal rather than a quality signal.

Test suite tiering · computer-science

Fast unit tests run on every commit; slow integration tests run only when an escalation condition is met (e.g., changes to core modules).
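One way to express such an escalation condition is a small function that maps the changed files of a commit to the suites to run. The sketch below is an assumption-laden illustration: `CORE_PREFIXES`, the directory names, and `suites_to_run` are hypothetical and not tied to any particular CI system.

```python
from typing import Iterable, List

# Hypothetical paths treated as "core modules".
CORE_PREFIXES = ("core/", "db/")


def suites_to_run(changed_files: Iterable[str]) -> List[str]:
    """Always run the cheap tier; add the slow tier only for core changes."""
    suites = ["unit"]
    if any(path.startswith(CORE_PREFIXES) for path in changed_files):
        suites.append("integration")
    return suites


docs_only = suites_to_run(["docs/readme.md"])
core_change = suites_to_run(["core/engine.py", "docs/readme.md"])
```

A docs-only commit runs just the unit suite, while a commit touching `core/` also triggers the integration suite.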
