Cost cascade
Description
A conditional cascade in which a cheap default path handles the common case and an expensive fallback path handles the remainder, with the switch between them conditioned on the outcome of the cheap path (failure, low confidence, threshold not met). The cascade’s value comes from the cost differential: if the expensive path were always used, it would be rate-limited, budget-exhausted, or simply too slow; the cheap path acts as a filter that directs only the hard cases to the expensive path. Cost-cascade differs from rivals-into-router in that the routing condition is defensive rather than proactive: the cheap path runs first, and the expensive path is invoked because the cheap path fell short, not because a routing signal predicted the expensive path was appropriate. The cascade is sequential (cheap, then conditionally expensive); rivals-into-router is parallel dispatch based on predicted fit. The concept also differs from plain asymmetric-gate: asymmetric-gate is a boundary with differential cost in each direction; cost-cascade is a pipeline where stages have increasing cost and each stage gates entry to the next.
Triggers
User-initiated: User is designing a multi-tier system or expressing a cost/budget concern: “this is too slow/expensive to run every time,” “we want a fast path for the common case,” “can we avoid the expensive call unless necessary?”
Agent-initiated: Engine detects a decision context where a single expensive operation is the proposed solution to a problem that could be solved by a cheap operation in most cases. Candidate inference: “this is a cost-cascade candidate: can we put a cheap filter in front of the expensive operation and escalate only when the cheap path falls short?”
Vocabulary cues: “fast path,” “fallback,” “escalate,” “try X first,” “if cheap fails,” “tiered,” “cascade,” “common case,” “edge case,” “rate limit,” “budget,” “expensive only when necessary,” “two-stage.”
Situation-shape signals: A proposed operation with high per-call cost that handles a mix of easy and hard cases. The easy cases could be handled cheaply; the hard cases require the expensive call. The concept is indicated when the proportion of easy cases is meaningfully large (otherwise the cascade overhead exceeds the savings).
Exclusions
- When the cheap path’s overhead exceeds its savings — if the cheap path is expensive to run relative to what it saves (e.g., the classifier is nearly as slow as the expensive call), the cascade doesn’t pay off. This is a calibration check, not a failure of the concept.
- When there’s no meaningful cost differential — if the cheap and expensive paths have similar latency and cost, cost-cascade adds complexity without benefit.
- When the common case requires the expensive path — if most cases are “hard” (the cheap path rarely handles them), the cascade reduces to “always use the expensive path with an extra round-trip overhead.”
- When the quality differential matters more than the cost — sometimes you want the expensive path’s quality even on easy cases (e.g., for consistency or auditability). Cost-cascade trades quality uniformity for cost savings.
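The first and third exclusions amount to a break-even check, which can be sketched numerically. This is a rough illustration under simplifying assumptions that are not in the original: the cheap path always runs, each path has a fixed per-call cost, and a fraction of requests (called `easy_fraction` here, a hypothetical name) is fully resolved by the cheap path. Under those assumptions the expected cost per request is `cheap_cost + (1 - easy_fraction) * expensive_cost`, so the cascade only saves money when `easy_fraction` exceeds `cheap_cost / expensive_cost`.

```python
def expected_cascade_cost(cheap_cost: float, expensive_cost: float,
                          easy_fraction: float) -> float:
    """Expected per-request cost of the cascade (illustrative model).

    The cheap path runs on every request; the expensive path runs only on
    the (1 - easy_fraction) share that the cheap path cannot handle.
    """
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be between 0 and 1")
    return cheap_cost + (1.0 - easy_fraction) * expensive_cost


def break_even_easy_fraction(cheap_cost: float, expensive_cost: float) -> float:
    """Share of easy cases above which the cascade beats always-expensive."""
    return cheap_cost / expensive_cost


def cascade_pays_off(cheap_cost: float, expensive_cost: float,
                     easy_fraction: float) -> bool:
    """True when the cascade is cheaper than always using the expensive path."""
    cost = expected_cascade_cost(cheap_cost, expensive_cost, easy_fraction)
    return cost < expensive_cost


# Cheap path costs 1 unit, expensive path 10 units, 75% of cases are easy.
print(expected_cascade_cost(1.0, 10.0, 0.75))  # 3.5
print(cascade_pays_off(1.0, 10.0, 0.75))       # True
# A cheap path that is nearly as costly as the expensive one does not pay off.
print(cascade_pays_off(9.0, 10.0, 0.5))        # False
```

The model ignores latency, quality differences, and variable costs, so it is a calibration aid rather than a decision rule.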
Structure
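One possible shape for the pipeline described above, as a minimal sketch: stages are tried cheapest-first, and each stage gates entry to the next by reporting whether its own result was good enough. The names `Stage`, `run_cascade`, `min_confidence`, and the toy `cheap_lookup` / `expensive_solver` functions are all hypothetical and chosen for illustration; a real cheap path might be a cache, a small model, or a lexical filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A stage returns (answer, confidence); answer is None when the stage gives up.
StageFn = Callable[[str], Tuple[Optional[str], float]]


@dataclass
class Stage:
    name: str
    run: StageFn
    min_confidence: float  # escalation threshold for this stage


def run_cascade(stages: Sequence[Stage], query: str) -> Tuple[str, Optional[str]]:
    """Try stages cheapest-first; escalate only when a stage falls short."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    for stage in stages[:-1]:
        answer, confidence = stage.run(query)
        if answer is not None and confidence >= stage.min_confidence:
            return stage.name, answer  # this stage was good enough
    # Every earlier stage fell short, so pay for the last (most expensive) one.
    final = stages[-1]
    answer, _ = final.run(query)
    return final.name, answer


calls: List[str] = []       # records which stages actually ran
KNOWN_SUMS = {"2+2": "4"}   # toy stand-in for a cheap cache or small model


def cheap_lookup(query: str) -> Tuple[Optional[str], float]:
    calls.append("cheap")
    answer = KNOWN_SUMS.get(query)
    return answer, (1.0 if answer is not None else 0.0)


def expensive_solver(query: str) -> Tuple[Optional[str], float]:
    calls.append("expensive")  # stand-in for a slow or rate-limited call
    total = sum(int(part) for part in query.split("+"))
    return str(total), 1.0


stages = [
    Stage("cheap", cheap_lookup, min_confidence=0.9),
    Stage("expensive", expensive_solver, min_confidence=0.0),
]

print(run_cascade(stages, "2+2"))  # handled by the cheap stage
print(run_cascade(stages, "3+5"))  # escalates to the expensive stage
```

The escalation decision is made from the cheap stage's own outcome, after it has run, which is what makes the routing defensive rather than predictive. Adding more entries to `stages` gives a multi-stage cascade with the same control flow.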
Relationships
- rivals-into-router — specialization relationship — cost-cascade is a specialization of rivals-into-router where the routing condition is defensive (cheap path failed/insufficient) rather than proactive (signal predicts which branch to use).
- asymmetric-gate — composition relationship — the escalation condition is an asymmetric gate: below threshold, stay on cheap path; above threshold, pay the expensive path’s cost.
- gradient — composition relationship — the quality-vs-cost tradeoff is a gradient; the cascade’s threshold is a choice of where to draw the line on that gradient.
- stack-layer — composition relationship — the cascade forms a stack: cheap layer below, expensive layer above. The cheap layer handles what it can; the expensive layer handles what the cheap layer can’t.
- multi-channel-ingest — composition relationship — when multiple input channels feed the same store, cost-cascade determines which channel’s data gets the expensive processing treatment (high-volume cheap channel vs. low-volume expensive channel).
- uniformity-dividend — composition relationship — a cost-cascade that applies uniformly across all query types earns less dividend than one calibrated to the actual proportion of easy vs. hard cases.
Examples
Gemini Flash → Pro · computer-science
MAC/FAC model (Gentner et al.) — the two-stage retrieval architecture is the canonical cog-sci instance. · psychology
API tiering patterns in cloud services: free tier → paid tier escalation on quota. · computer-science
Embedding search → lexical search · computer-science
Viola, P., & Jones, M. (2001). "Rapid Object Detection using a Boosted Cascade of Simple Features." *Proceedings of CVPR 2001*; §4, "The Attentional Cascade." · computer-science
Rate-limited APIs · computer-science
Test suite tiering · computer-science