Graceful degradation
Description
Graceful-degradation is the design move of building a system so that when a component or dependency fails, the system continues with reduced functionality rather than failing completely. The diagnostic shape has four parts: a full-capability path that runs in steady state; a failure-detection mechanism; a pre-designed degraded path that takes over when failure is detected; and a signal telling users (or callers) that they are in degraded mode so they can adapt their behavior. The degraded path is a first-class operating mode, not “the system is broken and we caught the exception.”
The structural insight is that total failure on any component failure is a design choice, not an inevitability. Most systems implicitly accept total failure because the degraded path was never designed. Graceful-degradation makes the degraded mode a design deliverable: What is the read-only fallback if the write path is down? What is the stale-data response if the freshness path is down? What is the cached-result response if the live-compute path is down? The cheap default is full capability; the fallback is degraded but still functional.
Graceful-degradation depends on two upstream concepts: bulkheads (so a single failure doesn’t take everything down; there must be a meaningful “partial” to fail to) and circuit-breakers (so failure is detected quickly and the system actually switches modes rather than hanging on the failing dependency).
Triggers
User-initiated: The user describes a system that is currently all-or-nothing and wants partial-failure tolerance, or asks about fallback paths. Vocabulary cues: “graceful degradation,” “fallback,” “reduced functionality,” “degraded mode,” “read-only mode,” “partial failure,” “resilience.”
Agent-initiated: The engine notices a system that depends synchronously on a component whose failure would propagate as total system failure, with no designed-in fallback. Candidate inference: “this wants graceful-degradation: what’s the cheap default, what’s the failure signal, and what’s the pre-designed degraded path?”
Situation-shape signals: Synchronous dependency on a critical component; user-facing impact of dependency failure is unacceptably broad; observed cascading failures; capacity-overload scenarios that need a “shed load gracefully” mode.
Exclusions
- Correctness-critical operations where degraded output is worse than no output — bank transfers, medical-record writes, security-critical operations; failing-loud is correct; degrading-silently is dangerous.
- No meaningful “degraded” mode exists — if the system’s purpose is exactly one indivisible thing (e.g., authentication: either it works or you’re locked out), there’s no useful degradation point.
- Designed-fallback would be misleading — sometimes a degraded result that looks full-capability is worse than failure; the communication-to-consumers part of the concept is load-bearing here.
- All-or-nothing regulatory or contractual obligations — sometimes the contract says “no service if you can’t deliver full service”; degradation isn’t an option.
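The four-part shape from the Description (full-capability path, failure signal, pre-designed degraded path, degraded-mode notice) and the fail-loud exclusions above can be sketched in a few lines of Python. This is only an illustrative sketch: `Result`, `DependencyDown`, `call_with_fallback`, and the two example functions are hypothetical names invented here, not part of any library or of the catalog itself.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    value: T
    degraded: bool = False  # True when the fallback path produced the value
    note: str = ""          # reason, surfaced so the caller can adapt


class DependencyDown(Exception):
    """Hypothetical failure signal raised by the full-capability path."""


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    *,
    degradable: bool = True,
) -> Result[T]:
    """Run the full-capability path; switch to the designed fallback on failure.

    With degradable=False (correctness-critical work) the failure is
    re-raised instead of being masked by a degraded answer.
    """
    try:
        return Result(primary())
    except DependencyDown as exc:
        if not degradable:
            raise  # fail loud: a degraded result would be worse than none
        return Result(fallback(), degraded=True, note=f"degraded: {exc}")


# Assumed example paths: live recommendations vs. a static popular-items list.
def live_recommendations() -> list[str]:
    raise DependencyDown("recommendation service unreachable")


def popular_items() -> list[str]:
    return ["item-1", "item-2"]


result = call_with_fallback(live_recommendations, popular_items)
print(result.degraded, result.value)  # True ['item-1', 'item-2']
```

The `degraded` flag and `note` carry the communication-to-consumers part of the concept: the caller always learns which mode produced the value. Passing `degradable=False` is one way to mark operations that fall under the exclusions, such as transfers or security-critical writes.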
Structure
Relationships
- bulkhead — pre-requisite substrate; bulkheads create the failure domains that degradation operates within.
- circuit-breaker — provides the failure-detection trigger.
- cost-cascade — structural pattern is conditional cascade from cheap-default to designed-fallback.
- saga — degradation applied to multi-step transactions.
- caching — caches as the stale-fallback path during origin failure.
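Two of the relationships above, circuit-breaker as the failure-detection trigger and caching as the stale-fallback path, can be combined in one small sketch. Everything here is an assumption made for illustration: the `CircuitBreaker` class, its thresholds, the `fetch` helper, and the simulated `origin` function are hypothetical, and a production breaker would need thread safety and a stricter half-open policy.

```python
import time


class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, let calls through again as a trial.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def fetch(key, origin, cache, breaker):
    """Return (value, mode), where mode is 'live' or 'stale'."""
    if breaker.allow():
        try:
            value = origin(key)
        except Exception:  # sketch only: treat any origin error as a failure
            breaker.record_failure()
        else:
            breaker.record_success()
            cache[key] = value  # keep the cache warm for later outages
            return value, "live"
    if key in cache:
        return cache[key], "stale"  # degraded mode, labeled as such
    raise LookupError(f"no live or cached value for {key!r}")


# Simulated origin that can be switched off to imitate an outage.
cache, calls, healthy = {}, {"n": 0}, {"up": True}
breaker = CircuitBreaker(failure_threshold=2, reset_after=30.0)


def origin(key):
    calls["n"] += 1
    if not healthy["up"]:
        raise ConnectionError("origin down")
    return f"value-for-{key}"


first = fetch("a", origin, cache, breaker)   # live; primes the cache
healthy["up"] = False
second = fetch("a", origin, cache, breaker)  # failure 1: stale
third = fetch("a", origin, cache, breaker)   # failure 2: breaker opens, stale
fourth = fetch("a", origin, cache, breaker)  # breaker open: origin is skipped
print(first, fourth, calls["n"])
# ('value-for-a', 'live') ('value-for-a', 'stale') 3
```

Returning the mode alongside the value keeps the stale response honest, and the open breaker means the fourth call never waits on the failing origin.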
Examples
Netflix read-only mode · computer-science
Hospital triage / mass-casualty incidents · medicine-and-health
Airline overbooking + compensation · transportation
Allspaw, J., various incident-response and resilience-engineering writing — graceful degradation as a sociotechnical pro · computer-science
Banks during ATM-network outages · business
Beyer et al., *Site Reliability Engineering* (Google SRE Book, 2016), especially Chapters on failure modes and load shed · computer-science
Browser progressive enhancement · computer-science
CDN serving stale-on-error · computer-science
Cederholm, D., *Bulletproof Web Design* (2007) — progressive enhancement as positive design principle. · computer-science
GitHub during incidents · computer-science
Don Norman, *The Design of Everyday Things* (rev. ed. 2013), Ch. 5 "Human Error? No, Bad Design" — designing for error. · computer-science
Military fallback positions · military-sciences
Nygard, M., *Release It!* (2007), Chapter 5 — the canonical stability-patterns essay; SRE practices (Google SRE Book, Beyer et al. 2016); Hello Interview primer on resilience patterns. · computer-science
circuit-breaker, bulkhead, rate-limiting, retry-with-backoff) each targeting a specific cascade-failure mode. When designing for resilience, name the failure mode being targeted before picking the pattern; the catalog’s structural overlap among these concepts is curatorially meaningful.
progressive enhancement / graceful degradation in web development (Cederholm, Bulletproof Web Design); HCI literature on error-tolerant interaction · computer-science
Safe-mode boots in operating systems · computer-science