Skip to main content
biology computer-science library-and-museum-studies psychology

Caching

Description

Caching is the move of keeping a close-to-consumer copy of frequently-accessed data so reads don’t have to traverse the full distance to the authoritative source every time. The diagnostic shape: there’s a source whose access is slow or expensive; there’s a consumer whose access is hot; a copy lives between them, addressed by key, populated on miss (or eagerly), and invalidated by some policy. The structural payoff is latency reduction and load reduction on the source. The structural cost is staleness — the cache may not reflect the source’s current state — and the operational cost of the invalidation policy. The famous quip is exact: “There are only two hard problems in computer science — cache invalidation and naming things” (Phil Karlton). The reason cache invalidation is hard is that it lives at the seam between two systems with different change-propagation guarantees; the cache promises an abstraction (looks-like-the-source) that the world keeps leaking through. Caching is structurally adjacent to replication. A read-replica IS a cache of the primary with eventual-consistency semantics; the engineering tradeoffs (staleness vs throughput, invalidation policy, TTL choice) are isomorphic. The vocabulary differs by tradition (databases say “replica,” web infrastructure says “cache,” CPUs say “L1/L2/L3”), but the structural primitive is one.

Triggers

User-initiated: User describes slow access to a source, repeated identical queries, or proposes adding a cache. Vocabulary cues: “cache,” “caching,” “CDN,” “memoization,” “TTL,” “cache miss.” Agent-initiated: Engine notices repeated identical reads against a slow / expensive source. Candidate inference: “this wants caching — what’s the key, the TTL, and the invalidation policy?” Situation-shape signals: Slow / expensive source with hot access pattern; tolerable staleness in the consumer; read-heavy workload; reduceable latency dominates the user experience.

Exclusions

  • Write-heavy workloads — caches help reads; writes still hit the source, and the cache’s invalidation overhead can dominate.
  • Strict-consistency requirements — if staleness is unacceptable, a cache without strong synchronous invalidation is wrong; the alternative may be no-cache or compare-on-read.
  • No locality of reference — if every key is accessed exactly once, the cache miss rate is 100% and the cache adds latency.
  • Small dataset, small source — if the source is already fast, the cache adds operational complexity without latency benefit.

Structure

Internal structure of caching: a table of its component slots and the concepts that fill them. = a source (slow / expensive / canonical) + a faster-cheaper container (the cache) + an invalidation policy (TTL / write-through / explicit-purge / version-stamp). The cache is the eager pre-computed copy of an underlying lazy operation; the invalidation policy is what bounds the staleness.

Relationships

Relationship neighborhood of caching: a graph of the concepts it connects to and the concepts it is a part of.
  • eager-vs-lazy — cache is the eager copy; invalidation makes it lazy at staleness boundary.
  • leaky-abstraction — cache invalidation is the canonical leak.
  • replication — read-replicas are caches by another name.
  • grain — cache key IS the grain.
  • load-balancing — caches reduce load on the source by absorbing hits at lower levels.

Examples

CPU caches (L1/L2/L3) · computer-science

the literal computer-architecture instance; access cost differs by 100x across levels.

Library reserve shelves · library-and-museum-studies

high-demand books copied to the reserve desk for fast access; the library is the source, reserve is the cache.
free-floating amino-acid-charged tRNAs as a cache over ribosomal synthesis; cells maintain “warm” tRNA pools for fast translation.
cache-control + ETag + If-Modified-Since are the invalidation protocol.
A content-delivery network replicates a small subset of a website’s content across a globe-spanning fleet of edge servers, each positioned to serve users in its geographic neighborhood. When a user requests a page, their request goes to the closest edge node; if that node has the response cached, the user gets it in tens of milliseconds without the request ever traversing the public internet to the origin. The origin is the authoritative source; the edge nodes are caches; the cache-control HTTP headers (Cache-Control, ETag, Vary, surrogate-control extensions) are the invalidation policy.The structural shape is caching at planetary scale. Latency reduction is the headline benefit — a user in Singapore reading content hosted in Virginia experiences a round-trip dominated by the speed of light, not by application logic — but the load-shedding benefit is equally important: the origin only handles cache misses, so a viral story whose 99% of reads are served from edges sees its origin load capped at the marginal-content-update rate rather than the user-traffic rate. The invalidation problem (when did the origin change? how do we tell the edges?) is the canonical hard part: long TTLs maximize cache-hit ratios but extend the window during which stale content can be served; aggressive purges restore consistency but defeat the cache.CDN vendors compete on the engineering details of this tradeoff: regional vs. multi-tier edge hierarchies, dynamic content compilation at the edge, programmable purge policies, ESI/edge-side composition, signed-URL invalidation. The structural primitive is one; the implementations vary by which dial each vendor optimizes hardest.
Postgres shared_buffers, MySQL query cache (historically; now removed), application-level result caching.
local resolver cache + browser cache + OS cache; multi-level cache hierarchy with TTL-based invalidation.
Hennessy and Patterson’s Computer Architecture: A Quantitative Approach (Morgan Kaufmann) is the standard graduate text on the subject, and its treatment of the memory hierarchy is the rigorous engineering account of caching. A CPU cache is a small, fast, expensive store holding a copy of the most recently and frequently used data from large, slow, cheap main memory. It works because of locality of reference: temporal locality (recently accessed data is likely to be accessed again) and spatial locality (data near recently accessed data is likely to be needed soon). The book quantifies the whole design space — hit rate, miss penalty, the L1/L2/L3 hierarchy, write policies — turning “add a cache” from intuition into a measurable tradeoff.This is caching in its definitional form: a close-to-consumer copy of frequently-accessed data that trades staleness (or, in the cache-coherence case, the cost of keeping the copy consistent) for speed, with main memory remaining the authoritative source. The cache holds a derived view; correctness depends on the canonical store. The structure generalizes directly to every other tier — browser caches, CDNs, database query caches, application memoization — all of which are the same move at a different distance from the consumer: keep a fast local copy of authoritative data that lives somewhere slower, and pay for it in the currency of consistency.
a 7±2-item cache over long-term memory; cognitive-science instance with similar invalidation problems (interference, decay).
The most-quoted line about caching is Phil Karlton’s: “There are only two hard things in Computer Science: cache invalidation and naming things.” Karlton was a veteran architect at Netscape (and earlier Xerox PARC, SGI, DEC); the quip is attributed to his time there in the mid-1990s and was reported and popularized by his former colleague Tim Bray. (The widely-seen three-item version that adds “off-by-one errors” is a later rider, credited to Leon Bambrick — not part of Karlton’s original.)The joke is load-bearing for understanding the concept. Caching’s structure is “a derived copy of authoritative data, kept close to the consumer.” Creating that copy is easy; the hard part is invalidation — knowing when the authoritative source has changed so the derived view must be refreshed or discarded. The difficulty is intrinsic to the structure, not incidental: the moment you hold a copy of something that can change elsewhere, you have signed up for the problem of detecting that change in time. Every cache trades staleness for speed, and invalidation is the name of the bill. That a throwaway one-liner became the field’s folk wisdom reflects how reliably this consistency cost ambushes engineers who reached for a cache thinking only of the speed.
deep computer-science lineage; the cache pattern is foundational from CPU architecture through web infrastructure through application-level memoization
Martin Kleppmann’s Designing Data-Intensive Applications (2017) has become a standard modern reference for distributed-systems data design. The early chapters treat caching as one move in a wider design space of derived data — denormalized views, search indexes, and materialized aggregations — all sharing the structural shape of a close-to-consumer derived view backed by an authoritative source elsewhere. The book is careful to surface the tradeoff that defines caching specifically: staleness in exchange for latency and load.The framing’s contribution is positioning caching as one cell in the broader derived-data taxonomy rather than as a special-case optimization, which makes it easier to see when a cache is the right tool versus when (for instance) a search index or an event-sourced view would fit better.
Volume 3 of Knuth’s The Art of Computer Programming, subtitled Sorting and Searching, supplies the algorithmic substrate that every cache and memo table is built on. Its Chapter 6 (“Searching” — which Knuth notes could equally be called “Storage and Retrieval of Information”) gives the rigorous analysis of hashing, search trees, and table look-up: the data structures that answer “have I already computed/fetched this, and if so where is it?” quickly. A cache is, mechanically, a lookup table keyed by a request; its performance is governed by exactly the hash-function quality, collision behavior, and time-space tradeoffs Knuth formalizes.It is worth being precise about lineage, because the structural point depends on it: Knuth provides the retrieval machinery, but the term memoization — caching a function’s results so repeated calls avoid recomputation — was coined separately by Donald Michie (“‘Memo’ Functions and Machine Learning,” Nature, 1968). The general idea of storing computed results to skip redundant work is older still (Samuel’s checkers program, 1959). Holding these apart matters for the concept: caching is one structure (a fast derived copy of authoritative data) realized over a substrate (efficient search/lookup) with a long, separately-attributable history. Knuth is the foundation of the substrate; he is not the origin of the word.
canonical web-application pattern; cache-aside is the default.
RFC 7234 (2014, part of the HTTP/1.1 revision set) is the canonical protocol-level specification of HTTP caching semantics. It defines the directives (Cache-Control, Expires, ETag, Last-Modified, Vary) and the freshness/validation model that govern how clients, intermediaries (proxies, CDN edges), and origin servers cooperate to serve cached responses safely.What makes the spec interesting as an instance of the caching primitive is that it explicitly codifies the staleness contract — the conditions under which a cached response may be served without revalidation, must be revalidated, or must be treated as expired. The protocol moves “trades staleness for speed” from an implicit engineering tradeoff into an interoperable, machine-readable agreement between all the caches in the path between origin and consumer.