computer-science education family-and-consumer-science law linguistics psychology

Few shot

Description

A small set of examples teaches the pattern. The agent generalizes from demonstration rather than from explicit description — “watch this and that, then do the next one.” The structural shape is N examples + implicit pattern + new query: the pattern is never stated explicitly; it’s induced from the examples’ commonalities. This makes few-shot powerful (humans and models can learn things they couldn’t be told) and brittle (the examples have to share the load-bearing pattern, not just the surface). The diagnostic question — “do the examples share a structural shape, or just surface features?” — separates productive few-shot from cargo-cult-by-example. If the examples share only superficial features (formatting, vocabulary, length), the agent generalizes the wrong invariant; the new output looks right at surface level but misses the structural point.
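Here is a minimal sketch of that shape as a prompt. It assumes a plain-text completion interface. The `Example` type, the `build_few_shot_prompt` helper, and the `Input:`/`Output:` labels are illustrative names, not any particular library's API. The rule (extract the weekday) never appears in the prompt. The model is left to induce it from what the three demonstrations have in common.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One demonstration: an input and the output the pattern should produce."""
    input: str
    output: str


def build_few_shot_prompt(examples: list[Example], query: str,
                          in_label: str = "Input", out_label: str = "Output") -> str:
    """Concatenate N demonstrations, then the new query with an empty output slot.

    The rule itself is never written down; it has to be induced from the
    commonalities across the examples.
    """
    blocks = [f"{in_label}: {ex.input}\n{out_label}: {ex.output}" for ex in examples]
    # The final block leaves the output open for the model to complete.
    blocks.append(f"{in_label}: {query}\n{out_label}:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        Example("The meeting is on Monday.", "Monday"),
        Example("We ship the release next Friday.", "Friday"),
        Example("Lunch moved to Wednesday this week.", "Wednesday"),
    ]
    print(build_few_shot_prompt(demos, "The review happens Thursday morning."))
```

Passing an empty `examples` list produces the zero-shot version of the same prompt. This makes it easy to compare the two settings on one query.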

Triggers

  • User-initiated: User provides examples and asks “do another like this,” or asks “how can I teach the agent to do X?” with example data. Vocabulary cues: “few-shot,” “examples,” “like these,” “in the style of,” “demonstration.”
  • Agent-initiated: Agent notices a task where explicit rule-articulation is hard but examples are available. Candidate inference: “provide N examples that share the structural shape; the implicit pattern will transfer.”
  • Situation-shape signals: Tasks where rules are easier to demonstrate than to state. Documentation that includes examples (most well-written documentation does). Teaching contexts where the learner is meant to induce from cases.

Exclusions

  • Tasks better served by explicit instructions — when a rule can be stated cleanly, stating it beats demonstrating it; few-shot can introduce variance.
  • Examples don’t share the right pattern — if the demonstrations don’t actually exemplify the target shape, the model generalizes the wrong invariant.
  • One example is enough (or zero) — some tasks need only an instruction; piling on examples adds noise.
  • Adversarial / edge-case heavy — when the failure modes are at the edges, few-shot of typical examples teaches the wrong central tendency; explicit edge-case discussion is better.
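Treating induction as hypothesis elimination makes the "wrong invariant" risk concrete. This is a deliberate simplification, not a claim about how language models work internally. The sketch below uses a hypothetical, tiny set of candidate rules (`CANDIDATE_RULES`). It keeps every rule that reproduces all demonstrations, then flags the example set as ambiguous when the surviving rules disagree on the new query.

```python
from typing import Callable

# A candidate rule maps an input string to an output string.
Rule = Callable[[str], str]
Pair = tuple[str, str]

# Hypothetical hypotheses a learner might entertain; a real learner's space is far larger.
CANDIDATE_RULES: dict[str, Rule] = {
    "uppercase": str.upper,
    "reverse": lambda s: s[::-1],
    "uppercase first letter": lambda s: s[:1].upper() + s[1:],
}


def consistent_rules(examples: list[Pair], rules: dict[str, Rule]) -> list[str]:
    """Names of the rules that reproduce every demonstration exactly."""
    return [name for name, rule in rules.items()
            if all(rule(x) == y for x, y in examples)]


def predictions(examples: list[Pair], rules: dict[str, Rule], query: str) -> dict[str, str]:
    """What each still-consistent rule would output for the new query."""
    return {name: rules[name](query) for name in consistent_rules(examples, rules)}


def is_ambiguous(examples: list[Pair], rules: dict[str, Rule], query: str) -> bool:
    """True when surviving rules disagree on the query, i.e. the examples
    under-determine which invariant is load-bearing."""
    return len(set(predictions(examples, rules, query).values())) > 1


if __name__ == "__main__":
    surface_only = [("a", "A"), ("b", "B")]
    structural = [("cat", "CAT"), ("dog", "DOG")]
    print(predictions(surface_only, CANDIDATE_RULES, "cat"))
    print(predictions(structural, CANDIDATE_RULES, "cat"))
```

Single-letter demonstrations like `("a", "A")` fit both "uppercase" and "uppercase first letter", so the query `"cat"` splits them (`"CAT"` vs `"Cat"`). Demonstrations like `("cat", "CAT")` exercise the feature that actually matters and leave only one rule standing.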

Structure

Internal structure of few-shot: a table of its component slots and the concepts that fill them.

Relationships

Relationship neighborhood of few-shot: a graph of the concepts it connects to and the concepts it is a part of.
  • seeding — few-shot IS seeding applied to in-context generation; examples are the seed that shapes interpretation.
  • cargo-cult — contrast: few-shot fails when examples teach only surface; the model cargo-cults the format without the structure.
  • shape — few-shot trades on shared shape across examples; if shape varies between examples, the model has nothing to pattern-match.
  • doctrine — sometimes few-shot is the right primitive; sometimes a doctrine (explicit rule) is. The choice between them is a meta-level call.
  • chain-of-thought — few-shot + CoT = examples with reasoning traces; teaches the process, not just input-output mapping.
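Here is a sketch of few-shot combined with chain-of-thought, assuming a plain-text completion interface. Each demonstration carries a worked reasoning trace before its answer. The query ends at an open `Reasoning:` slot, which cues the model to produce the process before the answer. `TracedExample` and `build_cot_prompt` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedExample:
    """A demonstration that shows its reasoning, not just the input-output pair."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(examples: list[TracedExample], query: str) -> str:
    """Demonstrations with reasoning traces, then the query with an open reasoning slot."""
    blocks = [
        f"Q: {ex.question}\nReasoning: {ex.reasoning}\nA: {ex.answer}"
        for ex in examples
    ]
    # Ending on "Reasoning:" invites the model to imitate the traced process
    # before committing to an answer.
    blocks.append(f"Q: {query}\nReasoning:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        TracedExample(
            "Ann has 3 apples and buys 2 more. How many apples does she have?",
            "She starts with 3. Buying 2 more adds 2, so 3 + 2 = 5.",
            "5",
        ),
        TracedExample(
            "A box holds 4 pens. How many pens are in 3 boxes?",
            "Each box holds 4 pens, and there are 3 boxes, so 3 * 4 = 12.",
            "12",
        ),
    ]
    print(build_cot_prompt(demos, "A shelf has 5 rows of 6 books. How many books?"))
```

Dropping the `reasoning` field would turn this back into a plain input-output few-shot prompt. The trace is what carries the process.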

Examples

GPT-3 and successor LLMs · computer-science

few-shot in-context learning as the canonical capability: model performance on a new task generally improves as more examples are provided in the prompt.

Cooking recipes vs cooking shows · family-and-consumer-science

recipe tells; show demonstrates. Few-shot is the show.
surgical training, blacksmithing, musical instruction: master demonstrates, apprentice repeats, and the cycle recurs across cases.
Lave and Wenger’s Situated Learning reframed apprenticeship not as the transfer of context-free rules from master to novice but as legitimate peripheral participation. The apprentice begins by performing simple, real, peripheral tasks alongside more-experienced practitioners and gradually moves toward central, complex contributions. The crucial epistemic claim is that the knowledge being acquired is not statable as a list of rules; it is induced from watching, doing, and being corrected on actual instances. Across their ethnographic cases (tailoring, midwifery, butchering, naval quartermastering), the pattern was consistent: the demonstrations teach the shape; the rule-articulation comes later, if at all.
Inference: Few-shot prompting of language models is a mechanically much cheaper instance of the same structural move. A small set of worked examples, juxtaposed without explicit rule articulation, induces a generalization in the model that explicit instruction would have to spell out (often imperfectly). The apprenticeship literature is the longer-running, human-centric body of evidence for the claim that demonstrations can teach what rules cannot. The corollary carries over too: when the demonstrations don’t share the load-bearing structural feature, the learner generalizes the wrong invariant. Apprentices who imitate the surface of expert behavior without picking up the underlying judgment are the cross-domain analog of cargo-cult few-shot.
Bandura’s Social Learning Theory established that much of human behavioral acquisition occurs through observation of others rather than through direct reinforcement of one’s own trial-and-error attempts. The Bobo doll experiments and the broader observational-learning literature documented that children, presented with a small number of adult demonstrations of a behavior, would reproduce the structural pattern (including novel variations) without being explicitly rewarded for doing so. The account Bandura drew from this work was richer than simple imitation: attention to the model, retention of what was observed, motor reproduction adapted to the learner’s body, and motivational selection of which observed behaviors to reproduce.
Inference: The structural shape (a small number of demonstrations induces a generalization in the observer) is the same primitive as few-shot in-context learning, and the four-stage subsystem (attention, retention, reproduction, motivation) is a useful lens for diagnosing few-shot failures. When few-shot prompting fails, the analogous diagnosis asks four questions: were the examples attended to (positioned and formatted so the model’s attention covers them); were they retained coherently across the context (not flushed by intervening content); does the model have the reproduction capability the examples implicitly demand; and is there sufficient motivational signal (instruction framing) that the demonstrated behavior is the desired one rather than an arbitrary stylistic option?
The GPT-3 paper demonstrated that a sufficiently large language model could perform new tasks from a handful of in-context examples in the prompt, without any gradient updates to the model’s weights. The paper’s contribution was empirical: as model scale increased, the gap between zero-shot and few-shot performance grew, and few-shot generalization emerged as a property of scale. The result instantiated few-shot teaching in a substrate (frozen pretrained model + prompt-time conditioning) very different from the apprenticeship and language-acquisition substrates the shape originally came from. Yet the structural move was the same: a small set of demonstrations stands in for an explicit rule, and the learner extrapolates the pattern. The portability of the shape across substrates is what makes “few-shot” worth cataloguing as a primitive rather than a model-training detail.
the reviewer shows the right version next to the wrong version; the pattern transfers without being explicitly named.
children learn grammar from positive examples without explicit grammar instruction.
Pinker, The Language Instinct — children’s grammar acquisition from examples.