
Reflection

Description

An agent evaluates its own output and revises based on that self-critique. The generator and evaluator collapse into one actor playing two roles: first producer, then critic, then producer again. The structural shape is same-agent + two-roles + revise-on-self-feedback. It is distinct from external-critique loops (evaluator-optimizer with a separate evaluator) because the same agent’s blind spots constrain what the self-assessment can catch. The diagnostic question, “can this agent notice its own failure modes from inside?”, determines whether reflection is load-bearing. Reflection works for failure modes the agent can recognize (typos, arithmetic, structural inconsistency, missing context); it fails for blind-spot failure modes, where the same biases that produced the error infect the self-critique. The concept is most powerful when paired with structured criteria: an explicit checklist, a framework, or a distinct prompt that puts the same agent in a deliberately different cognitive mode.
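A minimal Python sketch of the producer, critic, producer loop described above, assuming the criteria can be written as named boolean checks. `Criterion`, `critique`, `reflect`, and the toy generator are hypothetical names for illustration, not part of any library; a real agent would back `generate` with a model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """Hypothetical explicit criterion: a name plus a pass/fail check."""
    name: str
    passes: Callable[[str], bool]


def critique(draft: str, criteria: list[Criterion]) -> list[str]:
    """Critic role: return the names of the criteria the draft fails."""
    return [c.name for c in criteria if not c.passes(draft)]


def reflect(generate: Callable[[list[str]], str],
            criteria: list[Criterion],
            max_rounds: int = 3) -> tuple[str, list[str]]:
    """One agent alternates producer and critic roles.

    `generate` receives the failures from the last critique (an empty list
    on the first call) and returns a draft. Returns the final draft and any
    criteria it still fails.
    """
    draft = generate([])
    failures = critique(draft, criteria)
    rounds = 0
    while failures and rounds < max_rounds:
        draft = generate(failures)            # producer again, using the critique
        failures = critique(draft, criteria)  # critic again
        rounds += 1
    return draft, failures


# Toy stand-in for a model call (assumption): fixes only what the critique names.
def toy_generate(failures: list[str]) -> str:
    draft = "total is 4"
    if "capitalized" in failures:
        draft = draft.capitalize()
    if "ends with period" in failures:
        draft += "."
    return draft


criteria = [
    Criterion("capitalized", lambda d: d[:1].isupper()),
    Criterion("ends with period", lambda d: d.endswith(".")),
]

final_draft, remaining = reflect(toy_generate, criteria)
```

The loop can only repair what the criteria can detect; a failure that no criterion names passes straight through, which is the blind-spot limit in code form.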

Triggers

  • User-initiated: The user asks for self-checking or sanity-checking (“did I get this right?”). Vocabulary cues: “reflection,” “self-critique,” “review your work,” “sanity check,” “second pass,” “Reflexion.”
  • Agent-initiated: The agent notices that its own output is at risk of catchable errors (typos, structural inconsistencies, missing constraints) and that an explicit re-pass would catch them. Candidate inference: “reflect against [explicit criteria]; flag anything that doesn’t pass.”
  • Situation-shape signals: multi-step plans before execution; output that’s about to be sent to a downstream consumer; long-running generations where consistency matters; tasks with structured success criteria the agent can self-check against.

Exclusions

  • Blind-spot failure modes — when the agent’s biases produce both the error and the failed self-critique, reflection alone can’t catch it; external critique is required.
  • Trivial outputs — over-applying reflection to outputs where the iteration adds no value produces “reflection theater” that slows work without improving quality.
  • Time-bounded settings where iteration is impossible — real-time control, fast-response systems; the reflection loop’s cost exceeds its benefit.
  • No criterion to reflect against — reflection without explicit standards becomes “feels right” — the same intuition that produced the output, applied twice.
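The four exclusions above can be read as a gate in front of the reflection loop. The sketch below is one possible encoding, assuming the caller can supply these judgments as booleans; `Situation`, `Decision`, and `decide` are hypothetical names, and the ordering of the checks is an illustrative choice rather than something the entry prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SKIP = "skip"
    REFLECT = "reflect"
    EXTERNAL_CRITIQUE = "external critique"


@dataclass(frozen=True)
class Situation:
    """Hypothetical caller-supplied judgments about the task at hand."""
    trivial: bool                # iteration would add no value
    can_iterate: bool            # False in real-time or fast-response settings
    has_explicit_criteria: bool  # checklist, tests, invariants, ...
    self_detectable: bool        # False when the failure is a blind spot


def decide(s: Situation) -> Decision:
    # Trivial outputs and time-bounded settings: the loop costs more than it returns.
    if s.trivial or not s.can_iterate:
        return Decision.SKIP
    # Blind-spot failure modes: self-critique shares the bias, so look outside.
    if not s.self_detectable:
        return Decision.EXTERNAL_CRITIQUE
    # No criterion: reflection would be the same intuition applied twice.
    if not s.has_explicit_criteria:
        return Decision.SKIP
    return Decision.REFLECT
```

In practice the hard part is estimating `self_detectable`; the sketch only makes explicit that the estimate has to be made before the loop runs.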

Structure

Internal structure of reflection: a table of its component slots and the concepts that fill them.

Relationships

Relationship neighborhood of reflection: a graph of the concepts it connects to and the concepts it is a part of.
  • evaluator-optimizer — reflection IS evaluator-optimizer in the same-agent case; the structural shape is identical, the failure modes differ.
  • feedback-loop — reflection is feedback-loop with the agent as both source and recipient; deliberately constructive (unlike hoist-by-own-petard).
  • doctrine — reflection’s effectiveness depends on having a doctrine for what to reflect against (explicit criteria, checklist, framework).
  • chain-of-thought — chain-of-thought exposes reasoning; reflection then evaluates the exposed reasoning. The two compose into “think → reflect on thinking → revise.”
  • trigger-rule-pair — reflection’s trigger condition is “you just produced output,” the rule is “now critique it against criterion X.”

Examples

Code self-review before submitting a PR · computer-science

author wears the reviewer’s hat; catches obvious problems before peer review.

Surgical checklists (Atul Gawande) · medicine-and-health

same-agent self-audit at procedural checkpoints; reflection institutionalized as procedure.
Anthropic’s “Building Effective Agents” engineering post (2024) catalogs the core agentic workflow patterns, among them the evaluator-optimizer loop that reflection instantiates. The pattern: after a generator produces an output, the same (or another) LLM call critiques it against criteria, and the critique drives a revision. The post presents the loop as most effective when evaluation criteria are clear and iterative refinement adds measurable value, which covers any task with checkable structure (code that should compile, plans that should satisfy constraints, writing that should match a style).
The Anthropic framing is one of the standard reference points for engineers building agent loops: the reflection pattern shows up explicitly in tooling and in the design of multi-step agents that should self-correct rather than ship first-pass output. The catalog’s reflection concept treats this engineering pattern and the broader cognitive-science notion of metacognition as instances of the same structural shape.
In Anthropic’s agent-design framing, reflection is the pattern where an agent evaluates its own previous step before deciding the next one: the generator-critic loop collapsed into one actor. The reflection pass typically reads the just-produced output, runs an explicit check against criteria (does this compile, does it satisfy the constraints, does it match the requested style), and either ratifies the step or triggers a revision.
The pattern is structurally identical to the broader evaluator-optimizer pattern but with the same-agent constraint: the actor doing the producing is the same actor doing the critiquing. The constraint is what makes reflection cheap (no inter-agent coordination) and what limits its ceiling (blind spots the agent has are invisible to its own self-critique).
John Flavell’s 1979 American Psychologist paper “Metacognition and cognitive monitoring” introduced metacognition as a research program: the study of how cognitive agents represent, monitor, and regulate their own cognitive processes. He distinguished metacognitive knowledge (what the agent believes about their own cognition: what they find easy, what strategies work for them) from metacognitive experiences (in-the-moment monitoring signals such as the feeling of knowing or the sense that something is wrong) and from metacognitive strategies (the procedures the agent invokes to regulate their cognition, e.g., re-reading when comprehension fails).
The framework’s structural contribution is the explicit two-level model: there is a first-order cognitive process (solving a problem, recalling an item, generating an answer) and a second-order monitoring process that watches the first and intervenes. Reflection is exactly this second-order operation applied to one’s own just-produced output.
Inference: For any AI system asked to do reflection, the load-bearing question is whether the system’s monitoring loop has access to signals the generation loop does not. If the second pass uses the same model with the same context and no new evidence, the “reflection” is structurally indistinguishable from running the generator twice. Flavell’s framework predicts that reflection helps in the cases where the monitoring layer has different signals (different criteria, different context, different prompt-frame) than the generator.
Atul Gawande’s The Checklist Manifesto documented how the WHO Surgical Safety Checklist (a 19-item checklist run at three points around any operation) cut major surgical complications by more than a third across the eight hospitals where it was tested. The checklist did not introduce new knowledge (every item asked about something the operating team already knew to verify), but it forced a structured pause at which the team had to publicly re-verify each item, making implicit checking explicit and discoverable. Gawande generalized the lesson across high-stakes domains where complex action under time pressure routinely produces preventable errors.
The structural move is the conversion of internal self-checking (each person silently re-verifying their own work) into externalized procedure (the checklist as a shared artifact, the verbal call-and-response as an audit log). Reflection moves from a private cognitive act to an observable, institutional one.
Inference: When self-reflection on critical work fails systematically (when an expert agent skips the second pass under time pressure, or convinces themselves it is unnecessary because they “know” the answer), the corrective is not to exhort harder reflection but to externalize it into a procedure that cannot be silently skipped. A checklist run aloud, a post-mortem template, a structured PR review form all do the same work: they convert reflection from optional internal practice into mandatory external artifact.
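One way to picture "a procedure that cannot be silently skipped" in software is a checklist object that refuses to sign off until every item has an explicit, logged answer. This is a hedged sketch: the `Checklist` class and the PR-review items are hypothetical illustrations, not the WHO checklist and not any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """Externalized self-check: every item needs an explicit, logged answer."""
    items: tuple[str, ...]
    log: list[tuple[str, bool]] = field(default_factory=list)  # audit trail

    def confirm(self, item: str, ok: bool) -> None:
        """Record an explicit answer for one item (the call-and-response step)."""
        if item not in self.items:
            raise KeyError(f"not a checklist item: {item}")
        self.log.append((item, ok))

    def sign_off(self) -> bool:
        """Refuse to complete while any item is unanswered.

        Returns True only if the latest answer for every item is True.
        """
        answered = {item for item, _ in self.log}
        missing = [item for item in self.items if item not in answered]
        if missing:
            raise RuntimeError(f"unanswered checklist items: {missing}")
        latest = dict(self.log)  # later answers overwrite earlier ones
        return all(latest[item] for item in self.items)


# Hypothetical pre-submission checklist for a code change.
pr_check = Checklist(items=("tests pass", "no debug prints", "docs updated"))
pr_check.confirm("tests pass", True)
pr_check.confirm("no debug prints", True)
# Calling pr_check.sign_off() here would raise: "docs updated" has no answer yet.
pr_check.confirm("docs updated", True)
approved = pr_check.sign_off()
```

The design choice is that skipping an item is an error rather than a default pass, and the `log` keeps the record that the private version of the check never leaves behind.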
“does this match boundary conditions? does it agree with the special case?”
automated reflection: catch easy-to-miss issues before they propagate.
LLM agents with explicit verbal-reinforcement reflection; canonical AI-research articulation.
Schön’s The Reflective Practitioner is the canonical account of an agent evaluating and revising its own work mid-performance. Against the “technical rationality” model, where a professional simply applies a known formula to a problem, Schön argues that skilled practitioners across fields (architects, engineers, therapists, musicians) instead engage in reflection-in-action: when a situation produces a surprise that their tacit “knowing-in-action” did not predict, they pause to think about what they are doing while still doing it, in what Schön calls “a reflective conversation with the situation.” The practitioner makes a move, the situation “talks back,” and the practitioner adjusts the next move on the strength of that feedback. He distinguishes this from reflection-on-action, the retrospective analysis that happens after the work is finished.
This is precisely the collapse of generator and evaluator into one actor. The same professional who produces the act also critiques it, in a different framing, and produces the revised act; in reflection-in-action the loop runs inside the performance rather than after it. The carpenter adjusting the cut by feel, the jazz player harmonizing around a wrong note: in each case the assessment and the revision are interleaved with the original act, not bolted on afterward.
Inference: When an act produces an unexpected result, treat the surprise as the trigger to reflect in action rather than only after it: reframe the problem on the spot, make a corrective move, and read the situation’s response as feedback for the next move. The strongest version of the reflection loop is the one Schön describes: a tight conversation between act, self-assessment, and revised act, closed before the performance is over rather than deferred to a post-mortem.
The Reflexion paper formalizes reflection as a language-agent primitive: an agent attempts a task, receives a (possibly sparse) feedback signal, generates a verbal self-critique of why it succeeded or failed, stores that critique as episodic memory, and then re-attempts the task conditioned on the prior critique. The critique is not a numerical reward but a natural-language reflection (“the route I took missed step 3 because I assumed the door was unlocked”), which the next attempt can read and act on. The paper’s contribution is operationalizing reflection for LLM agents in a learning-without-gradient-updates setting: the agent’s weights don’t change, but the in-context memory accumulates self-reflective traces that improve task performance across iterations. Across coding (HumanEval), reasoning (HotpotQA), and decision-making (AlfWorld) benchmarks, Reflexion-equipped agents outperform single-shot baselines by margins large enough to make verbal self-critique a default move in production agent harnesses.
The result rhymes with Schön’s earlier The Reflective Practitioner (1983), which distinguished reflection-in-action (mid-task adjustment as a hallmark of expert practice) from reflection-on-action (post-task review); Reflexion implements the latter as an explicit loop in the agent’s control flow. Both extend a much older lineage in cognitive science (Flavell’s 1979 metacognition work) and engineering practice (post-mortem reviews, code self-review before submission, mathematicians re-deriving their own proofs).
Inference: A reflection loop earns its keep only when the agent has an explicit criterion to reflect against: a test case that failed, a checklist item, a structural invariant, a downstream evaluator’s signal. Reflection without criteria collapses into “feels right” applied twice; reflection with criteria is the structural fix for catchable errors that the generator alone produces.
The Reflexion result is also a warning: gains plateau on tasks whose failure modes are blind spots of the same model, because the critic shares the generator’s biases. The remedy is external critique (the evaluator-optimizer pattern with a different evaluator), not more rounds of self-reflection.
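The attempt, feedback, verbal critique, memory, retry cycle described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the paper’s implementation: `reflexion_style_loop` and the toy callables are hypothetical, the model calls are replaced by plain functions, and the plateau rule (stop when the score fails to improve) is one simple way to signal that external critique is needed.

```python
from typing import Callable


def reflexion_style_loop(
    attempt: Callable[[list[str]], str],       # acts, conditioned on past reflections
    evaluate: Callable[[str], float],          # feedback signal, possibly sparse
    self_reflect: Callable[[str, float], str], # verbal self-critique of one attempt
    max_trials: int = 4,
    target: float = 1.0,
) -> tuple[str, list[str], bool]:
    """Return (best output, reflection memory, needs_external_critique)."""
    memory: list[str] = []  # episodic memory of natural-language critiques
    best_output, best_score = "", float("-inf")
    for _ in range(max_trials):
        output = attempt(memory)
        score = evaluate(output)
        if score <= best_score:
            # Plateau: another round of self-critique did not help.
            return best_output, memory, True
        best_output, best_score = output, score
        if score >= target:
            return best_output, memory, False
        memory.append(self_reflect(output, score))
    return best_output, memory, True  # trials exhausted without reaching target


# Toy stand-ins for model calls (assumptions for illustration only).
def toy_attempt(memory: list[str]) -> str:
    steps = ["walk to door", "open door"]
    if any("locked" in note for note in memory):
        steps.insert(1, "unlock door")
    return " > ".join(steps)


def toy_evaluate(output: str) -> float:
    return 1.0 if "unlock door" in output else 0.0


def toy_reflect(output: str, score: float) -> str:
    return "I assumed the door was unlocked; it was locked, so unlock it first."


plan, notes, escalate = reflexion_style_loop(toy_attempt, toy_evaluate, toy_reflect)
```

Swapping `toy_attempt` for a function that ignores `memory` makes the loop return with the escalation flag set after the second trial, a small model of the blind-spot case where more self-reflection adds nothing.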