computer-science economics

Principal agent

Description

A delegation relationship where the principal hires the agent to act on the principal’s behalf, but the agent has their own incentives that may not align with the principal’s outcomes. The structural shape is delegation + misalignment + information-asymmetry. The misalignment is the gap between what the agent wants and what the principal wants; the asymmetry is what enables the gap to express as observable bad outcomes (the agent can act on their own incentives because the principal can’t fully observe what they’re doing).

The concept generalizes far beyond economics. AI alignment is principal-agent: humans (principal) delegate work to AI systems (agent), and the alignment problem is making the agent’s objective actually be the principal’s outcome. Voter-politician, client-lawyer, patient-doctor, employer-employee, donor-charity: any delegation with imperfect observation has the same shape.

Distinct from context-asymmetry alone: context-asymmetry is the visibility gap (deliberate or accidental); principal-agent is what happens when that visibility gap is paired with incentive misalignment in a delegation relationship.

Triggers

User-initiated: User describes a delegation relationship with concerns about whether the delegate is acting in the delegator’s interest, or asks about incentive design. Vocabulary cues: “principal-agent,” “misaligned incentives,” “agency cost,” “moral hazard,” “oversight,” “alignment.”

Agent-initiated: Agent notices a delegation relationship where the delegate has discretion and the delegator has imperfect visibility. Candidate inference: “are the incentives aligned; what alignment mechanism exists; what’s the residual agency cost?”

Situation-shape signals: Hiring, outsourcing, board-management relationships, contractor arrangements. AI systems acting on under-specified instructions. Any time a principal delegates without full observation. Incentive-design discussions.

Exclusions

  • Incentives are intrinsically aligned — when the agent independently wants the principal’s outcome (true mission alignment, family member acting on behalf), the concept’s tension dissipates. Common but fragile.
  • Full observability — when the principal can fully observe the agent’s actions and outcomes in real-time, the information-asymmetry substrate is absent; misalignment can be corrected immediately.
  • Self-employment / no delegation — solo work has no principal-agent gap to manage; the concept requires the delegation relation.
  • Aligned-but-incompetent — when the agent wants to deliver but lacks capability, the problem is capability not alignment; the concept mischaracterizes the failure mode.

Structure

Internal structure of principal-agent: a table of its component slots and the concepts that fill them.

Relationships

Relationship neighborhood of principal-agent: a graph of the concepts it connects to and the concepts it is a part of.
  • context-asymmetry — principal-agent rides on context-asymmetry; the asymmetry is the substrate and the misalignment is what makes it harmful.
  • hoist-by-own-petard — the principal who delegates without aligning incentives builds their own petard; common pattern in startup founder-VC and shareholder-executive cases.
  • doctrine — alignment mechanisms (contracts, governance, oversight, compensation structure) are doctrines that bound principal-agent risk.
  • trigger-rule-pair — performance metrics + payout rules are trigger-rule-pairs designed to align agent action with principal outcome; poor metric design is “wrong trigger” (Goodhart).
  • load-bearing — the alignment mechanism is load-bearing; remove it and the misalignment expresses as observable bad outcomes.

Examples

Shareholders and executives · economics

classic Jensen & Meckling case: stock options, board oversight, fiduciary duty are mechanisms to align the agent’s incentives with the principal’s.

AI alignment · computer-science

humans delegate work to AI systems; the AI’s optimization target may diverge from the human’s actual goal (specification gaming, reward hacking, deceptive alignment).

Client and lawyer

billable-hour structure misaligns: lawyer benefits from billable hours, client benefits from quick resolution; flat-fee or contingency are alignment mechanisms.

Donor and charity

donor wants outcomes; charity has institutional incentives (overhead, growth); program-evaluation and impact reporting are alignment mechanisms.

Employer and employee

broad case: incentive pay, OKRs, equity grants, oversight all try to align agent action with principal outcome.

Bengt Holmström’s 1979 Bell Journal of Economics paper formalized the moral-hazard problem within principal-agent contracting under unobservable action. The agent chooses an effort level that the principal cannot directly observe; only the noisy outcome is verifiable. The paper established the informativeness principle: an optimal contract should condition the agent’s compensation on every variable that carries information about the agent’s action, even if those variables are not directly under the agent’s control. The result shaped the structure of subsequent agency contracts (relative performance evaluation, multi-signal compensation, options vs salary mixes) and provided the theoretical scaffolding for what became the modern executive-compensation literature.

Inference: Holmström’s contribution is the canonical formalization of the observability problem that gives principal-agent its operational bite: the misalignment between principal and agent matters specifically because the principal cannot directly see the agent’s effort, only the outcome that the agent’s effort jointly produces with noise. The informativeness principle is the design move that comes out of the framework: compensate based on every signal that carries additional information about effort, not just the outcome. The 1991 Holmström-Milgrom multi-task extension then sharpened the problem further (when the agent’s effort is allocated across multiple dimensions, a contract designed around one measured outcome distorts effort away from the others). Holmström received the 2016 Nobel Prize (jointly with Oliver Hart) for this body of work.
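
The unobservable-effort setup can be made concrete with a small numeric sketch. It is a textbook-style simplification, not Holmström's model: it assumes two effort levels, a binary outcome, an agent with utility sqrt(wage) minus effort cost, and that both the participation and incentive constraints bind. All function names and numbers are hypothetical, and the sketch does not model the additional signals that the informativeness principle is about.

```python
def second_best_contract(p_high, p_low, effort_cost, reservation_utility):
    """Cheapest (wage_success, wage_failure) that induces high effort.

    Assumptions (illustrative only): agent utility is sqrt(wage) minus
    effort cost, the outcome is binary, and both the participation and
    incentive constraints hold with equality.
    """
    if not 0 <= p_low < p_high <= 1:
        raise ValueError("need 0 <= p_low < p_high <= 1")
    # Incentive constraint: the utility gap between success and failure
    # must be large enough to repay the cost of high effort.
    utility_gap = effort_cost / (p_high - p_low)
    # Participation constraint: expected utility under high effort
    # equals the agent's outside option.
    u_fail = reservation_utility + effort_cost - p_high * utility_gap
    if u_fail < 0:
        raise ValueError("no feasible contract with non-negative wages")
    u_success = u_fail + utility_gap
    # Invert sqrt utility to recover wages.
    return u_success ** 2, u_fail ** 2


def first_best_wage(effort_cost, reservation_utility):
    """Flat wage that suffices when effort is observable and contractible."""
    return (reservation_utility + effort_cost) ** 2


def expected_wage(p_success, wage_success, wage_failure):
    """Principal's expected wage bill given the success probability."""
    return p_success * wage_success + (1 - p_success) * wage_failure


if __name__ == "__main__":
    w_s, w_f = second_best_contract(
        p_high=0.8, p_low=0.4, effort_cost=1.0, reservation_utility=2.0
    )
    extra_cost = expected_wage(0.8, w_s, w_f) - first_best_wage(1.0, 2.0)
    print(f"success wage {w_s:.2f}, failure wage {w_f:.2f}, "
          f"cost of unobservability {extra_cost:.2f}")
```

With these made-up numbers the principal has to pay a wide spread between success and failure to induce effort, and the risk-averse agent must be compensated for bearing that risk. The gap between the expected wage bill and the flat first-best wage is one way to read the cost of not observing effort.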
Jensen and Meckling’s 1976 Journal of Financial Economics paper “Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure” applied principal-agent analysis to the corporation. They reframed the firm not as a unified production function (the neoclassical treatment) but as a nexus of contracts among parties with divergent interests: shareholders as principals, managers as agents, with creditors and other claimants in additional principal-agent relationships layered on top. Within this frame, the paper identified and named agency costs: the sum of monitoring expenditures by the principal, bonding expenditures by the agent, and the residual loss from imperfect alignment that no realistic contract can fully eliminate.

Inference: Jensen-Meckling is the citation that brought principal-agent theory to bear on corporate governance and made it a load-bearing primitive in finance and management research. The agency-cost framework supplied the language for analyzing capital structure (why issuing equity dilutes managerial incentives), executive compensation (why options align incentives more strongly than salary), board oversight (monitoring expenditure), and corporate-control markets (the threat of takeover as an agency-cost-reducing mechanism). Recognizing the paper’s place in the catalog locates the analytical move: shifting attention from “what does the firm produce?” to “what are the contractual relationships inside the firm and what residual misalignment do they leave?”
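
The three-part definition of agency costs can be sketched as simple arithmetic. The decomposition (monitoring + bonding + residual loss) comes from the text above; everything else here is an assumption made for illustration, in particular the exponential form relating monitoring spend to residual loss, the parameter values, and the names `AgencyCosts`, `costs_at`, and `best_monitoring`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AgencyCosts:
    monitoring: float     # spent by the principal on oversight
    bonding: float        # spent by the agent to signal commitment
    residual_loss: float  # value still lost to imperfect alignment

    @property
    def total(self) -> float:
        return self.monitoring + self.bonding + self.residual_loss


def costs_at(monitoring, bonding=0.0, unmonitored_loss=100.0, decay=0.05):
    """Agency costs at a given monitoring spend.

    Assumed functional form (not from the paper): residual loss decays
    exponentially as monitoring spend rises.
    """
    residual = unmonitored_loss * math.exp(-decay * monitoring)
    return AgencyCosts(monitoring, bonding, residual)


def best_monitoring(unmonitored_loss=100.0, decay=0.05):
    """Monitoring spend minimizing monitoring + residual loss.

    Setting d/dm [m + L * exp(-k * m)] = 0 gives m* = ln(k * L) / k
    when k * L > 1; otherwise monitoring is not worth buying at all.
    """
    kl = decay * unmonitored_loss
    return math.log(kl) / decay if kl > 1 else 0.0


if __name__ == "__main__":
    m_star = best_monitoring()
    print(costs_at(m_star), costs_at(m_star).total)
```

Under this assumed form, the residual loss at the cost-minimizing monitoring level is still positive (it equals 1/decay), which matches the point that some loss remains even when the principal monitors as much as is worth paying for.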

Patient and doctor

fee-for-service rewards procedure count; capitation rewards keeping patients healthy; payment structure shapes the alignment.

Stephen Ross’s 1973 American Economic Review paper “The Economic Theory of Agency: The Principal’s Problem” is the founding modern treatment of the principal-agent problem as a formal economic analysis. Ross framed the question as one of optimal contracting: given a principal who wishes some outcome and an agent whose action produces that outcome (with uncertainty), what is the structure of the compensation contract that aligns the agent’s incentive with the principal’s preference, subject to the agent’s participation and incentive constraints? The paper established the analytical scaffolding (utility functions for each party, the information structure, the participation and incentive-compatibility constraints) that subsequent agency-theory papers (Holmström 1979, Grossman-Hart 1983, Holmström-Milgrom 1991) built on.

Inference: Ross’s paper is the citation-of-record for the principal-agent problem as a unified analytical primitive rather than a collection of domain-specific cases. The contribution was less in solving a particular contract design than in giving a language and a problem statement that organized work in insurance, labor contracting, sharecropping, partnership structures, and (later) corporate governance. The structural primitive the catalog wants (delegator + delegate + misaligned incentives + information asymmetry → optimal contracting problem) is exactly what Ross named, and the subsequent half-century of agency-theory literature is the elaboration of that frame.
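
The optimal-contracting problem described above is commonly written along the following lines. This is a textbook-style sketch, not Ross's own notation, and the symbols are assumptions introduced here: x is the uncertain outcome, a the agent's action, w(x) the payment schedule, u the agent's utility over payment, c(a) the agent's cost of action, and \bar{U} the agent's outside option. The principal is taken to be risk-neutral for simplicity.

```latex
\begin{aligned}
\max_{w(\cdot),\, a} \quad
  & \mathbb{E}\left[\, x - w(x) \mid a \,\right] \\
\text{s.t.} \quad
  & \mathbb{E}\left[\, u(w(x)) \mid a \,\right] - c(a) \;\ge\; \bar{U}
  && \text{(participation)} \\
  & a \in \arg\max_{a'} \; \mathbb{E}\left[\, u(w(x)) \mid a' \,\right] - c(a')
  && \text{(incentive compatibility)}
\end{aligned}
```

The participation constraint says the agent must be willing to accept the contract; the incentive-compatibility constraint says the action the principal wants must also be the one the agent prefers to take once the contract is in place.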
Stuart Russell’s 2019 book Human Compatible: Artificial Intelligence and the Problem of Control explicitly frames the AI alignment problem as a principal-agent problem at civilizational scale. Russell argues that the dominant approach in AI to date, specifying an objective function the system optimizes, is structurally identical to a principal handing an agent a fixed contract without provision for misalignment between what was specified and what the principal actually wants. The proposed alternative, drawing on inverse reward design and cooperative inverse reinforcement learning, makes the AI system explicitly uncertain about the principal’s true preference and treats observations of human behavior as evidence the AI should use to update its inferred preference rather than as direct objective specification.

Inference: Russell’s reframing is one of the cleanest worked cases of the principal-agent primitive lifting cross-domain. The economic agency-theory literature supplies decades of analysis on information asymmetry, incentive compatibility, residual misalignment, and the limits of contracting, all directly applicable to AI alignment once the analogy is made explicit. The corollary is that AI alignment inherits the economics literature’s pessimism: agency costs in principal-agent contracts are never zero (the residual loss is a structural feature, not a contracting failure), which suggests that some level of misalignment between AI systems and their principals will persist even with the best alignment techniques. Russell’s contribution is to make the lift explicit and to direct AI-design research toward agency-theory-informed mechanisms.
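
The idea of treating observed human behavior as evidence about the principal's preference can be sketched as a Bayesian update over candidate preference hypotheses. This is a minimal illustration, not an implementation of cooperative inverse reinforcement learning: it assumes a noisily rational (softmax) human choice model, and the hypothesis names, options, and utility numbers are all hypothetical.

```python
import math


def update_posterior(prior, utilities, options, chosen, rationality=1.0):
    """Bayes update of belief over preference hypotheses after one choice.

    prior:      {hypothesis: probability}
    utilities:  {hypothesis: {option: utility under that hypothesis}}
    options:    options the human could have picked
    chosen:     the option the human actually picked
    Assumes the human picks option o with probability proportional to
    exp(rationality * utility), a modeling assumption made here.
    """
    unnormalized = {}
    for hypothesis, p in prior.items():
        weights = {
            o: math.exp(rationality * utilities[hypothesis][o])
            for o in options
        }
        likelihood = weights[chosen] / sum(weights.values())
        unnormalized[hypothesis] = p * likelihood
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


if __name__ == "__main__":
    # Two made-up hypotheses about what the principal actually wants.
    prior = {"values_speed": 0.5, "values_safety": 0.5}
    utilities = {
        "values_speed": {"fast_route": 1.0, "safe_route": 0.0},
        "values_safety": {"fast_route": 0.0, "safe_route": 1.0},
    }
    options = ["fast_route", "safe_route"]
    posterior = update_posterior(prior, utilities, options, "safe_route")
    print(posterior)
```

After observing the human pick the safe route, belief shifts toward the safety hypothesis without collapsing to certainty. The system stays uncertain about the objective and keeps learning from behavior, rather than optimizing a fixed specification.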

Voters and politicians

voters delegate policy; politicians have own incentives (re-election, donor relationships, ideology); elections + transparency are partial-alignment mechanisms.