Human Heterogeneity in Agentic Markets
Delegating a decision to an AI agent does not strip out human difference — it transmits it. In a controlled study of AI-mediated negotiations, identical models pursuing identical objectives produced widely dispersed outcomes, and most of that dispersion traced to the human who wrote the agent's instructions rather than to the model itself. The prompt carried the signal. That result is the empirical foundation for persona conditioning: how a buyer is specified is what determines how it behaves.
Key takeaways
Delegation preserves human difference — and widens it
A study pairing buyer- and seller-side agents in an incentivized negotiation held the model and the objective constant across hundreds of participants, then measured how much outcomes varied. Roughly 73% of the variation in how surplus was divided traced to individual fixed effects — the specific human who authored each agent's prompt — not to the stochasticity of the model. AI-mediated negotiations also showed 16.5% higher outcome variance than a matched human-to-human benchmark. Delegation did not compress human difference into an efficient equilibrium. It amplified it. For buyer simulation, this is the core validating result: how a persona is instructed is the dominant driver of how it behaves, far above model-level noise.
The prompt is a transmission mechanism for identity
Instructions are not neutral. They encode the author's beliefs, assumptions, and behavioral tendencies, and those priors propagate into the agent's strategy. The study replicated canonical behavioral patterns in agentic form: personality traits shaped agent behavior, and selection on principal characteristics produced sorting — even though the agent never saw the principal's demographics. eLLMo's persona conditioning is built on this exact property. A persona defined by personality calibration, purchase motivation, and risk posture produces systematically different reasoning than an undifferentiated prompt, because the specification carries identity into the agent's behavior.
Machine fluency is a new, measurable form of human capital
Observable characteristics — demographics, Big Five personality, risk and time preference — explained roughly 17% of the variation in agent performance. The majority remained unexplained, which the authors attribute to machine fluency: the latent skill of aligning an agent to your objective through natural language. Machine fluency predicts who captures value in an agent-mediated market, the way literacy and numeracy once sorted economic outcomes. eLLMo productizes this skill on the buyer side. Persona specification is a disciplined method for instructing an AI buyer so its behavior maps to a defined segment, instead of drifting toward the model's defaults.
Social norms erode under delegation
Human negotiators cluster on the fair split: about 35% of human-to-human deals landed on a 50/50 division of surplus. Under agent mediation, that focal point collapsed to roughly 14%. Norms act as guardrails that compress variance and discipline extreme outcomes, and delegation weakens them — pushing markets toward winner-take-all dispersion. The simulation implication is direct: AI buyers do not default to the polite, expected, norm-smoothed answer a survey respondent offers. They surface sharper reactions, which is exactly what makes them useful for finding where a page actually breaks.
Specification hazard replaces information asymmetry
Classic principal-agent theory worries about hidden effort — moral hazard. Agent delegation introduces a different friction the authors call specification hazard: the prompt-as-contract is incomplete, and the agent can optimize a proxy — surface plausibility, superficial safety, token probability — instead of the principal's true objective. The primary risk shifts from what the other party knows to whether your own agent is aligned to your goal. This is why eLLMo treats persona specification and model version as controlled instruments. A simulation is valid only when the agent is pursuing the buyer's objective, not a convenient approximation of it.
The experiment designed to produce sameness
The hypothesis that AI delegation homogenizes outcomes is intuitive and wrong. Researchers tested it directly by constructing a marketplace where every structural source of outcome variation was sealed off. Participants — 299 in the agentic arm, 304 in the human-to-human benchmark — each wrote two sets of instructions: a buyer strategic playbook and a seller strategic playbook for the same transaction, a 2020 Toyota Camry LE in a blue-book range that left exactly $4,000 of surplus to divide. Every participant received identical transaction information. Every agent ran on the same underlying model. Researchers inserted each participant's prompt into a uniform template, then paired buyer agents against seller agents in a round-robin tournament over up to 12 rounds.
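To make the setup concrete, here is a minimal Python sketch of the two mechanics described above: pairing buyer prompts against seller prompts, and converting an agreed price into a share of the $4,000 surplus. The $16,000 seller walk-away price and the function names are illustrative assumptions, not figures or code from the study. The sketch simply pairs every buyer with every seller; whether the study excluded a participant's own buyer and seller prompts from meeting is not stated here.

```python
from itertools import product

SURPLUS = 4_000  # dollars of surplus to divide, as in the study design


def round_robin(buyer_prompts, seller_prompts):
    """Return every (buyer, seller) pairing of the two prompt lists."""
    return list(product(buyer_prompts, seller_prompts))


def buyer_share(price, seller_floor, surplus=SURPLUS):
    """Fraction of the surplus the buyer keeps at the agreed price.

    seller_floor is the seller's walk-away price (a hypothetical input);
    the buyer's walk-away price is then seller_floor + surplus.
    """
    if not seller_floor <= price <= seller_floor + surplus:
        raise ValueError("price is outside the bargaining range")
    return (seller_floor + surplus - price) / surplus


pairs = round_robin(["buyer_a", "buyer_b"], ["seller_a", "seller_b", "seller_c"])
print(len(pairs))                    # 6 pairings
print(buyer_share(18_000, 16_000))   # 0.5, an even split
print(buyer_share(16_500, 16_000))   # 0.875, the buyer takes most of the surplus
```

Expressing every deal as a share between 0 and 1 is what lets outcomes from different pairings be compared on one scale.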
The design follows induced-values methodology: the objective is not up for interpretation; it is handed to every participant in the same words. Outcome dispersion cannot come from differing goals or differing information. It can come only from the prompts or from the model's own randomness. If the homogenization hypothesis were correct, and a shared model really did smooth individual differences into a tightly clustered distribution, this was the experiment where it would win.
It lost.
73%: the number that settles the question
A fixed-effects decomposition of the surplus-division results attributed approximately 73% of the variation to individual buyer and seller fixed effects: the specific humans who authored the prompts, not any property of the model. The remaining 27% or so covers everything else, including the model's stochasticity. A split that lopsided is hard to explain by anything other than the words people chose when they set up their agents.
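One way to see what such a decomposition measures is a small sketch. The Python below is an illustration under simplifying assumptions, not the authors' estimation code: it assumes a complete grid in which every buyer prompt meets every seller prompt exactly once, treats each author's effect as their average outcome relative to the grand mean, and reports how much of the total variation those averages account for. The function name and the toy numbers are hypothetical.

```python
from statistics import mean


def fixed_effects_share(outcomes):
    """Share of outcome variance explained by buyer and seller fixed effects.

    outcomes[i][j] is the buyer's surplus share when buyer i meets seller j.
    Assumes a complete grid: every buyer meets every seller exactly once.
    """
    n_buyers, n_sellers = len(outcomes), len(outcomes[0])
    grand = mean(v for row in outcomes for v in row)
    # Each author's effect is their average outcome relative to the grand mean.
    buyer_fx = [mean(row) - grand for row in outcomes]
    seller_fx = [
        mean(outcomes[i][j] for i in range(n_buyers)) - grand
        for j in range(n_sellers)
    ]
    ss_total = sum((v - grand) ** 2 for row in outcomes for v in row)
    if ss_total == 0:
        raise ValueError("outcomes do not vary")
    ss_fixed = (n_sellers * sum(a * a for a in buyer_fx)
                + n_buyers * sum(b * b for b in seller_fx))
    return ss_fixed / ss_total


# Toy data: rows are buyer authors, columns are seller authors.
toy = [
    [0.80, 0.70, 0.55],
    [0.60, 0.45, 0.40],
    [0.35, 0.30, 0.10],
]
print(round(fixed_effects_share(toy), 3))
```

A value near 1 means outcomes are almost fully determined by who wrote each side's prompt; a value near 0 means they depend on pair-specific interaction or noise. The study's figure of roughly 73% sits much closer to the first case.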
The dispersion finding compounded the result. AI-mediated negotiations produced 16.5% higher outcome variance than the matched human-to-human benchmark. The homogenization hypothesis was not merely rejected; the data ran in the opposite direction. Delegation widened the spread of outcomes: one participant's agent closes near an even split while another's captures the overwhelming majority of the same $4,000. Same model, same objective. Different author.
Hold the model and the objective constant, and the instruction is what moves the result. Who writes the prompt matters more than which model runs it.
Foundation priors: how identity travels through language
The theoretical frame comes from the 'foundation priors' literature: models trained on human-generated text absorb the statistical regularities of human behavior, including negotiation postures, social positioning, and risk norms. When a person writes an instruction, they transmit their assumptions, their comfort with aggression, their implicit theory of how deals work, their tolerance for impasse, and their habits of framing. The model reads those signals and generates behavior that reflects them. The prompt is not a neutral pipe — it is a contract that leaks the author's identity into every round of the negotiation. Two instructions both aimed at maximizing surplus encode materially different strategic priors, and the agent executes them faithfully.
This mechanism explains the most counterintuitive finding in the study. In human-to-human selling, a well-documented gender gap appears: women realize worse results than men on the seller side. Under agent mediation, that gap reversed — women's agents outperformed men's on the same side, despite the agents having no access to demographic information. The only channel through which the reversal could occur was prompting behavior. A group-level disparity did not disappear under delegation; it transformed, traveling through the instruction layer in a form no prior framework anticipated.
Norms erode when no one is in the room
Social fairness norms do real work in human negotiation. They act as a coordination device that compresses outcomes and prevents extreme allocations. In the human-to-human benchmark, roughly 35% of deals landed on the fair 50/50 split. Under agent mediation, that share dropped to approximately 14%.
Agents are not aware of what counts as fair in human social terms, and the instructions they received did not encode that norm unless the author thought to include it. When no one is watching, the fairness focal point dissolves. The distribution drifted toward the extremes — a meaningful share of agentic negotiations left the seller with a near-zero share of the surplus. This is a property of delegation, not a failure of the model. Agents execute their instructions without the ambient social pressure that quietly disciplines human bargaining. In a buyer-simulation context, that property is useful: agents are less likely to return the smoothed, socially acceptable answer a human respondent defaults to when they want to avoid conflict.
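Both dispersion measures quoted in this article, the variance gap and the share of deals on the fair split, are simple to compute once every deal is expressed as the buyer's share of the surplus. The sketch below is illustrative only: the 0.01 tolerance around 50/50, the use of population variance, and the toy data are assumptions, not details reported by the study.

```python
from statistics import pvariance


def variance_gap(agentic, human):
    """Relative difference in outcome variance, agentic versus human."""
    return pvariance(agentic) / pvariance(human) - 1


def focal_share(buyer_shares, tolerance=0.01):
    """Fraction of deals within `tolerance` of an even 50/50 split."""
    hits = sum(1 for s in buyer_shares if abs(s - 0.5) <= tolerance)
    return hits / len(buyer_shares)


# Toy buyer shares of the surplus, one value per completed deal.
human = [0.50, 0.50, 0.45, 0.55, 0.40, 0.60]
agentic = [0.50, 0.20, 0.85, 0.95, 0.35, 0.65]
print(focal_share(human))    # 2 of 6 deals sit on the even split
print(focal_share(agentic))  # 1 of 6
print(variance_gap(agentic, human) > 0)  # True: the agentic spread is wider
```

On the study's data, the same two calculations would correspond to the reported 16.5% variance gap and the drop from roughly 35% to roughly 14% on the fair split.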
Machine fluency: the skill that does not appear on a résumé
If the prompt is the dominant variable, the question becomes: what determines prompt quality? Researchers decomposed observable characteristics — demographics, Big Five personality, risk and time preferences, game-theoretic social behaviors, negotiation experience, and cognitive reflection — against outcome variation. All of those observables together explained approximately 17% of the variance. The remaining 83% was systematic but unmeasured.
Researchers named this residual 'machine fluency': the latent skill of articulating intent to a language model precisely enough that the agent pursues the principal's objective rather than an approximation of it. Machine fluency is real, consequential, unevenly distributed, and does not correlate reliably with the traits that historically predict performance. Two people with identical goals and access to the same model can get materially different results, because one specifies intent in a way the agent executes faithfully and the other produces an instruction the agent interprets as something adjacent. The skill is also task-dependent — participants instructed buyer agents more effectively than seller agents, which means heterogeneity in machine fluency varies by which side of the transaction the author is specifying.
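The 17% figure is a share of variance explained, so the residual is simply what the observables fail to account for. A minimal sketch of that arithmetic follows, assuming predictions from some model of the observable traits are already in hand; the function name and the toy numbers are hypothetical and are not the study's data.

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` accounted for by `predicted`."""
    avg = sum(actual) / len(actual)
    ss_total = sum((y - avg) ** 2 for y in actual)
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_resid / ss_total


# Toy numbers: agent performance, and a prediction from observable traits.
performance = [0.2, 0.4, 0.5, 0.7, 0.9]
from_traits = [0.5, 0.5, 0.5, 0.6, 0.6]
explained = r_squared(performance, from_traits)
print(f"explained by observables: {explained:.0%}")
print(f"unexplained residual: {1 - explained:.0%}")
```

In the study's terms, the first number is about 17% and the second about 83%.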
Specification hazard: the new principal-agent friction
Classical agency theory focuses on moral hazard: hidden effort, incentive design, payment structure. That framework does not transfer to language-model agents. There is no payment mechanism. Alignment must be crafted entirely in language, and language is an incomplete contract.
The friction that replaces moral hazard is what researchers call 'specification hazard': the risk that the agent optimizes a proxy — fluent output, surface-level compliance with the instruction's literal words, generic helpfulness — rather than the principal's true objective. A seller agent instructed to 'get the best deal possible' may interpret that as 'avoid conflict,' 'match the tone of the buyer,' or 'close in the fewest rounds,' all of which are plausible readings of the sentence and all of which produce different outcomes. By the time specification hazard becomes visible, the deal is already done.
This shifts the locus of friction. The classic problem in two-party negotiation is information asymmetry between the parties. Specification hazard is a different asymmetry: between a principal and their own agent. Market distortion no longer requires one party to deceive the other. It requires only that one principal specified their agent more precisely than the other. Simulation validity research covers how this failure mode presents in buyer-simulation contexts — specifically, the case where plausible-looking output and valid output are indistinguishable without calibration data.
What this means for eLLMo
The 73% fixed-effects result is the buyer-simulation thesis demonstrated in a controlled adjacent domain. When the model and objective are held constant, the prompt moves outcomes by a margin that dwarfs model noise. eLLMo's persona specifications — encoding personality, purchase motivation, risk posture, category experience, price sensitivity, and trust baseline — are the buyer-side analog to the negotiation playbooks in the study. A well-specified persona prompt produces durable, identity-linked behavior, not a random draw from the model's prior.
Three operating disciplines follow directly. First, persona specification is the dominant lever: not a configuration setting but the product itself. Vague specification produces vague signal. Second, the model must be pinned and disclosed. Specification hazard is the simulation-validity problem stated in economic language, and an unaligned agent returns output that looks valid while measuring a proxy, so the model version has to be a controlled variable rather than a moving part. Third, specification discipline is a repeatable craft, not a one-time decision, because machine fluency is task-dependent. The gap between a well-specified persona and a vaguely specified one is the gap between a diagnostic that surfaces real buyer friction and one that returns plausible noise.
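As a rough illustration of what treating the persona and the model version as controlled inputs could look like in code, here is a minimal Python sketch. It is not eLLMo's actual schema or API: the class names, field names, prompt wording, and the model identifier are all hypothetical, chosen only to mirror the persona dimensions listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PersonaSpec:
    """Hypothetical persona record; one field per dimension named in the text."""
    personality: str
    purchase_motivation: str
    risk_posture: str
    category_experience: str
    price_sensitivity: str
    trust_baseline: str

    def __post_init__(self):
        # Reject blank fields: a vague specification produces vague signal.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"{f.name} must not be empty")

    def to_prompt(self):
        lines = [
            f"{f.name.replace('_', ' ')}: {getattr(self, f.name)}"
            for f in fields(self)
        ]
        return "You are a buyer with this profile:\n" + "\n".join(lines)


@dataclass(frozen=True)
class SimulationRun:
    """Pairs a persona with an exact, disclosed model identifier."""
    persona: PersonaSpec
    model_version: str  # pinned identifier (hypothetical value below)

    def manifest(self):
        return {
            "model_version": self.model_version,
            "prompt": self.persona.to_prompt(),
        }


persona = PersonaSpec(
    personality="high conscientiousness, low openness",
    purchase_motivation="replacing a failed appliance this week",
    risk_posture="cautious",
    category_experience="first-time buyer in this category",
    price_sensitivity="high",
    trust_baseline="skeptical of unfamiliar brands",
)
run = SimulationRun(persona, model_version="example-model-2024-01-01")
print(run.manifest()["prompt"])
```

Freezing both records means a run's inputs cannot change after it is created, so any difference between two runs can be traced to a difference in the persona or the model version written in the manifest.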
The norm-erosion finding adds a fourth implication. Because agents adhere less to the 50/50 fairness focal point, they are less likely to return the answer a human respondent thinks is polite to give. An AI buyer panel surfaces where a page actually fails — which is what makes it useful for identifying friction that human surveys systematically underreport. See agent-to-agent commerce research for how norm erosion compounds when both sides of a transaction delegate.
The deeper commercial implication is about defensibility. If value capture in agentic markets is governed by an unobservable prompting skill, then the disciplined construction of buyer personas is not a commodity input. It does not improve automatically with a better underlying model or a lower API price. It is the proprietary layer — the compounding asset that grows more accurate with every simulation, more differentiated with every feedback cycle, and harder to replicate the longer calibration data accumulates.
Methodology note
Built on the research. Designed for decisions.
eLLMo simulation surfaces ranked friction patterns across calibrated buyer personas — specific findings, traceable to buyer segments, actionable on the same day. The methodology is grounded in peer-reviewed research on AI agent behavior and OCEAN psychometrics. The output is a prioritized list of what to fix before your campaign launches — and why it matters for each buyer type.