Research Area 08

The Productivity Compression and the Pre-Spend Advantage

AI compresses the time it takes to do skilled work, and the harder the task, the larger the compression. Analyzing 100,000 real conversations with an AI assistant, Anthropic estimated that tasks which would take about 90 minutes unaided were completed roughly 80% faster with AI assistance, with the steepest gains on the most cognitively demanding work. Pre-spend buyer simulation applies that same compression to the slowest, most expensive part of conversion work: finding out what breaks before you pay for the traffic that finds out for you.

Key takeaways

AI collapses task time on skilled work

Anthropic's analysis of real-world usage estimated that a task requiring about 90 minutes without assistance was completed roughly 80% faster with AI. The effect concentrated in knowledge work: time savings reached about 90% on healthcare-related tasks, 87% on document writing, and 80% on financial analysis. These are not marginal efficiencies. They are step-changes in the cost of producing a unit of skilled output.

The most complex work compresses the most

The speedup scaled with task difficulty. Tasks whose instructions implied a high-school level of education were completed about 9 times faster; tasks implying a college degree, about 12 times faster. The pattern matters for buyer research specifically. Reasoning through how a skeptical buyer reads a page — what they weigh, where they hesitate, what they misread — is exactly the kind of complex, judgment-heavy work that AI compresses most. Analysis that was too slow to run before every campaign is now tractable before every campaign.

Adoption could double labor-productivity growth

Scaled across the economy, the estimate is large. If current models were universally adopted over a decade, Anthropic projected US labor-productivity growth could rise by about 1.8 percentage points per year — roughly double recent rates. The macro case and the marketing case rhyme. The value is not the model doing one task once; it is the same judgment applied at a speed and scale that changes which decisions get informed before they are made.

The conversion-research cycle is a prime compression target

A/B testing is empirical and slow. It needs a live asset, weeks of traffic to reach significance, and a budget already committed to the test. That makes it a post-spend instrument: the insight arrives after the money. Buyer simulation moves the same diagnostic work upstream — a ranked list of friction points and the reasoning behind each, available the same day, before media is committed. The productivity gain is timing. The insight lands before the spend, not after it.

The return is decision latency, not just labor saved

Counting hours saved understates the effect. The larger return is compressing the loop between a question and a decision-grade answer — from a multi-week test cycle to a same-day read. eLLMo is built for that loop: submit a page or a creative concept, receive ranked friction findings traceable to specific buyer segments, and brief the fix before launch. The expensive version of this learning is a campaign that underperforms and a post-mortem that cannot reconstruct why.

How the estimate was built

Most productivity claims about AI start with a survey: ask people how much time they saved, average the answers, publish the number. This study took the opposite route. Anthropic analyzed 100,000 real conversations with an AI assistant, using a privacy-preserving pipeline called Clio to classify the underlying task in each one. For every conversation, the study estimated how long that task would take a person working unaided, then valued the saved time against wage data. The number was built up from observed behavior, from what people actually did in the tool, not from what they reported feeling about it afterward.

That is the method's strength. Because the estimate is assembled from task-level data, it can be broken apart. You do not get one number for 'productivity.' You get a decomposition by domain, by difficulty, by type of cognitive demand. And that decomposition is where the study becomes useful beyond the headline.
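As a rough illustration of why a task-level estimate can be broken apart, the sketch below rolls per-task time estimates up by domain. It is not the study's pipeline: the `TaskEstimate` record, its field names, and the `decompose` helper are hypothetical, and the rows in the usage example are placeholders, not study data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """One classified task. All fields are hypothetical, for illustration only."""
    domain: str
    unaided_minutes: float   # estimated time for a person working alone
    assisted_minutes: float  # estimated time with AI assistance
    hourly_wage: float       # wage used to value the saved time


def decompose(tasks):
    """Aggregate task-level estimates into per-domain time and value savings."""
    totals = defaultdict(lambda: {"unaided": 0.0, "assisted": 0.0, "value": 0.0})
    for task in tasks:
        saved_minutes = task.unaided_minutes - task.assisted_minutes
        row = totals[task.domain]
        row["unaided"] += task.unaided_minutes
        row["assisted"] += task.assisted_minutes
        row["value"] += saved_minutes / 60 * task.hourly_wage
    return {
        domain: {
            "fraction_saved": 1 - row["assisted"] / row["unaided"],
            "value_saved": row["value"],
        }
        for domain, row in totals.items()
    }


# Placeholder rows, NOT figures from the study.
example = [
    TaskEstimate("writing", 60, 12, 30.0),
    TaskEstimate("writing", 30, 6, 30.0),
    TaskEstimate("analysis", 120, 30, 60.0),
]
for domain, summary in decompose(example).items():
    print(domain, summary)
```

Because every total is a sum over tasks, the same records could be regrouped by difficulty or by type of cognitive demand without re-estimating anything, which is the property a survey average lacks.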

The size of the effect, broken out

The headline finding is large on its face. A task that would take about 90 minutes without assistance was completed roughly 80% faster with AI. In absolute terms, that is the difference between a standard working block and a pass of roughly 18 minutes. The savings were not distributed evenly, and the distribution matters. Time savings concentrated at the top of the skill range: approximately 90% on healthcare-related tasks, 87% on document writing, and 80% on financial analysis. Work that commands the highest wages, the work that requires judgment, synthesis, and domain fluency, is the work AI compresses hardest. The more expensive the task is in human hours, the larger the reduction.
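The arithmetic behind these percentages can be sketched as follows, reading "X% faster" as an X% reduction in task time. Applying each domain's figure to the same 90-minute task is an assumption made only for comparison, and the helper names are illustrative.

```python
# Illustrative arithmetic only: reads "X% faster" as an X% reduction in task time.
BASELINE_MINUTES = 90  # unaided task length quoted in the text


def remaining_minutes(baseline_minutes: float, fraction_saved: float) -> float:
    """Time left after removing `fraction_saved` of the baseline."""
    return baseline_minutes * (1 - fraction_saved)


def speedup_factor(fraction_saved: float) -> float:
    """Equivalent 'N times faster' multiplier for a given fraction of time saved."""
    return 1 / (1 - fraction_saved)


# Savings figures quoted in the text. Applying each one to a 90-minute
# task is an ASSUMPTION for comparison, not something the study reports.
savings = {
    "headline": 0.80,
    "healthcare": 0.90,
    "document writing": 0.87,
    "financial analysis": 0.80,
}

for label, saved in savings.items():
    minutes = remaining_minutes(BASELINE_MINUTES, saved)
    print(f"{label}: {minutes:.1f} min left, {speedup_factor(saved):.1f}x faster")
```

On this reading, an 80% saving leaves about 18 of the original 90 minutes, and each additional point of saving near the top of the range shortens the remaining time disproportionately.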

These are not marginal efficiencies that appear in aggregate data and disappear when you look for them in individual workflows. On the specific categories of skilled output that drive most of a knowledge organization's cost, they represent a structural shift in the time required to produce a unit of that output.

The counterintuitive result: difficulty and speedup move together

The natural assumption is that AI earns its gains on routine, repetitive work — data formatting, template generation, the kind of task a skilled person could do at speed but would rather not. The Anthropic data inverts that assumption. Compression scaled with task difficulty. Tasks whose instructions implied a high-school level of education sped up about 9x; tasks implying a college degree sped up about 12x. The more judgment a task demands, the more time AI saves on it.
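To compare the "times faster" figures with the percentage savings quoted elsewhere, a speedup factor can be converted into a fraction of time saved. This is a minimal sketch of that conversion; the function name is illustrative and the only inputs are the 9x and 12x figures from the text.

```python
def fraction_saved(speedup: float) -> float:
    """Fraction of the original time removed by an 'N times faster' speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


# Speedup figures quoted in the text.
speedups = {"high-school-level tasks": 9, "college-level tasks": 12}

for label, factor in speedups.items():
    print(f"{label}: {factor}x faster = {fraction_saved(factor):.1%} of time saved")
```

The two measures are not linear in each other: moving from 9x to 12x adds only about three points of time saved, so differences that look small in percentage terms can be large in multiplier terms.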

This finding has a specific implication for where to look for the gain. The complex analytical work that organizations have long treated as irreducible — too bespoke to accelerate, too judgment-intensive to delegate — turns out to be the category that accelerates most. The value does not sit at the bottom of the skill range. It sits at the top.

The macro estimate, stated as a ceiling

Scaled across the full economy under broad adoption over a decade, the study's projection is that US labor-productivity growth could rise by about 1.8 percentage points a year — roughly double the rates recorded over recent decades. State that number as what it is: a ceiling. It assumes near-universal adoption, rests on model-estimated time savings, and does not fully account for how much of the freed capacity converts to additional output versus higher expectations — a tension the companion 81,000-person interview study documents directly. The durable result is at the task level. The economy-wide figure is the optimistic extrapolation of it, valid as an upper bound on the structural opportunity, not as a forecast of what happens without deliberate adoption.
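To see what a 1.8-point annual uplift would mean if it held for the full decade, the sketch below compounds it against a baseline. The baseline growth rate of 1.8% is an assumption inferred from the phrase "roughly double", not a figure taken from the study, and the result inherits every caveat attached to the ceiling itself.

```python
YEARS = 10
UPLIFT = 0.018           # 1.8 percentage points per year, the ceiling quoted in the text
BASELINE_GROWTH = 0.018  # ASSUMPTION: implied by "roughly double"; not from the study


def cumulative_level(annual_growth: float, years: int) -> float:
    """Productivity level after `years` of compounding, relative to a start of 1.0."""
    return (1 + annual_growth) ** years


baseline_level = cumulative_level(BASELINE_GROWTH, YEARS)
ceiling_level = cumulative_level(BASELINE_GROWTH + UPLIFT, YEARS)

# How much higher the productivity level would be at the end of the decade.
gap = ceiling_level / baseline_level - 1
print(f"baseline path: {baseline_level:.3f}, ceiling path: {ceiling_level:.3f}")
print(f"level gap after {YEARS} years: {gap:.1%}")
```

Changing `BASELINE_GROWTH` shifts both paths but barely moves the gap, which is why the uplift, not the baseline, is the number to scrutinize.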

The macro case and the marketing case share the same shape. The value accumulates not from one task done faster once, but from the same judgment applied at a speed and scale that changes which decisions get made before they are committed — and which get made in the dark.

Why conversion work is a prime compression target

The Anthropic findings identify the compression zone precisely: complex, judgment-heavy tasks that require reasoning about a situation, not executing a template. Reasoning through how a skeptical buyer reads a page — what they weigh against each other, where they hesitate, what claim they read as unsupported, which objection they carry into the page from prior research — is exactly that kind of task. It is not mechanical. It cannot be reduced to a checklist. It demands a model of a specific buyer's prior beliefs, decision criteria, and tolerance for risk. Until recently, doing it rigorously meant either running live traffic or accepting that it would not happen at all.

A buyer-simulation run returns findings like: 'price-sensitive personas cannot justify the spend because the page never frames cost per use, only sticker price — the comparison they need is absent.' That is a judgment-intensive output. It requires holding the buyer's goal in mind, tracking what the page does and does not provide, and identifying the gap between them. Per the Anthropic data, it is the category of task that compresses the most. What the AI-shaped buyer brings to that page — a risk-aware, claim-scrutinizing disposition — is the thing the simulation models before the traffic arrives to demonstrate it.

Simulation tells you what to fix and for whom. The test tells you how much it was worth.

A/B testing as a post-spend instrument

A/B testing is empirical but slow and post-spend: it needs a live page, weeks of traffic to reach significance, and a budget already committed to the test. Simulation moves the same diagnostic upstream and returns a ranked list of friction points the same day. The gain is not only speed. It is the placement of the insight in the decision sequence — before the media budget is authorized, not after the campaign has run.
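The "weeks of traffic" requirement follows from standard sample-size arithmetic. The sketch below uses the usual normal-approximation formula for a two-sided two-proportion test; the baseline conversion rate, target lift, and daily traffic in the usage example are hypothetical values chosen for illustration, not figures from the source.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_of_traffic(n_per_variant, daily_visitors, variants=2):
    """Days needed to fill every arm when traffic is split evenly."""
    return math.ceil(n_per_variant * variants / daily_visitors)


# HYPOTHETICAL inputs: 3% baseline conversion, 10% relative lift, 2,000 visitors a day.
n = visitors_per_variant(baseline_rate=0.03, relative_lift=0.10)
print(f"visitors per variant: {n}")
print(f"days of traffic: {days_of_traffic(n, daily_visitors=2000)}")
```

Smaller lifts push the requirement up quickly, because the sample size scales with the inverse square of the difference being detected.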

That timing difference is the productivity compression applied to conversion work. The same shift that the Anthropic data documents at the task level — judgment-heavy analysis that previously required a multi-week process now returns the same day — appears in the conversion cycle as the difference between a finding that shapes the asset and a finding that explains why the asset underperformed. Both findings exist. Only one of them changes the outcome.

Decision latency, not just labor saved

Counting hours saved understates the effect. The more valuable return is compressing the loop between a question and a decision-grade answer — from a multi-week test cycle to a same-day read — so that more decisions arrive informed rather than committed and then corrected. For a marketing team, the inflection point is not 'did we work faster on this campaign?' It is 'did we know what to fix before we paid to find out?'

eLLMo is built to occupy the earliest useful point in that loop. Submit a page or a creative concept; receive ranked friction findings traceable to specific buyer segments, with the reasoning the buyer would apply to each; brief the fix before launch. The expensive version of this learning is a campaign that underperforms and a post-mortem that cannot reconstruct why. The simulation validity research establishes that the buyer reasoning the model surfaces is grounded in real behavioral signal, not synthetic rationalization. The productivity research establishes that judgment-heavy diagnostics of this kind are exactly the work that should never take weeks to complete.

The forward implication

The Anthropic study measured what is already happening: productivity gains from current models at current adoption rates, observable in real usage today. The 1.8-point macro ceiling is the extrapolation of that to full deployment over a decade. The nearer-term implication is narrower: the most expensive knowledge work compresses the most, and the organizations that route their hardest analytical questions through AI-assisted workflows are already operating at a different effective cost structure than those that do not.

In conversion work, the hardest question is not 'what does the data say happened?' The data eventually answers that. The hardest question is 'what will a specific type of buyer do with this page before any data exists?' That question was previously unanswerable before launch. The productivity compression makes it tractable before every campaign, not only the ones large enough to justify a test. The practical result is that the ceiling on informed pre-launch decisions rises — and the decisions that used to require a live campaign to inform now require a single run.

Methodology note

Built on the research. Designed for decisions.

eLLMo simulation surfaces ranked friction patterns across calibrated buyer personas — specific findings, traceable to buyer segments, actionable on the same day. The methodology is grounded in peer-reviewed research on AI agent behavior and OCEAN psychometrics. The output is a prioritized list of what to fix before your campaign launches — and why it matters for each buyer type.
