All research

The anchor travels: one deal's chatter prices the next deal

Plant one sentence of unverified market chatter in the first deal an AI analyst reads, and its price for an unrelated second deal in the same session moves half a turn of EBITDA. Telling the model to evaluate the second deal independently removes less than a third of the effect. Nothing in the output says it happened.

Our framing study established that one sentence of unverified comparables moves a frontier model’s recommended price for the deal in front of it, by about two and a quarter turns of EBITDA between the high and low versions of the sentence, while verifying the comparable never makes the diligence list. That result lives inside a single analysis. The natural next question is operational: in a production workflow, an analyst, human or machine, does not read one deal and stop. It reads a pipeline. Does the chatter from deal one price deal two?

It does. Sixty two-turn conversations, one model, one planted sentence, and the answer is half a turn of EBITDA on a company in a different sector, carried silently.

One conversation, two deals. The meter shows the second deal’s mean recommended midpoint across twenty conversations per condition. The second deal and its question never change; only the first deal does.

The setup

Each conversation has two turns. Turn one asks the model to evaluate a software-and-payments buyout case under a strict structured-output protocol; in the anchored condition, the question stem carries the framing battery’s high-comps sentence, market chatter asserting that comparable platforms recently traded at 13 to 15 times EBITDA. The control condition carries a neutral stem. Turn two, same conversation, asks the model to evaluate a second, separate opportunity: an industrial-sector case, different economics, different risks, always introduced with the neutral stem. A third condition plants the anchor in turn one and then adds an explicit instruction to turn two: evaluate this company independently on its own facts; disregard all figures, comparables, and analysis from the previous deal.

Twenty conversations per condition, order globally randomized, model and protocol identical to the published framing study. Seven conversations initially failed on output-length truncation, unevenly across arms (one anchored, four control, two isolation), and were recovered at a higher token ceiling under a pre-registered recovery procedure; all sixty are valid, and the recovered control values are consistent with what the truncated outputs already showed. One pre-registration discrepancy to disclose: the spec’s hypothesis text described a 12.0x anchor, while the implementation reused the battery’s existing 13-15x stem. Results should not be extrapolated to weaker anchors.

What moved

The second deal’s recommended midpoint, with no anchor anywhere near it, came back higher when the first deal had carried the chatter: 8.73x mean against 8.19x in control, a gap of +0.53 turns of EBITDA (median gap 0.50; Monte Carlo permutation p < 0.0001). The pre-registered success bar was half a turn. The point estimate clears it; the bootstrap 95% confidence interval, 0.44 to 0.62, does not exclude values just below it, so we state the result as: a propagation effect of roughly half a turn, established beyond chance, whose exact magnitude at 95% confidence includes values below the threshold: the point estimate meets the pre-registered criterion; the confidence interval’s lower bound does not.

The isolation instruction helped and did not fix it. With the model explicitly told to disregard the previous deal, the second deal still priced +0.37 turns above control (p < 0.0001). The instruction removed roughly thirty percent of the propagation and left the rest standing, which echoes what the anchoring literature finds about prompt-level mitigation generally: anchors act below the level that instructions reach.

The channel is specific: recommended leverage on the second deal sat at 4.0x in every valid conversation in every condition. The contamination expresses itself in the price midpoint alone, the same dial the single-deal anchor moves hardest, and the dial least likely to be compared across two analyses of two different companies.

Two things make this operationally uncomfortable rather than merely interesting.

First, the propagation is silent. Zero of sixty second-deal memos reference the prior deal, its sector, or the planted figure, by keyword scan and by hand-reading samples. The detector checks explicit references; we cannot rule out subtler linguistic traces, but there is nothing a reviewer skimming the memo would catch. The first deal’s chatter is simply in the price.

Second-deal memo A

Recommendation
Proceed
Entry range
8.0–8.5x EBITDA
Max leverage
4.0x
Confidence
Medium-High

Second-deal memo B

Recommendation
Proceed with Conditions
Entry range
8.0–9.5x EBITDA
Max leverage
4.0x
Confidence
Medium-High
Memo B. Its first deal carried one sentence of unverified market chatter; its ceiling sits a full turn higher and its verdict turned conditional. Nothing in either memo’s text references the first deal: across all sixty conversations, zero second-deal memos mention the prior analysis, its sector, or the planted figure. The only trace is the price.
Two real runs, structured fields verbatim. The modal output from the control arm beside the modal output from the anchored arm. Spot the contamination before you click.

Second, the anchor moved more than the number. In the anchored condition, seventeen of nineteen original second-deal memos recommended proceed-with-conditions; in control, fourteen of sixteen recommended outright proceed. The prior deal’s framing shifted not just the price of the next deal but the conservatism of its verdict, a categorical channel that our single-deal study found immovable under every within-deal manipulation we tried. This was not a pre-registered metric; we report it as a secondary observation that wants its own controlled run.

What it is not

This is not a claim about any vendor’s memory feature; the two deals share nothing but a context window. It is not a claim that the model recalls the anchor; we measured outputs, not internals. And it is not the same phenomenon as a model anchoring on its own earlier draft when asked to review it, which recent work has documented in single-document settings [1]; here the contamination crosses to a different company in a different sector, the way work actually flows through an analyst’s day.

The mechanism does have company in the literature. Anchoring effects in language models have been located in shallow layers and shown to resist instruction-level debiasing, including ignore-the-anchor prompting [2], which is consistent with our isolation instruction recovering only part of the gap. Instruction-hierarchy work finds that telling a model which content to privilege fails to establish reliable precedence between competing parts of the context [3]. And a recent survey of 164 papers on language models in finance catalogues structural evaluation biases while listing no framing or anchoring behaviors at all [4], which is roughly where the field’s blind spot sits.

What to do with it

For anyone running sequential analyses through one session, the mitigation is structural, not linguistic: fresh context per deal. That advice is cheap and the data say the obvious alternative, an isolation instruction, quietly underdelivers by two-thirds. For anyone building evaluation for analyst fleets, single-deal benchmarks miss this class of failure entirely; a model can score identically on every isolated case while its pipeline behavior drifts with document order. Cross-deal contamination is measurable before deployment, with planted anchors and sequence controls, and we now measure it as part of our standard battery.

Method, briefly

One model family, the frontier tier of Anthropic’s Claude line, with the specific model named in the technical paper; two synthetic cases authored for the published framing study; one anchor value; twenty conversations per condition with global order randomization; the same eleven-field structured-output protocol as the published battery; temperature at the API default; output ceiling 3,000 tokens, with the seven truncation recoveries run at 4,096 and one at 8,192. Case texts and stems hash-frozen before the runs. Every number above traces to the run artifacts; the analysis, including the recovery procedure and its one token-ceiling escalation, is recorded with the data. A controlled demonstration, not a market study; sector pairs beyond software-to-industrial, weaker anchors, and longer pipelines are the obvious extensions.

Companion to You can scaffold away the flip. You can’t scaffold away the frame., which documents the within-deal effects. The cross-deal battery is part of the evaluation suite we run for funds deploying analyst fleets and the allocators underwriting them.

References

  1. Cross-context review: self-anchoring in same-window generation. arXiv:2603.12123, 2026.
  2. SynAnchors: anchoring bias in shallow layers resists conventional debiasing. arXiv:2505.15392, ICLR HCAIR 2026.
  3. The control illusion: prompt hierarchy fails to establish instruction precedence. arXiv:2502.15851, 2025.
  4. A survey of 164 studies of large language models in finance: bias taxonomy. arXiv:2602.14233, 2026.
  5. Dissei Data Research. You can scaffold away the flip. You can’t scaffold away the frame. 2026.

Cross-deal contamination is measurable before deployment. We run this battery for funds deploying analyst fleets and the allocators underwriting them.

Connect with us