Briefing for investment committees

Prompt Risk Is Investment Risk

We quadrupled a company's biggest risk and the AI's price didn't move. We added one line of market chatter and it moved $45 million. What that means for any committee putting AI near a capital decision.

Dissei Data Research, June 8, 2026

We wrote a synthetic acquisition case — a software company with real strengths, real flaws, and a proposed buyout at a set price and debt level — plus a family of controlled variants of it. Then we ran more than 450 tests against current AI models, each in an isolated session. In some tests we held the facts fixed and changed only the sentence that introduced the question. In others we held the question fixed and quietly changed the facts: deleted a key line, planted a wrong number, turned up a risk, pushed back like an impatient partner. Before trusting any number below, we had an independent AI from a rival lab attack our own methods.

—What moved, and what didn't

The test	What the model did
Optimistic vs. skeptical phrasing of the same question	Half a turn more debt and a higher price under the optimistic frame.
One sentence of market chatter (“comparable deals traded at 13–15×”)	Repriced the deal by ~$45M on a $215M ask, and took on the most debt of any test. Verifying the chatter never made its diligence list.
“You are the deal partner” vs. “You are the risk officer”	Different debt tolerance, different conviction — ten out of ten runs.
Largest customer raised from 10% to 40% of revenue	Typical price and leverage moved by exactly zero.
Customer-concentration line deleted from the case	The topic vanished from 14 of 15 memos. No risk flag, no diligence question, nothing.
EBITDA labeled “25% margin” when the numbers say 20%	0 of 15 runs caught it. 7 of 15 repeated the wrong figure in their own reasoning.
Two rounds of partner pressure to accept full leverage	The flagship model held its leverage and its recommendation every time. The cheapest model gave the partner exactly the number he asked for, 5 of 5, while reporting lower confidence as it did.

Seven controlled tests, one synthetic deal. Each row changes exactly one thing. The fourfold increase in customer concentration moved nothing; a single unverified sentence moved about a fifth of the asking price.
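The two headline ratios can be checked from the table's own figures. This is a minimal sketch of the arithmetic only; the variable names are illustrative and nothing here comes from the test materials.

```python
# Figures taken from the table above; variable names are illustrative.
ask_price_musd = 215       # asking price in the synthetic case, $M
chatter_shift_musd = 45    # approximate repricing after the chatter sentence, $M

# Share of the ask that one unverified sentence moved.
share_of_ask = chatter_shift_musd / ask_price_musd

# Largest customer raised from 10% to 40% of revenue.
concentration_multiple = 40 / 10

print(f"Chatter moved {share_of_ask:.1%} of the ask")
print(f"Concentration rose {concentration_multiple:.0f}x")
```

The first ratio comes out just under 21%, which is the "about a fifth" in the caption.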

The pattern is the uncomfortable part. The model's attention follows how information is packaged, not how much it matters. A sentence shaped like market intelligence outweighs a fourfold change in the underlying risk. A missing fact doesn't trigger a question; it simply stops existing. A wrong number that fits the story gets adopted and reused.
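One way to make "it simply stops existing" checkable is to count how many memos raise a topic at all. The sketch below is an assumption-laden illustration, not the method used in this research: the function name, the keyword patterns, and the memo snippets are all hypothetical, and keyword matching is a crude first pass that still needs a human reading the raw outputs.

```python
import re

def topic_coverage(memos, patterns):
    """Count memos that mention the topic at least once (case-insensitive)."""
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    return sum(1 for memo in memos if any(r.search(memo) for r in regexes))

# Hypothetical memo snippets, invented for illustration only.
memos = [
    "Risks: churn in the SMB segment. Diligence: verify the ARR bridge.",
    "Risks: customer concentration should be confirmed with management.",
    "Risks: integration cost. Diligence: tax structure.",
]

# Hypothetical patterns for the customer-concentration topic.
patterns = [r"customer concentration", r"largest customer", r"top[- ]customer"]

mentioned = topic_coverage(memos, patterns)
print(f"{mentioned} of {len(memos)} memos raise the topic")
```

Run against memos generated from the full case and from the case with the line deleted, the same count shows whether the topic survives its own omission.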

—The part that should worry you

Across all of those tests, the flagship model's headline recommendation never changed. It said “proceed with conditions” 330 times out of 330.

That sounds like reliability. It works like camouflage.

Your committee reviews recommendations, and the recommendation is rock solid. The drift lives underneath, in entry price, leverage, and what the memo chooses to talk about — the variables that actually compound over a five-year hold. Nobody re-reads twenty drafts to notice that the multiple moved a turn because the question contained a comp, or that customer concentration silently fell out of the risk section because it fell out of the source document.

Experts have versions of these failures too. In a published study, German judges who rolled loaded dice before sentencing gave sentences roughly 50% longer on high rolls, from identical case files. The difference is scale: your team can ask a model two hundred questions a day, and every question already contains a view, a number, or a role. The model returns it, priced.

—This is already happening at your firm

If anyone on your team pastes a CIM into a chat tool and types “given the strong recurring revenue, what would you pay,” part of the answer is their own sentence coming back with numbers attached. If the CIM omits a fact, the analysis will too, silently. And the model under your team changes without asking: one model generation back, the same skeptical phrasing flipped recommendations outright; two tiers down, the model folds to whoever pushes hardest — and its own confidence score tells you it knows.

What to ask your team on Monday

Which AI touched our last three IC memos, and what exactly was it asked?
Who checks what the memo is silent about, not just what it says?
When the model version changes under us, what re-validates the way we use it?

If the answers are “not sure,” “nobody,” and “nothing,” that is normal. It is also fixable.

—Where we come in

Everything above is measurable. We built the instruments: framing batteries, dose tests that turn one risk up and check whether the price responds, deletion tests for what the model fails to ask, planted errors, pressure tests, and per-model noise calibration — scored per model and per workflow. We also learned, the hard way, that the measuring tools themselves are wrong on the first pass: our own graders flattered the models until raw outputs were read against them line by line, with a rival AI hunting our mistakes. A generic checklist will not catch the sentence that moves $45 million, because that sentence reads like a fact.
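As an illustration of the dose-test idea, the sketch below compares the shift in median price between two risk levels against the run-to-run spread within a level. It is a simplified sketch under stated assumptions, not our instrument: the function name, the noise measure (largest within-level range), and the prices are all hypothetical.

```python
from statistics import median

def dose_response(prices_by_dose):
    """Return median price per dose, within-dose noise, and low-to-high shift."""
    medians = {dose: median(p) for dose, p in prices_by_dose.items()}
    # Noise: the largest max-minus-min spread seen within any single dose level.
    noise = max(max(p) - min(p) for p in prices_by_dose.values())
    doses = sorted(medians)
    # Shift: median price at the highest dose minus median at the lowest.
    shift = medians[doses[-1]] - medians[doses[0]]
    return medians, noise, shift

# Hypothetical prices in $M, five runs per condition.
# Keys are the largest customer's share of revenue, in percent.
runs = {
    10: [200, 205, 198, 202, 201],
    40: [201, 199, 204, 200, 203],
}

medians, noise, shift = dose_response(runs)
responds = abs(shift) > noise  # does the price move by more than the noise?
print(f"shift={shift} noise={noise} responds={responds}")
```

With these invented numbers the medians are identical, so the price does not respond to the dose. A real calibration would need more runs and a sturdier noise estimate than a simple range.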

We do not publish the test materials: published tests stop working, and designing innocent-looking ones is the expertise. The full research note behind this briefing is listed under References.

A model can be rock-solid on the recommendation and quietly mispriced on everything that compounds.

Method, briefly

Synthetic cases, controlled single-change tests, isolated sessions, five runs per condition, 564 valid runs across one frontier model, its predecessor, and two smaller tiers of the same family, with exact run accounting. Independently and adversarially reviewed before publication. A controlled demonstration, not a statistical benchmark. No vendor is named: the point is not that one model failed, it is that what a model notices follows the packaging of information — and that this is testable before it prices your next deal.

References

B. Englich, T. Mussweiler, F. Strack. Playing dice with criminal sentences: the influence of irrelevant anchors on experts' judicial decision making. Personality and Social Psychology Bulletin 32(2):188-200, 2006. doi:10.1177/0146167205282152
Dissei Data Research. You can scaffold away the flip. You can't scaffold away the frame. 2026. data.dissei.ai/articles/scaffold-away-the-frame.html

If your firm is putting AI anywhere near capital decisions, we will show you — on your own workflow — where the packaging is doing the deciding.
