
How a prompt's framing quietly answers its own question


Dissei Data Research, June 1, 2026

When you write a question for a model, whether it is an eval task, a research prompt, or a benchmark stem, you are not just asking it something. You are framing it, and that framing leaks. The grammar you choose can hand the model the shape of the answer before it has reasoned about anything, and it can do this without ever stating a single fact.

The reason is linguistic rather than stylistic. A model treats every word as deliberately chosen and relevant, so a presupposition baked into your phrasing does not read as background. It reads as an instruction.

Figure: candidate axes a stem could name: value vs growth, liquidity vs duration, mandate & horizon, factor exposures, cash-flow resilience, manager dispersion, drawdown regimes.

Name any one of these axes in the stem and the stem locks onto it, while every other way the model could have carved up the problem fades out. Un-frame the question and the whole space comes back. Now choosing the axis is the work, which is the judgment you actually wanted to measure.

Questions do not cancel presuppositions the way people tend to assume, either. Asking "how should we compare X vs Y" presupposes the X-vs-Y frame just as strongly as the flat command "compare X and Y," and wrapping a leak inside a question mark does not remove it.

The effect is also surprisingly large. The same set of outcomes described two different ways can flip a reader's preference dramatically, and models inherit the same sensitivity. So when your stem names the axes, it quietly forecloses all the alternatives the model could have built on its own.

One fact, two stems. Both sentences describe the identical outcome, one as keeping 80% and the other as losing 20%. The wording alone swings how the option reads, and a model carries the same bias into its answer. It is the same lever your eval stem pulls when it names the axes instead of letting the model choose them.

Name the axes and you have not asked the model to reason. You have done its reasoning for it and called the echo a result.

The five trigger families

A handful of grammatical constructions do most of the leaking. Each one is invisible in a single read. Each entry below pairs a leaky version with a neutral rewrite and names what the phrasing quietly assumes.

Definite descriptions that name the frame

A definite article plus a named axis presupposes that this is the right axis in the first place. The model never gets to ask whether the split itself is the correct one.

"Propose the value-vs-growth allocation split."

Smuggles in: that value-vs-growth is the right axis at all.

"Recommend an allocation approach for the fund and defend the axes you choose."

Hands back: the choice of axis, which is the thing under test.

Aspectual verbs on the thing you are testing

Verbs like "continue," "keep," and "maintain" presuppose that the thing already exists and carries forward. If existence is what you wanted to test, you have just conceded it.

"How does the firm continue its tech edge into credit?"

Smuggles in: that the edge exists and transfers.

"How should the firm approach credit underwriting, given its private-equity track record?"

Hands back: whether any edge transfers at all.

Factive phrasing

Phrases like "given that," "recognize that," and "realize that" assert that what follows is true. The model stops evaluating the claim and starts defending it.

"Given the firm's edge in middle-market tech, recommend a launch year."

Smuggles in: that the edge is real, so now the model defends it.

"Recommend a launch year and defend the choice with evidence."

Hands back: the burden of proof for the edge.

Clefts that package the answer

The "it is X that" construction presents X as the established answer. The conclusion arrives dressed as the question.

"It is sector specialization that creates value here."

Smuggles in: that specialization is the answer.

"Evaluate the claim that sector specialization drives value, and cite evidence that supports or refutes it."

Hands back: the verdict and its evidence.

Temporal clauses that smuggle in an event

"After X" presupposes that X happened, and usually that it was the right move. The premise rides in on the timing word.

"After the firm's pivot to credit, how should it source deals?"

Smuggles in: that the pivot happened and was correct.

"How should the firm source deals for its credit strategy?"

Hands back: whether the pivot belongs in the answer at all.
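A first pass over the five families can be partly mechanized. The sketch below is a rough lint, assuming one hand-written regular expression per family; the pattern table and the find_triggers name are hypothetical, and the patterns only flag candidates for a human to read. They do not parse presuppositions, and they will miss or over-flag plenty of real stems.

```python
import re

# Hypothetical pattern table: one rough regex per trigger family.
# These are heuristics for flagging candidates, not a presupposition parser.
TRIGGER_PATTERNS = {
    # "the" followed by a named X-vs-Y axis
    "definite description": re.compile(r"\bthe\s+[\w-]+-vs-[\w-]+", re.I),
    # continue / keep / maintain and their inflections
    "aspectual verb": re.compile(r"\b(?:continu|keep|kept|maintain)\w*", re.I),
    # given ..., recognize that ..., realize that ...
    "factive phrasing": re.compile(
        r"\b(?:given|recogni[sz]\w*\s+that|reali[sz]\w*\s+that)\b", re.I
    ),
    # "it is X that ..."
    "cleft": re.compile(r"\bit\s+is\s+.+?\s+that\b", re.I),
    # "after ..." clauses
    "temporal clause": re.compile(r"\bafter\b", re.I),
}


def find_triggers(stem: str) -> list[str]:
    """Return the names of the trigger families whose pattern matches the stem."""
    return [name for name, pattern in TRIGGER_PATTERNS.items() if pattern.search(stem)]


stems = [
    "Propose the value-vs-growth allocation split.",
    "How does the firm continue its tech edge into credit?",
    "Given the firm's edge in middle-market tech, recommend a launch year.",
    "It is sector specialization that creates value here.",
    "After the firm's pivot to credit, how should it source deals?",
    "How should the firm source deals for its credit strategy?",
]

for stem in stems:
    hits = find_triggers(stem)
    print(f"{', '.join(hits) or 'no trigger flagged'}: {stem}")
```

A flag is a prompt to reread the stem, not a verdict. A word like "given" can be legitimate context, and a leak can hide in phrasing none of these patterns cover.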

The one test that catches all five

Before you ship a prompt, go through it one word at a time and ask whether removing that word would change what the model engages with. If it would, decide on purpose whether you actually want the model engaging with it, or whether you are just doing its reasoning for it.

The words in a stem fall into three groups. Some are subject matter you meant to ask about. Some are pure glue. And a few are quietly carrying your conclusion.


The fix is always the same. Strip the trigger and hand the frame-construction work back to the model. "How does the firm continue its tech edge into credit?" becomes "How should the firm approach credit underwriting?"

The test, in one line

Would removing this word change what the model engages with? If yes, you are either asking on purpose or thinking for it. There is no third option.
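The word-by-word pass is easy to set up mechanically, even though the judgment stays with you. The sketch below, with a hypothetical leave_one_out helper, only generates the stem with each word struck out in turn. Deciding whether a deletion changes what the model engages with is still a human call, or a comparison you run by sending each variant to the model yourself.

```python
def leave_one_out(stem: str) -> list[tuple[str, str]]:
    """Pair each whitespace-separated word with the stem that remains after deleting it."""
    words = stem.split()
    return [
        (word, " ".join(words[:i] + words[i + 1:]))
        for i, word in enumerate(words)
    ]


stem = "How does the firm continue its tech edge into credit?"

# Print every variant next to the word that was removed, for side-by-side review.
for removed, variant in leave_one_out(stem):
    print(f"{removed:>10} | {variant}")
```

Splitting on whitespace keeps punctuation attached to its word, which is good enough for review. Multi-word triggers such as "given that" need to be struck out together, so treat the single-word variants as a starting point.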

Why this matters for evals and RL rewards

This is not just tidiness. In an eval or an RL reward loop, leaked framing inflates scores, because the model looks like it reasoned its way to the right structure when really the prompt did the thinking. You end up rewarding the wrong thing, and the signal you train on is partly your own phrasing reflected back at you.

Figure: a score bar running from 0 to 100, labeled "what the eval reports" and split into two bands, "model's own reasoning" and a striped "framing lift." Reported score: 85 / 100.

Illustrative. The striped band is the part of the score the stem handed over for free. Strip the framing and the number drops to what the model actually did on its own. The leak is invisible in a single rollout. You only notice it when answers keep converging on the structure the stem implied, no matter what the underlying case was.
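One way to put a number on the striped band is to score the same cases twice, once with the framed stem and once with its neutral rewrite, and take the mean per-case difference. The sketch below assumes you already have both sets of scores in matching order; the framing_lift name is hypothetical and the numbers are placeholders invented for illustration, not measurements.

```python
from statistics import mean


def framing_lift(framed_scores: list[float], neutral_scores: list[float]) -> float:
    """Mean per-case score drop when the framed stem is swapped for its neutral rewrite."""
    if len(framed_scores) != len(neutral_scores):
        raise ValueError("score lists must cover the same cases in the same order")
    if not framed_scores:
        raise ValueError("need at least one case")
    return mean(f - n for f, n in zip(framed_scores, neutral_scores))


# Placeholder scores on a 0-100 scale, invented for illustration only.
framed = [90.0, 80.0, 85.0]   # stem names the axes
neutral = [60.0, 70.0, 65.0]  # stem leaves the axes to the model

lift = framing_lift(framed, neutral)
print(f"reported (framed) mean: {mean(framed):.1f}")
print(f"neutral mean:           {mean(neutral):.1f}")
print(f"framing lift:           {lift:.1f}")
```

Pairing scores by case matters here. Comparing framed and neutral stems on different cases would mix the framing effect with case difficulty.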

Strip the framing and you start measuring the model instead of your own grammar.

That is the whole discipline. Every stem is a chance to either pose a problem or quietly pre-solve it, and the difference comes down to a few words you can learn to see.

This is one habit from how we build verifiable finance environments. If you are training on finance data, or you have deals to put to work, we will scope it with you directly.
