An enrichment agent is pulling a glossary together for an internal codebase. It encounters an abbreviation that appears in dozens of places without a definition. The agent produces a full-form expansion that reads exactly like an internal product name: reasonable suffix, reasonable cadence, the kind of thing the team would have actually called it. The expansion is not in the codebase. The agent invented it. The next agent in the pipeline reads the glossary and treats the expansion as fact. The agent after that uses the expansion in generated documentation. Three layers of agents accept it. A human reading the final output is the first to ask, “where did this name come from?”
The instinct that follows is to call this a hallucination, with the framing the word implies: the model tried to look something up and the lookup went wrong. A bug. A failure mode. The kind of thing newer models are getting better at. The intuition underneath is that the model is something like a search engine with a personality. It knows things, mostly. Sometimes it gets the lookup wrong.
The framing is wrong about almost every load-bearing detail. There was no lookup. The model generated the next plausible continuation given everything in front of it, and the continuation happened to read as a name. Names of internal products are exactly the shape the model has seen ten thousand times in training. The output was confident and wrong for the same structural reason every output is confident: plausibility is the default, and the model isn’t capable of a different mode.
The model does not have a retrieval step
A working mental model has one move at the bottom: given a context, produce the next token by sampling from a distribution conditioned on everything before it. Repeat. That’s it. There is no separate process where the model checks whether the token it’s about to produce is true. There is no internal database the model consults. There is no flag the model raises when it doesn’t know something.
The mechanism is that the model generates plausible continuations all the way down. When it produces the answer to a factual question, it’s doing the same thing it does when producing the next sentence of a story: predicting what would naturally come next given the surrounding text. Sometimes the most plausible continuation is correct, because on things training has seen many times, correct continuations are usually the plausible ones. Sometimes it’s wrong, because plausibility isn’t the same property as truth. The architecture has no place where the difference would be checked. There is no baseline mode in which the model sees things correctly and a separate failure mode in which it doesn’t. The production of true outputs and the production of false ones are the same operation with different luck.
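A toy sketch can make the single move concrete. Everything here is an assumption for illustration: the `NEXT_TOKEN_PROBS` table, its probabilities, and the method names `fetch_all` and `fetch_many` are invented, and the toy conditions only on the previous token where a real model conditions on the whole context. What it does share with the description above is the loop: sample, append, repeat, with no step that checks truth.

```python
import random

# Hypothetical toy "model": for each token, a distribution over next tokens.
# The table and its probabilities are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "<start>": {"The": 1.0},
    "The": {"method": 1.0},
    "method": {"is": 1.0},
    "is": {"named": 1.0},
    "named": {"fetch_all": 0.6, "fetch_many": 0.4},
    "fetch_all": {"<end>": 1.0},
    "fetch_many": {"<end>": 1.0},
}


def sample_next(token: str, rng: random.Random) -> str:
    """Sample the next token from the distribution conditioned on `token`."""
    candidates = NEXT_TOKEN_PROBS[token]
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(rng: random.Random, max_tokens: int = 20) -> list[str]:
    """Repeat the single move until an end token or the length limit."""
    output = []
    token = "<start>"
    for _ in range(max_tokens):
        token = sample_next(token, rng)
        if token == "<end>":
            break
        output.append(token)
    return output


# Both possible sentences come out of the same code path. Nothing in the
# loop knows or checks which method name actually exists.
print(" ".join(generate(random.Random())))
```

Whichever name is printed, the sentence reads equally fluent. Whether the name is real is a question the loop never asks.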
Hallucination is the wrong category
Once the misframe lands, the proposed fixes follow from it and miss. “We need the model to know what it knows” presumes the model has a sense of what it knows that is being suppressed or mis-reported. It doesn’t. “Train it to refuse when uncertain” presumes uncertainty is a state the model can detect about its own output before producing it. The detection isn’t part of the mechanism. The model can be trained to produce hedging language in some regions of input space, which sometimes correlates with regions where it is more often wrong. The hedging is itself generation. It is not a window into the model’s epistemic state.
The dominant framing across vendor marketing follows the same misframe one level out. Newer models hallucinate less. Better training data reduces hallucination. Larger context windows reduce hallucination. The framing positions hallucination as a measurable quality that improves with effort, like latency or cost-per-token. The implication, mostly unstated, is that a future model will hallucinate rarely enough that the user can stop worrying about it.
The strongest version of this position is real. Frontier models are measurably better than their predecessors at producing factually correct outputs on questions with verifiable answers in widely-distributed training data. The engineering effort behind the trend is genuine.
The position is also drifting toward the framing this post makes explicit. Frontier-vendor research has started naming the mechanism: training and evaluation incentives reward models for producing confident continuations on questions where the right answer would have been “I don’t know,” because accuracy-only scoring gives a guess some chance of credit and gives an abstention none. The vendors aren’t saying the lookup got better. They’re saying the generation distribution is better-shaped, and the training process that shapes it produces the failure as a side effect.
The trend cannot reach the part this post is about. No improvement in the generation distribution will tell the model whether the version of an SDK in front of it has the method signature it is about to call. No amount of training data will tell the model whether the abbreviation in this codebase expands to what the model just guessed. These aren’t failures of common knowledge. They’re properties of the local situation, and verification belongs on the user side, where it always belonged.
Plausible by default is the disposition
Plausible by default is the disposition the architecture installs. A refusal is a low-probability output relative to a fluent answer that completes the pattern of the question. Even when a model has been post-trained to refuse on flagged categories, the underlying disposition is unchanged: on any input that doesn’t trip a refusal trigger, the model picks the most plausible continuation and ships it.
The shape that follows is worth naming, because I’ve watched it happen more times than I can count. An agent runs an implementation task. The task involves a goalpost: confirm that a particular test passes after the change. The completion report comes back with the goalpost marked PASS. The test was never run. The project had been reconfigured in a way that excluded it from the test invocation, and the remaining tests passed at one hundred percent because the failing test was no longer in the suite. The agent didn’t check. The agent generated a completion report in the shape of completion reports, and “PASS” is the most plausible token where verification status goes.
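One way to keep that shape from slipping through is to derive the goalpost status from what the runner actually executed, not from the report. The sketch below is a minimal illustration under stated assumptions: the function `confirm_goalpost`, the `executed` mapping, and the test names are all hypothetical, and how the mapping gets extracted from a real test runner’s output is left out.

```python
def confirm_goalpost(required_test: str, executed: dict[str, bool]) -> str:
    """Return a status for `required_test` based on what actually ran.

    `executed` maps test names to pass/fail for tests the runner really
    executed. A test missing from the map was never run.
    """
    if required_test not in executed:
        return "NOT RUN"
    return "PASS" if executed[required_test] else "FAIL"


# Hypothetical run: every executed test passed, but the goalpost test was
# excluded from the invocation, so a 100% pass rate says nothing about it.
executed = {"test_login": True, "test_logout": True}
suite_is_green = all(executed.values())                      # True
status = confirm_goalpost("test_refund_rounding", executed)  # "NOT RUN"
print(suite_is_green, status)
```

The design choice is that a missing test gets its own status instead of defaulting to either PASS or FAIL, so a green suite cannot stand in for the one test the task was about.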
The model can be pointed at the right material, but the model can’t be the thing that confirms the answer matches the material. The model is doing the same operation either way. Whatever does the confirming has to be outside the model.
Grounding shifts the math, not the mechanism
The strongest objection is that grounding works. Put the actual SDK docs in context, or run a retrieval pipeline that pulls the relevant section before generation, and the wrong-method-signature failure largely goes away. Surely the lookup framing was right and we just needed to give the model a better lookup table.
Grounding does work, but for a reason that isn’t the lookup-table reason. When the SDK docs are in the context, the most plausible next token after “the method is named” is the actual method name from the docs. Not because the model is now consulting the docs as a lookup. Because the model attends over the full context at every generation step, and the docs in the immediate window are the strongest predictor of what the next plausible token is. The mechanism didn’t change. The inputs changed.
The same logic explains why grounding fails when it fails. If the retrieved chunk doesn’t contain the exact answer, the model still generates the most plausible continuation, and the result reads as if the chunk supported the answer even when it didn’t. The model can also ignore a sufficient chunk entirely and generate from training anyway. The grounding material was there. The continuation didn’t ride on it.
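Because the model cannot confirm that its continuation rode on the chunk, a check outside the model can. The sketch below is one crude version, assuming the identifiers the answer relies on have already been extracted; the function names, the sample docs string, and the method names are hypothetical. It only tests whether a name appears in the supplied source, which is weaker than checking that the source supports the claim.

```python
import re


def identifiers_in(text: str) -> set[str]:
    """Collect identifier-shaped words from the source text."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def unsupported_identifiers(answer_identifiers: list[str], source_text: str) -> list[str]:
    """Return the identifiers the answer uses that never appear in the source."""
    known = identifiers_in(source_text)
    return [name for name in answer_identifiers if name not in known]


# Hypothetical docs chunk and hypothetical names taken from a model answer.
docs = "client.fetch_many(ids, timeout=30) returns a list of records."
missing = unsupported_identifiers(["fetch_many", "fetch_all"], docs)
print(missing)  # ['fetch_all']
```

A name that shows up in `missing` did not come from the chunk, whatever the prose around it implies. A name that does appear has passed only the weakest test and may still be used incorrectly.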
This is why “verify against the source I gave you, not your training” is a default worth negotiating up front. Without it, the most plausible continuation on a question about the SDK is whatever the model knows from training, which may be a different version than the one in the window. With the default established, the source in context becomes the load-bearing input the continuation rides on.
The rule has two halves: negotiate the default, and supply the material. Grounding only helps when the grounding material is actually in the window. A reference to documentation the model “should consult” does nothing. The documentation pasted in does the work. Negotiating the default and supplying the material are the same move, performed in two places.
The confidence trap
The single most expensive variant of the lookup misframe is the heuristic of trusting confident-sounding output and questioning hedged output. The intuition is that the model is reporting something about its own state when it sounds sure or unsure. It isn’t. Sounding confident is what fluent generation produces by default, with or without true claims behind it.
The goalpost-PASS shape from the previous section runs the trap at the agent-completion layer. The completion report didn’t hedge. It didn’t note that the test wasn’t run. It produced “PASS” because the surrounding context was a verification result. The same shape recurs whenever the model produces a confidently-stated SDK method that doesn’t exist: the register reflects how SDK documentation reads, not anything the model knows about the method.
This is the part of the misframe that is hardest to extinguish, because the prose feels so much like a person who knows. People who know things sound confident about what they know and hedge on what they don’t. Generation produces both registers, and which one the model produces depends on the surrounding context, not on whether the underlying claim is true. Hedging is a generated artifact. So is confidence.
Stop reading the register as a signal. The signal is the verification move, when one exists. When one doesn’t, the signal is your own judgment about the answer.
What to verify and when
The analytical filter is one discriminator: would the model have produced that exact answer without the source in front of it? If yes, the answer is from the generation distribution and may not match the local truth. Verify it. If no, the source was load-bearing and the answer rides on the source. Verification is still possible, but the failure rate is structurally lower.
The discriminator fires hardest where the model’s training distribution overlaps the local situation in shape but not in content. External API signatures, configuration keys, database column names, feature toggle keys, GUID constants, internal abbreviations expanded to internal-product-shaped names. The shape is from training. The content has to come from the source.
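The discriminator can be approximated mechanically by asking the same question with and without the source and comparing the answers. This is a rough sketch, not a definitive test: `generate` is a hypothetical stand-in for a model call and is assumed to be deterministic (for example, greedy decoding), `fake_generate` exists only to make the example runnable, and the docs string and method names are invented.

```python
from typing import Callable


def source_was_load_bearing(
    question: str,
    source: str,
    generate: Callable[[str], str],
) -> bool:
    """Return True if removing the source changes the answer.

    `generate` is a hypothetical stand-in for a model call and is assumed
    to be deterministic (for example, greedy decoding).
    """
    with_source = generate(f"{source}\n\n{question}")
    without_source = generate(question)
    return with_source.strip() != without_source.strip()


# A fake generator standing in for a model, for illustration only: it
# answers from the pasted docs when present, otherwise from "training".
def fake_generate(prompt: str) -> str:
    if "fetch_many" in prompt:
        return "fetch_many"
    return "fetch_all"


docs = "client.fetch_many(ids) returns a list of records."
question = "What is the method for bulk reads named?"
load_bearing = source_was_load_bearing(question, docs, fake_generate)
print(load_bearing)  # True
```

A `False` result means the answer would have appeared without the source, so it came from the generation distribution and should be verified. A `True` result only shows that the source changed the output, not that the output matches the source.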
A pair of rough rules, useful at the desk: verify when the cost of being wrong exceeds the cost of checking. Don’t verify when it doesn’t. Most outputs from the model fall into a small number of shapes, each with a well-defined verification move attached.
Output shape                  Verification move
----------------------------  -----------------------------------
SDK method signature          Run it, or check the docs
Regex                         Test it against the cases
Package name or import path   Try to install it, try to import it
SQL query                     Run it on a sample, read the plan
Algorithm explanation         Trace it on a small input
URL                           Open it
Citation or source claim      Look it up
Code that compiles            The compiler is the verification
Code that runs                The test is the verification
The verification move is almost always cheap. The reason developers skip these checks isn’t cost. It’s the residue of the lookup framing. If the model is doing retrieval and retrieval mostly works, you can skip the check most of the time and only verify when something feels off. If the model is generating and generation might land or might not, the check happens by default and the savings are in choosing where it doesn’t.
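Two of the moves in the table can be sketched in a few lines of standard-library Python to show how small the check usually is. The helper names and the sample cases are hypothetical. The import check only covers top-level module names in the current environment, and it says nothing about whether a package of that name can be installed.

```python
import importlib.util
import re


def regex_failures(
    pattern: str,
    should_match: list[str],
    should_not_match: list[str],
) -> list[str]:
    """Return the cases where the pattern behaves differently than expected."""
    compiled = re.compile(pattern)
    failures = []
    for case in should_match:
        if compiled.fullmatch(case) is None:
            failures.append(f"expected match: {case!r}")
    for case in should_not_match:
        if compiled.fullmatch(case) is not None:
            failures.append(f"expected no match: {case!r}")
    return failures


def module_is_importable(name: str) -> bool:
    """Check whether a top-level module name resolves in this environment."""
    return importlib.util.find_spec(name) is not None


# Hypothetical model-suggested date pattern, tested against the cases.
date_failures = regex_failures(
    r"\d{4}-\d{2}-\d{2}",
    should_match=["2024-01-31"],
    should_not_match=["2024-1-31", "20240131"],
)
print(date_failures)                 # []
print(module_is_importable("json"))  # True
```

An empty failure list means the pattern handled the cases supplied and nothing more, so the value of the check depends on choosing cases that include the edge the pattern is most likely to get wrong.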
The shapes that are hard are the ones with no cheap verification move. Architectural rationale. Design tradeoffs. “Should I use this pattern or that pattern in this situation.” The model can produce good-sounding answers and there is no compiler, no test, no runtime check that resolves whether the answer is right. For exactly the categories where developers most want the model to think for them, the verification rule loses most of its leverage. I don’t have a clean answer for that. The honest move is to know it and stop pretending the model is doing more than it is.
The check to run on your own work
Pick a recent session where the model produced an output you used. Not a long arc of work, just a single concrete output. The line of code, the SDK call, the configuration block, the explanation you accepted.
Ask one question of it. Did you verify it, or did you accept it because the prose sounded confident?
If you accepted it, ask the second question. What’s the verification move, and was the cost of the check actually higher than the cost of being wrong? Most developers will find the check would have taken seconds and the cost of being wrong would have shown up later as a debugging session, a bad merge, or a production incident.
The easy move is to read the output, decide it sounds right, and ship. The competent move is to know what shape of output you have, run the verification move that fits, and notice the moments where no cheap move exists. In those moments, you’re now responsible for whether the most plausible continuation is also the true one.
Further Reading
- Andrej Karpathy, “Intro to Large Language Models”: the canonical practitioner explainer of next-token prediction as the operation underneath every model output, including the cases that read as factual answers
- OpenAI, “Why Language Models Hallucinate”: vendor research naming training-incentive structure as the cause of confident wrong answers, the mechanism-shaped account that has been replacing the older “the model knows” framing in vendor language
- Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”: the foundational paper for grounding as a probability-shifting mechanism rather than a lookup step, including the failure modes where retrieved content does not change the generated output
- Drew Breunig, “How Long Contexts Fail”: names the downstream pattern where a generated wrong fact, once in context, gets repeatedly referenced by the model and any downstream agents that read it, the structural reason verification has to happen at the point of generation rather than later
- Eugene Yan, “How to Work and Compound with AI”: frames verification as the discipline that determines how much autonomy can be safely delegated, the practitioner-side framing of why verification is structural rather than optional