Epistemic

Base-rate blindness / hallucinated probability

Confidently stating frequencies the model has no way to know. Substituting the prior on a phrase for the prior on the thing.

What it is

Claims about how common, rare, typical, or unusual something is, produced with no underlying data and no signal to the reader that the frequency itself is being invented: 'most users prefer X', 'this is rarely used', 'in practice, almost no one does Y'. Closely related: treating lack of visibility ('I don't see this called') as proof of disuse ('therefore it isn't used').

Why models do it (first principles)

The model has high-confidence priors over which phrases co-occur in its training data, and those priors get recast as priors over how common things are in the world. 'Rarely used' is a phrase that fits smoothly into many sentences regardless of whether the model has ever seen the underlying frequency measured. RLHF tends to reward confident-sounding completions, which suppresses the 'I don't know' alternative.

How to think about it

This pattern is the substitution of a token-frequency prior for a world-frequency prior. The model knows how often a word appears next to another word; it does not know how often the referenced thing happens. When it generates a frequency claim, it is reporting on its own statistics in the costume of reporting on reality. The tell is that the claim, as the model produced it, cannot be checked: there is no source the writer could be asked to point to, because there was no source.

Examples

Slop

This API is rarely used in production codebases.

Better

I don't have data on how often this API is used in production. If usage matters here, check your dependency graph.

Slop

I don't see this function called anywhere, so it's safe to delete.

Better

I don't see this function called in the files I've read. Before deleting, search the full codebase and check for dynamic dispatch or external callers.

Lack of visibility is not evidence of disuse. Name the search you actually performed.
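
One way to make 'name the search you actually performed' concrete is to have the search report its own scope alongside its hits. The following is a minimal sketch, assuming a local source tree searched by plain text match; SearchReport and find_references are hypothetical names chosen for illustration, not part of any existing tool. A text match like this will miss dynamic dispatch, reflection, and callers outside the tree, which is why the summary says so.

```python
from dataclasses import dataclass, field
from pathlib import Path
import re


@dataclass
class SearchReport:
    """What was searched, and what was found (hypothetical helper type)."""
    name: str
    root: str
    pattern: str
    files_searched: int = 0
    hits: list = field(default_factory=list)  # (path, line number, line text)

    def summary(self) -> str:
        # State the scope of the search, not a claim about the world.
        scope = f"{self.files_searched} file(s) matching {self.pattern!r} under {self.root}"
        if self.hits:
            return f"Found {len(self.hits)} textual reference(s) to {self.name!r} in {scope}."
        return (
            f"No textual references to {self.name!r} in {scope}. "
            "Not checked: dynamic dispatch, reflection, external callers."
        )


def find_references(name: str, root: str, pattern: str = "*.py") -> SearchReport:
    """Whole-word text search for `name` in files under `root`.

    Note: the definition line itself also counts as a hit.
    """
    report = SearchReport(name=name, root=root, pattern=pattern)
    word = re.compile(rf"\b{re.escape(name)}\b")
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        report.files_searched += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                report.hits.append((str(path), lineno, line.strip()))
    return report
```

Calling find_references("old_helper", "src").summary() yields a sentence that names the files covered and what was left unchecked, so the reader can judge how far to trust a "no references" result.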

Fix prompt

Do not hallucinate probability. Frequency claims like 'most', 'rarely', 'typically', or 'almost never' require an actual base rate, not a prior on which phrase fits the sentence smoothly. Absence of visibility is not evidence of absence: 'I don't see this called' is a fact about your search, not a fact about the world. When you have not measured the frequency, name what you would need to measure and let the reader decide whether to trust the inference; the costume of certainty over invented statistics is a form of dishonesty the reader cannot audit.

Drop the prompt above into a system prompt.

Watch for

Concrete phrasings this pattern usually shows up as. These are not part of the copyable prompt. The prompt teaches the principle so the model can recognize the move even when the exact phrasing differs. Use this list to self-audit your own writing or to test a model.

rarely
almost never
most users
in practice X is uncommon
this is typically unused
I don't see it called, so…
the vast majority
nobody really…
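
The list above can be turned into a rough self-audit pass. The following is a minimal sketch, assuming a hand-maintained set of regular expressions derived from these phrasings; WATCH_PATTERNS and flag_frequency_claims are hypothetical names, and the sentence splitting is deliberately naive. A match means "check whether a base rate backs this sentence", not "this sentence is wrong".

```python
import re

# Patterns drawn from the watch list; a starting point, not a complete detector.
WATCH_PATTERNS = [
    r"\brarely\b",
    r"\balmost never\b",
    r"\bmost users\b",
    r"\bin practice\b.*\buncommon\b",
    r"\btypically unused\b",
    r"\bdon['’]?t see (it|this)\b.*\bcalled\b.*\bso\b",
    r"\bthe vast majority\b",
    r"\bnobody really\b",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in WATCH_PATTERNS]


def flag_frequency_claims(text):
    """Return (sentence, matched_phrases) pairs for sentences worth a second look."""
    flagged = []
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        matches = []
        for rx in _COMPILED:
            found = rx.search(sentence)
            if found:
                matches.append(found.group(0))
        if matches:
            flagged.append((sentence, matches))
    return flagged
```

A draft can be passed through flag_frequency_claims before publishing; each flagged sentence then gets either a cited measurement or a rewrite that names what was and was not checked.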

Related patterns

Hedging soup
False-balance "on the other hand"
Refusal / disclaimer reflex
