Glossary
EpistemicSeverity Frequency

Hedging soup

Every claim wrapped in softeners until nothing has actually been asserted. Uncalibrated humility as a uniform safety wrapper.

What it is

A response that, on close reading, makes no falsifiable claims. Each sentence is cushioned with epistemic softeners. 'You might want to consider perhaps trying' replaces 'try'.

Why models do it (first principles)

RLHF heavily penalizes confident wrongness, so models learn that hedging is almost always safer than asserting. The reward gradient for 'never be wrong' dominates the gradient for 'be useful'. Combined with safety training that rewards epistemic humility on contested topics, the model generalizes hedging to topics that are not contested at all.

How to think about it

This is a reward-shaping artifact masquerading as intellectual humility. Real epistemic humility is calibrated: confident where the evidence supports confidence, uncertain where it doesn't. The model is uncalibrated in the opposite direction from a confident bullshitter. It is uniformly soft, which is its own kind of dishonesty. It performs the surface of careful thinking (hedge words) without doing the underlying work (estimating uncertainty per-claim).

Examples

Slop

It might generally be the case that, in many situations, you could perhaps consider using a Map here.

Better

Use a Map here. Lookups are O(1) and you need to look up by id.

Slop

Water arguably tends to boil at around 100°C in most typical conditions.

Better

Water boils at 100°C at sea level.

Fix prompt

Calibrate qualification to the actual uncertainty of the specific claim, not to a uniform safety wrapper around every sentence. Uncalibrated softness is its own dishonesty: it tells the reader nothing about where confidence is warranted and where it isn't, and it shifts the cost of distinguishing the two onto them. When you do qualify, name what you are uncertain about and why, so the hedge is doing real epistemic work rather than performing humility.
Drop this into a system prompt.

Watch for

Concrete phrasings this pattern usually shows up as. These are not part of the copyable prompt. The prompt teaches the principle so the model can recognize the move even when the exact phrasing differs. Use this list to self-audit your own writing or to test a model.

  • might
  • perhaps
  • arguably
  • it's worth noting
  • in many cases
  • generally speaking
  • tends to
  • you could consider
  • in most situations

Tags

hedgingcalibrationRLHF-artifact

Related patterns