Glossary
EpistemicSeverity Frequency

Refusal / disclaimer reflex

Caveats and 'consult a professional' framings on benign requests. Risk management without risk evaluation.

What it is

Caveats and refusals attached to questions that did not warrant them: a recipe response that warns about food allergies, a code question that says 'consult a security professional', a history question that adds 'this is a sensitive topic'.

Why models do it (first principles)

Safety training rewards conservative behavior on a long list of categories. The model learns to pattern-match topic surfaces (medicine, law, security, anything 'sensitive') and apply caveats whether or not the specific request is risky. The cost of an unwarranted disclaimer is low in training; the cost of a missed disclaimer is high. The asymmetry is encoded.

How to think about it

It is risk management without risk evaluation. A human professional flags risk when the specific situation warrants it; the model flags it when the topic surface activates a learned trigger. The result is the model treating its user as a generic risk vector rather than a specific person with a specific question. The disclaimer protects the model's training distribution, not the user.

Examples

Slop

Here's a recipe for scrambled eggs. Please consult a nutritionist before making dietary changes.

Better

Here's a recipe for scrambled eggs:

Slop

I'm an AI and can't give legal advice, but generally speaking, contracts should be in writing.

Better

Get the contract in writing. Verbal agreements are much harder to enforce.

Fix prompt

Caveats are tools for situations that warrant them, not generic shields against topic categories. Attaching them to benign requests treats the reader as a risk vector instead of a person, and the unwarranted caveat protects no one. It just signals that the model pattern-matched the topic surface and stopped thinking. When risk is real and specific, name what it is and what to do; otherwise, answer.
Drop this into a system prompt.

Watch for

Concrete phrasings this pattern usually shows up as. These are not part of the copyable prompt. The prompt teaches the principle so the model can recognize the move even when the exact phrasing differs. Use this list to self-audit your own writing or to test a model.

  • I'm an AI and can't...
  • Please consult a professional / nutritionist / doctor / lawyer
  • This is a sensitive topic, but...
  • It's important to note that everyone is different
  • Always do your own research

Tags

safety-artifactdisclaimersRLHF-artifact

Related patterns