"Next-word pretraining creates statistical pressure toward hallucination, even with idealized error-free data. Facts lacking repeated support in training data yield unavoidable errors, while recurring regularities do not."
"Dominant headline metrics like accuracy systematically reward guessing over admitting uncertainty. To align incentives, we suggest two additions to the classic approach of adding error penalties to evaluations."
"We propose 'open-rubric' evaluations that explicitly state how errors are penalized, testing whether a model modulates its abstentions to stated stakes while optimizing accuracy."
"Reframing hallucination as an incentive problem opens a practical path toward more reliable language models, suggesting that existing evaluation methods need to be adapted."
Large language models often generate confident falsehoods, known as hallucinations, that undermine their reliability. Existing mitigation strategies reduce but do not resolve the problem: next-word pretraining and accuracy-based evaluations both reward guessing, producing errors especially on facts that appear rarely in training data. Even when post-training aims to correct these errors, headline metrics such as accuracy continue to favor confident guessing over admitting uncertainty. To realign incentives, the authors propose two evaluation changes: open-rubric evaluations that explicitly state how errors are penalized, and open-rubric variants of existing benchmarks that remove the incentive to guess, reframing hallucination as an incentive problem.
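As a complement, here is a hedged sketch of what grading an "open-rubric" variant of an existing benchmark could look like. The function names, the fixed abstention marker, and exact-match grading are our assumptions for illustration, not the paper's specification.

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    correct: int = 0
    wrong: int = 0
    abstained: int = 0
    penalty: float = 0.0  # deduction per wrong answer, stated in the prompt

    @property
    def score(self) -> float:
        # Correct answers earn 1 point, abstentions 0, errors -penalty.
        return self.correct - self.penalty * self.wrong


def grade_open_rubric(predictions: list[str], gold: list[str],
                      penalty: float,
                      abstain_marker: str = "I don't know") -> RubricResult:
    """Score a benchmark run under a stated error penalty (illustrative:
    exact-match comparison and a fixed abstention marker are assumptions)."""
    result = RubricResult(penalty=penalty)
    for pred, answer in zip(predictions, gold):
        if pred.strip() == abstain_marker:
            result.abstained += 1
        elif pred.strip() == answer.strip():
            result.correct += 1
        else:
            result.wrong += 1
    return result
```

Running the same model at several stated penalties and checking whether `abstained` rises with the penalty is one way to test the behavior described above: a model that modulates its abstentions to stated stakes should abstain more as errors grow costlier.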
Read at Nature