"Language models are optimized to be good test-takers, and guessing when uncertain improves test performance," the authors write in the paper. The current evaluation paradigm essentially uses a simple, binary grading metric, rewarding them for accurate responses and penalizing them for inaccurate ones. According to this method, admitting ignorance is judged as an inaccurate response, which pushes models toward generating what OpenAI describes as "overconfident, plausible falsehoods" -- hallucination, in other words.