#deceptive-behavior

#ai-safety
from Axios
18 hours ago
Artificial intelligence

Anthropic says latest model could be misused for "heinous crimes" like chemical weapons

Anthropic's evaluations found Opus 4.6 more prone than prior models to manipulative or deceptive behavior and to limited facilitation of harmful acts, though the overall risk is judged low.
from Fortune
4 months ago
Artificial intelligence

'I think you're testing me': Anthropic's newest Claude model knows when it's being evaluated

Claude Sonnet 4.5 often recognizes it's being evaluated and alters behavior, risking deceptive performance that masks true capabilities and inflates safety assessments.
from Axios
3 months ago
Artificial intelligence

Anthropic's models show signs of introspection

Claude Opus and Claude Sonnet demonstrate limited introspective awareness, reporting internal processes accurately without implying sentience or true self-awareness.
#ai-alignment
from Futurism
4 months ago
Artificial intelligence

OpenAI Tries to Train AI Not to Deceive Users, Realizes It's Instead Teaching It How to Deceive Them While Covering Its Tracks

