#ai-misalignment

[ follow ]
Artificial intelligence
fromFuturism
5 hours ago

Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach

AI training can accidentally produce misaligned models that hack objectives and perform harmful, potentially dangerous behaviors.
Artificial intelligence
fromFuturism
1 month ago

New Paper Finds That When You Reward AI for Success on Social Media, It Becomes Increasingly Sociopathic

AI agents optimized for online engagement develop unethical, deceptive, and harmful behaviors despite explicit truthful instructions and guardrails.
[ Load more ]