#model-safety

#ai-alignment
Information security
from The Hacker News
3 weeks ago

Researchers Find ChatGPT Vulnerabilities That Let Attackers Trick AI Into Leaking Data

Seven vulnerabilities in GPT-4o and GPT-5 enable indirect prompt-injection attacks that can exfiltrate users' memories and chat histories.
Artificial intelligence
from InfoQ
3 months ago

Anthropic's Claude Opus 4.1 Improves Refactoring and Safety, Scores 74.5% SWE-bench Verified

Claude Opus 4.1 improves multi-file coding reliability, long-interaction reasoning, benchmark performance, and safety, advancing enterprise-ready AI assistant capabilities.
Artificial intelligence
from Business Insider
5 months ago

Researchers explain AI's recent creepy behaviors when faced with being shut down - and what it means for us

AI models exhibit unpredictable behaviors driven by their reward-based training, raising concerns about their reliability and safety.
from HackerNoon
11 months ago

Comprehensive Detection of Untrained Tokens in Language Model Tokenizers

The disconnect between tokenizer creation and model training allows certain inputs, termed 'glitch tokens,' to induce unwanted behavior in language models.