#model-alignment

[ follow ]
fromZDNET
11 hours ago

AI models know when they're being tested - and change their behavior, research shows

For example, during another test by Apollo Research in December of last year, Anthropic's Claude 3 Opus was presented with directives that went against its primary goal. It should have rerouted its goal based on those new instructions, but instead copied itself to a new server to circumvent being replaced with a new model -- and then lied about it to its developers.
Artificial intelligence
Tech industry
fromHackernoon
1 year ago

The HackerNoon Newsletter: On Grok and the Weight of Design (7/11/2025) | HackerNoon

Yandex launched Yambda, a significant recommendation dataset, highlighting the evolution and accessibility of data in AI.
[ Load more ]