Anthropic says some Claude models can now end 'harmful or abusive' conversations | TechCrunch
Briefly

Anthropic has announced new capabilities that allow its latest models to end conversations in persistently harmful or abusive user interactions. The aim is to protect the AI model itself.
The initiative is part of Anthropic's work on 'model welfare,' which identifies and implements low-cost interventions to mitigate potential risks to the models.
Currently, the change applies only to Claude Opus 4 and 4.1, and is limited to extreme situations, such as requests for sexual content involving minors.
The capability is intended as a last resort, to be used only after repeated redirection attempts have failed or when the user explicitly asks Claude to end the chat.
Read at TechCrunch