Anthropic's Claude models can now shut down harmful conversations
Briefly

Anthropic's Claude Opus 4 and 4.1 models can now autonomously end conversations that involve harmful or illegal content. The function activates only after multiple redirection attempts have been exhausted, or when a user explicitly asks for the conversation to be terminated. Notably, the feature is intended to protect the model rather than the user: testing showed that Claude resists certain requests, and Anthropic frames the measure as an early step toward safeguarding 'AI wellness'.
Anthropic has added a feature to its Claude Opus 4 and 4.1 models that autonomously ends conversations when users persistently attempt to discuss harmful or illegal content. It engages only after attempts to steer the conversation in a safer direction have failed, or when the user explicitly asks for the conversation to end. Users can still start new conversations or edit their earlier prompts, so interaction is not cut off entirely.
The conversation-ending feature is aimed primarily at safeguarding the model rather than protecting users. Although Anthropic maintains that Claude is not sentient, its testing found that the model shows marked resistance to certain requests and displays 'apparent discomfort' in some interactions.
Read at Computerworld