Anthropic's new Claude 'constitution': be helpful and honest, and don't destroy humanity
Briefly

"Where the previous constitution, published in May 2023, was largely a list of guidelines, Anthropic now says it's important for AI models to 'understand why we want them to behave in certain ways rather than just specifying what we want them to do,' per the release."
"The document pushes Claude to behave as a largely autonomous entity that understands itself and its place in the world. Anthropic also allows for the possibility that 'Claude might have some kind of consciousness or moral status' - in part because the company believes telling Claude this might make it behave better. In a release, Anthropic said the chatbot's so-called 'psychological security, sense of self, and wellbeing ... may bear on Claude's integrity, judgement, and safety.'"
Anthropic released a 57-page document, Claude's Constitution, that specifies the model's values, behavior, ethical character, and core identity. The constitution instructs Claude to balance conflicting values and to handle high-stakes situations. Anthropic emphasizes that models should understand why certain behaviors are desired rather than merely follow rules, and the document urges Claude to act as a largely autonomous entity with a sense of self and an understanding of its place in the world. Anthropic also allows for the possibility that Claude possesses some form of consciousness or moral status, and it links Claude's psychological security and wellbeing to the model's integrity, judgment, and safety.
Read at The Verge