Anthropic rewrites Claude's guiding principles – and reckons with the possibility of AI consciousness | Fortune
Briefly

""We believe that in order to be good actors in the world, AI models like Claude need to understand why we want them to behave in certain ways rather than just specifying what we want them to do," a spokesperson for Anthropic said in a statement. "If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize and apply broad principles rather than mechanically follow specific rules.""
""The company published the new "constitution"-a detailed document written for Claude that explains what the AI is, how it should behave, and the values it should embody-for Claude on Tuesday. The document is central to Anthropic's "Constitutional AI" training method, where the AI uses these principles to critique and revise its own responses during training, rather than relying solely on human feedback to determine the right course of action.""
""Anthropic's previous constitution, published in 2023, was a list of principles drawn from sources like the U.N. Declaration of Human Rights and Apple's terms of service. The new document focuses on Claude's "helpfulness" to users, describing the bot as potentially "like a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor." But it also includes hard constraints for the chatbot, such as never providing meaningful assistance with bioweapons attacks.""
Anthropic has overhauled the constitution guiding Claude, shifting from a simple list of rules to a document that explains why certain behaviors are preferred. The new approach asks Claude to internalize broad principles and exercise judgment across novel situations rather than mechanically follow specific directives. The constitution emphasizes helpfulness, portraying Claude as "like a brilliant friend" with expert knowledge, while imposing hard safety limits such as forbidding meaningful assistance with bioweapons attacks. The update is central to Anthropic's Constitutional AI method, in which the model critiques and revises its own responses during training, and it acknowledges uncertainty about Claude's possible consciousness or moral status.
Read at Fortune