
"In an explainer document, the company notes that the 2023 version of its constitution (which came in at just ~2,700 words) was a mere "list of standalone principles" that is no longer useful because "AI models like Claude need to understand why we want them to behave in certain ways, and we need to explain this to them rather than merely specify what we want them to do.""
"Anthropic hopes that Claude's output will reflect the content of the constitution by being: Broadly safe: not undermining appropriate human mechanisms to oversee AI during the current phase of development; Broadly ethical: being honest, acting according to good values, and avoiding actions that are inappropriate, dangerous, or harmful; Compliant with Anthropic's guidelines: acting in accordance with more specific guidelines from Anthropic where relevant; Genuinely helpful: benefiting the operators and users they interact with."
Anthropic produced a 23,000-word constitution for its Claude family of AI models, replacing a roughly 2,700-word list of standalone principles. The document aims to explain why Claude should behave in particular ways, helping the model understand its situation, Anthropic's motives, and the reasons behind design choices. The constitution serves both as an attempt to orient Claude and as a detailed statement of Anthropic's desired values and behavior. Anthropic expects Claude to be broadly safe, broadly ethical, compliant with company guidelines, and genuinely helpful, and asks the model to prioritize those properties in that order. The document also frames Claude as a novel kind of entity with an identity.
Read at The Register