
Anthropic has released Claude Opus 4.8, a large language model the company describes as more honest than earlier versions. According to Anthropic, the model is less likely to make unsupported claims and more likely to say when it is uncertain of an answer. The company's evaluations indicate it is about four times less likely than its predecessor to let flaws in code it has written pass unremarked. The previous release had already brought noticeable gains. In the reviewer's experience with Claude Code, Opus 4.7 interpreted instructions more reliably than 4.6. It also more often recognized when its initial approach had failed and switched to a different one. The new release adds agent capabilities, including dynamic workflows that can run hundreds of Claude subagents. It also makes fast mode cheaper, while regular Opus pricing stays the same.
"“One of the most prominent improvements in Opus 4.8 is its honesty,” the company said Thursday in a blog post. Now, perhaps, this new frontier model will behave itself better. Anthropic reports that Opus 4.8 is less likely to make unsupported claims. It's also more likely to tell you when it's uncertain of an answer."
"“This is borne out in our evaluations, which show that Opus 4.8 is around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked,” the company said. In Claude Code, I found Opus 4.7 to be a substantial improvement over 4.6. While 4.6 would often misinterpret instructions or deliver erroneous results, Opus 4.7 regularly tells me that the way it first looked at a problem didn't work, and it's taking a different tactic."
"So, given the jump in quality from 4.6 to 4.7, which was subjectively quite noticeable over many sessions, I'm hoping we'll see the same in the jump from 4.7 to 4.8. It would seem this is the case, at least according to Tom Pritchard, staff e…"
"Claude Opus 4.8 promises more honest AI answers. Dynamic workflows can run hundreds of Claude subagents. Fast mode gets cheaper, while regular Opus pricing stays put."
Read at ZDNET