AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview

AI model releases arrive frequently, and newer versions are not always major step changes. Model strengths depend on context, including where competitor models perform poorly or strongly, which models have standout specialties, and which are merely reaching industry standards. A model release tracker organizes notable releases and provides key elements plus hands-on expert testing when applicable. Expert scores appear for certain models. The tracker is updated when new notable models arrive, and it does not test every model or update. Claude Opus 4.8 replaces Opus 4.7 at the same price, offering faster thinking modes at one-third the cost, with improved coding performance and emphasis on prosocial traits and honesty.

"AI labs are shipping new models nonstop. Besides being better and faster than their predecessors, however, every new model isn't guaranteed to be a major step change, despite how the company's PR may wax poetic about them. Model strengths really emerge in context: Where are competitor models lacking or excelling? Which models have outstanding specialties, and which are just catching up to industry standards?"

"Our Model Release Tracker helps you make sense of where models stand relative to each other, and whether they're worth a deeper look. While we don't test every model or model update on this list, we'll always include the key elements you need to know, along with our hands-on expert test, where applicable. We also include an Expert Score for certain models. Curious about how we test AI? Check out this breakdown of our process."

"Replacing Opus 4.7 starting today (at the same price), Opus 4.8 offers faster thinking modes for one-third the cost of the earlier version, according to Anthropic. Like most of Anthropic's models, 4.8 prioritizes coding abilities, scoring higher than 4.7 on two coding benchmarks but not fully besting OpenAI's GPT 5.5. It also "reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user's best interest," the company noted in the release, though definitions for what that means remain murky."

"Anthropic has always prioritized model safety and interpretability, but appears to be further emphasizing that standard with this release. The company said Opus 4.7 had a 92% honesty rate, in addition to being less sycophantic and hallucination-prone."

#ai-model-releases #model-evaluation #claude-opus #coding-performance #model-safety-and-honesty

Read at ZDNET

Unable to calculate read time

Collection

[

...

]

AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos PreviewAI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview Briefly

AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview
AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview
Briefly