Is Opus 4.5 really 'the best model in the world for coding'? It just failed half my tests
Briefly

"I've got to tell you: I've had fairly okay coding results with Claude's lower-end Sonnet AI model. But for whatever reason, its high-end Opus model has never done well on my tests. Usually, you expect the super-duper coding model to code better than the cheap seats, but with Opus, not so much. Also: Google's Antigravity puts coding productivity before AI hype - and the result is astonishing Now, we're back with Opus 4.5."
"I'll give you the TL;DR right now. Opus 4.5 crashed and burned on one test, turned in a mediocre and not-quite-good-enough answer on the second, and passed the remaining two. With a 50% score, we're definitely not looking at "the best model in the world for coding." Let's dig in, and then I'll wrap up with some thoughts."
Opus 4.5 was evaluated on a standard set of four simple coding tests and passed only two, a 50% score. The model crashed on one test and turned in a mediocre, not-quite-good-enough answer on another. File-handling glitches interfered with a basic WordPress plugin test, making the output difficult to verify. Anthropic bills Opus 4.5 as intelligent, efficient, and ideal for coding and agents, but these performance and reliability issues undermine that claim. The lower-end Sonnet model had previously delivered acceptable results, while Opus 4.5 remains inconsistent at coding.
Read at ZDNET