
Anthropic has updated Sonnet to version 4.6, claiming improved coding ability, better automation of computer tasks, and stronger reasoning and planning. The release follows a recent Opus version bump, and Sonnet 4.6 outperforms Opus 4.6 in two of 13 benchmark categories: agentic financial analysis (Finance Agent v1.1, 63.3% vs. 60.1%) and office tasks (GDPVal-AA Elo, 1633 vs. 1606). Opus 4.6 leads in six categories, while Gemini 3 Pro and GPT-5.2 each lead in two; benchmark results should be read with caution. Sonnet 4.6 defaults to a 200K-token context window, with a 1M-token option available to beta testers. The model scored 72.5 on OSWorld-Verified, up from Sonnet 3.7's 28.0, and shows improved resistance to prompt injection without increased malicious-use risk.
"The tweaks to Sonnet 4.6 have taken it past the pricier Opus 4.6 in two of 13 benchmark categories: agentic financial analysis (Finance Agent v1.1, 63.3 percent vs. 60.1 percent) and office tasks (GDPVal-AA Elo, 1633 vs. 1606). Opus 4.6 wins in six of the 13 categories, in tests that show rival Gemini 3 Pro and GPT-5.2 each leading in 2 of 13 categories. But benchmark tests should not be taken too seriously."
"Sonnet 4.6 defaults to a context window of 200K, like Opus 4.6 and Haiku 4.5 - that's the amount of material (tokens) that the model can process. But Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 all offer a 1M token context window for those involved in beta testing - usage tier four and organizations with custom rate limits."
Read at The Register