From The Register
3 hours ago
Anthropic's latest Sonnet is better at using computers
The tweaks to Sonnet 4.6 have taken it past the pricier Opus 4.6 in two of 13 benchmark categories: agentic financial analysis (Finance Agent v1.1, 63.3 percent vs. 60.1 percent) and office tasks (GDPVal-AA Elo, 1633 vs. 1606). Opus 4.6 leads in six of the 13 categories, while rivals Gemini 3 Pro and GPT-5.2 each lead in two. But benchmark tests should not be taken too seriously.
Artificial intelligence