Maybe AI agents can be lawyers after all | TechCrunch
Briefly

"This week's release of Opus 4.6 shook up the leaderboards, with Anthropic's new model scoring just shy of 30% in one-shot trials, and an average of 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including 'agent swarms,' which may have helped with this kind of multi-step problem-solving. Regardless, the score is a huge jump from the previous state-of-the-art, and a sign that progress on foundation models isn't slowing down."
"Last month, I wrote about Mercor's new benchmark measuring AI agents' capabilities on professional tasks like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab scoring under 25%, so we concluded lawyers were safe from AI displacement, at least for now. But AI capabilities can change a lot in a couple of weeks."
"Mercor CEO Brendan Foody, who was particularly impressed, said, 'jumping from 18.4% to 29.8% in a few months is insane.' Thirty percent is still a long way from 100%, so it's not like lawyers need to be worried about getting replaced by machines next week. But they should be a lot less confident than they were last month!"
Mercor's benchmark measures AI agents' capabilities on professional tasks such as law and corporate analysis. When it was covered last month, every major lab scored under 25%. Opus 4.6, Anthropic's new model released this week, scored just under 30% in one-shot trials and averaged 45% when allowed multiple attempts. The release added agentic features, including "agent swarms," which may have helped with multi-step problem solving. Mercor CEO Brendan Foody called the jump from 18.4% to 29.8% "insane," and the result suggests that progress on foundation models is not slowing down. Thirty percent is still far from 100%, so immediate displacement of lawyers is unlikely, but lawyers should be less confident about their job safety than they were a month ago.
Read at TechCrunch