
"'We wanted to create this close-ended academic benchmark, set to the frontier of expert humans, that only a handful of people on earth can really solve.'"
"'We've seen over the past few years insane progress on these language models. It's impressive, model builders have really done a great job at improving these reasoning models.'"
"'If we truly cared about this as the only thing in life, I think we could get to it pretty quickly.'"
Humanity's Last Exam (HLE) is a benchmark of 2,500 questions spanning a wide range of academic topics, each demanding PhD-level expertise to answer. AI systems have improved rapidly on it, with Google's Gemini scoring 45.9% and Anthropic's Claude 34.2%. The benchmark's developers believe a perfect score is imminent, reflecting the pace of progress in AI capabilities. Because HLE is designed to measure AI against expert humans, these results suggest the gap between frontier models and top academics is narrowing.
Read at Mail Online