As A.I. technology evolves, creating effective tests to evaluate these systems has become increasingly challenging. Initially, A.I. was assessed using standardized tests similar to the S.A.T., but as systems like those from OpenAI and Google scored near the top of those benchmarks, researchers began developing harder assessments, including Ph.D.-level questions. These tests have also failed to keep pace, prompting the release of 'Humanity's Last Exam,' purportedly the most difficult A.I. test yet and part of an effort to develop meaningful evaluations in light of A.I.'s growing capabilities.
Researchers have struggled to create effective tests for A.I. systems, which continually outperform standardized benchmarks, raising concerns about how to measure their intelligence.
As A.I. systems excel at Ph.D.-level problems, a new evaluation titled 'Humanity's Last Exam' has been introduced, arguably the hardest test for A.I. to date.