This Tool Probes Frontier AI Models for Lapses in Intelligence
Briefly

AI executives frequently assert that artificial general intelligence (AGI) is imminent, yet today's models still require further training to perform well. Scale AI has introduced Scale Evaluation, a platform that automates testing across thousands of benchmarks, detects weaknesses, and recommends specific data for targeted training to improve model abilities. Scale previously relied on human labor for much of this testing; the new tool uses machine learning to streamline the process. Daniel Berrios of Scale highlights its ability to help model developers better understand and improve their models, particularly their reasoning skills.
"Within the big labs, there are all these haphazard ways of tracking some of the model weaknesses. The new tool is a way for model makers to go through results and slice and dice them to understand where a model is not performing well."
"Several frontier AI model companies are using Scale Evaluation to improve the reasoning capabilities of their best models, as AI reasoning relies heavily on post-training to determine correct approaches to problem-solving."
Read at WIRED