Apple researchers have identified significant limitations in large reasoning models (LRMs) on complex tasks. As problem complexity increases, these models tend to scale back their reasoning effort, leading to sharp drops in accuracy. LRMs perform well on simpler tasks, but their performance deteriorates under higher complexity, and they waste computational resources by exploring incorrect solutions before arriving at an answer. Even advanced models such as Google's Gemini and OpenAI's o3 were subject to this behavior in testing, which the research findings highlight as evidence of a fundamental scaling limitation constraining the reasoning capabilities of these AI systems.
As they approach a critical threshold, which closely corresponds to the point at which their accuracy collapses, the models counterintuitively begin to reduce their reasoning effort even as problem difficulty continues to increase.
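The shape of that pattern is easy to picture with a toy curve. The Python sketch below is purely illustrative: the token counts and the collapse_point parameter are invented to match the qualitative trend described in the research, not Apple's actual measurements.

```python
# Illustrative only: reasoning effort (thinking tokens) rises with problem
# complexity, then falls off past an assumed threshold even though the
# problems keep getting harder. The numbers are made up, not the paper's data.

def reasoning_tokens(complexity: int, collapse_point: int = 8) -> int:
    """Toy model of effort: grows with complexity up to the collapse
    point, then shrinks as the model effectively gives up on harder
    instances."""
    if complexity <= collapse_point:
        return 500 * complexity  # effort scales up with difficulty
    # past the threshold, effort declines despite rising difficulty
    return max(500 * (2 * collapse_point - complexity), 100)

for n in range(1, 13):  # e.g., puzzle sizes of increasing difficulty
    marker = " <- near accuracy collapse" if n == 8 else ""
    print(f"complexity={n:2d}  tokens={reasoning_tokens(n):5d}{marker}")
```

Running this prints effort climbing steadily, peaking at the assumed threshold, and then declining, which is the counterintuitive inversion the researchers describe.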
The research states that LRMs experience a 'complete accuracy collapse' when faced with sufficiently complex problems.