Current LLMs are not capable of genuine logical reasoning. Instead, they attempt to replicate the reasoning steps observed in their training data.
The fragility highlighted in these results supports previous research suggesting that LLMs' reliance on probabilistic pattern matching lacks the formal understanding of underlying concepts needed for reliable mathematical reasoning.
In the GSM-Symbolic evaluation, the names and numbers in the math problems are systematically varied across instances, which helps avoid potential 'data contamination' from static benchmark questions that may have appeared in training data. A rough sketch of this approach is shown below.
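As an illustration of the idea (not the benchmark's actual templates), a GSM-Symbolic-style variant generator might look like the following minimal sketch; the problem text, names, and numeric ranges here are hypothetical:

```python
import random

# Hypothetical name pool; GSM-Symbolic varies surface details like these
# while keeping the underlying problem logic fixed.
NAMES = ["Sophie", "Liam", "Ava", "Noah"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate one problem variant and compute its ground-truth answer."""
    name = rng.choice(NAMES)
    toys = rng.randint(20, 60)   # initial quantity, varied per instance
    bought = rng.randint(2, 10)  # quantity added, varied per instance
    question = (
        f"{name} has {toys} toys and buys {bought} more. "
        f"How many toys does {name} have now?"
    )
    answer = toys + bought       # ground truth tracks the substituted values
    return question, answer

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

Because each instance is freshly generated, a model cannot answer correctly by recalling a memorized benchmark question; it must actually carry out the arithmetic on the substituted values.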
Advanced LLMs are being touted for their reasoning capabilities, but the mathematical reasoning they display is fragile and unreliable: trivial changes to benchmark problems cause significant drops in accuracy.
#artificial-intelligence #mathematical-reasoning #large-language-models #research-findings #data-contamination