OpenAI Research Finds That Even Its Best Models Give Wrong Answers a Wild Proportion of the Time
Briefly

OpenAI's o1-preview model scored only 42.7% on the company's new SimpleQA benchmark, highlighting a worrying pattern: even advanced AI models produce incorrect answers more often than correct ones.
Despite these low success rates, AI technologies continue to be woven into daily life. Hospitals adopting AI transcription tools, for instance, have already run into serious problems with inaccurate output.
Competing models fared even worse on raw accuracy: Anthropic's Claude-3.5-sonnet scored just 28.9%, though it took a more cautious approach, declining to answer questions it was unsure about more often.
The models also tend to be overconfident, 'hallucinating' elaborate falsehoods rather than admitting uncertainty, which encourages reliance on unreliable outputs in critical settings.
Read at Futurism