Vision language models have a tendency to misinterpret images, perceiving patterns and illusions that are not actually there. In one easily replicated experiment, uploading a clear image of an ordinary duck to ChatGPT prompted the model to identify it as the ambiguous duck-rabbit optical illusion, even though a human viewer sees only a duck.
Tomer Ullman documents many such examples in a recent preprint, coining the term 'illusion-illusions' for unambiguous images that AI models mistake for optical illusions despite their visual clarity to humans.
Illusions are a useful diagnostic tool in cognitive science, philosophy, and neuroscience because they reveal the gap between how something 'really is' and how it 'appears to be.' Ullman's research applies the same logic to machines: by testing whether current vision language models can distinguish genuine optical illusions from straightforward images, it exposes systematic biases in AI perception and highlights how far machine perception still diverges from human cognition, making these failures important both for cognitive science and for understanding artificial intelligence.