Artificial intelligence (AI) tackling human cognitive puzzles? The latest research explores this intriguing intersection.
Recently, researchers at the USC Viterbi School of Engineering's Information Sciences Institute (ISI) delved into the abilities of multi-modal large language models (MLLMs) to solve abstract visual tests designed for human intelligence. The results are turning heads.
Presented at the Conference on Language Modeling (COLM 2024) in Philadelphia, the study put the “nonverbal abstract reasoning abilities of open-source and closed-source MLLMs” to the test, challenging image-processing models to showcase reasoning skills often associated with human cognition.
“Can the model apply the same pattern in a different scenario, like a yellow circle transforming into a blue triangle?” shared Kian Ahrabian, a research assistant on the project. This task demands visual perception and logical reasoning akin to human thinking, making it a significant test of AI capability.
When 24 MLLMs were put to the test with puzzles modeled after Raven’s Progressive Matrices, the AI models faced hurdles. Ahrabian admitted, “They were really bad. They couldn’t get anything out of it,” highlighting the models’ struggles with visual interpretation and pattern recognition.
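To make the task concrete, here is a minimal, text-based sketch of the kind of pattern a Raven's-style matrix puzzle tests. This is an illustrative analogue, not the study's actual benchmark: each cell is a (shape, color) pair, one rule transforms cells across each row, and the solver must infer the missing bottom-right cell.

```python
# Illustrative sketch (not the study's benchmark): a text-based analogue
# of a Raven's-style matrix puzzle. Shapes cycle across each row while
# the color stays fixed; the bottom-right cell is hidden.

SHAPES = ["circle", "triangle", "square"]
COLORS = ["yellow", "blue", "red"]

def next_shape(shape):
    """Rule: each step to the right advances the shape one place in a fixed cycle."""
    return SHAPES[(SHAPES.index(shape) + 1) % len(SHAPES)]

def build_row(start_shape, color):
    """A row applies the shape-cycling rule twice, keeping the color fixed."""
    middle = next_shape(start_shape)
    return [(start_shape, color), (middle, color), (next_shape(middle), color)]

# A 3x3 matrix with the bottom-right cell removed, as a test-taker would see it.
matrix = [build_row(s, c) for s, c in zip(SHAPES, COLORS)]
answer = matrix[2][2]
matrix[2][2] = None

# Solving means recovering the rule from the visible cells and applying it
# to the incomplete row.
predicted = (next_shape(matrix[2][1][0]), matrix[2][1][1])
print(predicted == answer)  # True: the inferred rule matches the hidden cell
```

The study's point is that this rule induction, trivial for people, is exactly what the tested MLLMs failed at when the puzzle was presented as an image rather than structured data.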
While open-source models struggled more with the visual reasoning puzzles than closed-source models like GPT-4V, even the most advanced models fell short of human cognitive abilities. Some models improved with Chain of Thought prompting, a method that guides a model through the reasoning process step by step.
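The step-by-step idea behind Chain of Thought prompting can be sketched in a few lines. The prompts below are hypothetical illustrations (the paper's actual prompts are not quoted here), and the commented-out model call is a placeholder, not a real API:

```python
# Illustrative sketch of Chain of Thought (CoT) prompting: the same puzzle
# is posed twice, once directly and once with explicit step-by-step
# scaffolding. The prompts are invented examples, not the study's own.

puzzle = (
    "In each row the shape changes but the color stays the same: "
    "a yellow circle becomes a yellow triangle. "
    "Applying the same pattern, what does a blue circle become?"
)

# Direct prompt: the model must jump straight to the answer.
direct_prompt = puzzle + "\nAnswer:"

# CoT prompt: the model is walked through the reasoning before answering.
cot_prompt = (
    puzzle
    + "\nLet's think step by step."
    + "\n1. Identify which attribute changes across the row."
    + "\n2. Identify which attribute stays the same."
    + "\n3. Apply that rule to the new starting item, then state the answer."
)

# Hypothetical call -- substitute a real model client here:
# response = client.generate(cot_prompt)
print(len(cot_prompt) > len(direct_prompt))  # True: CoT adds reasoning scaffolding
```

The intuition, consistent with the article's description, is that spelling out intermediate steps gives the model a structured path to the answer instead of requiring a single leap.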
Closed-source models, benefiting from specialized development, extensive training data, and robust computing resources, outperformed open-source counterparts in the tests. However, Ahrabian emphasized, “GPT-4V was relatively good at reasoning, but it’s far from perfect.”
“Understanding the limitations of AI models is crucial to enhancing their performance and reliability,” shared Jay Pujara, a research associate professor and author. “This study helps uncover a piece of the puzzle on where AI struggles.”
By highlighting AI models’ reasoning weaknesses, studies like this pave the way for refining these skills in the future, aiming for human-level logical reasoning. While AI capabilities continue to evolve, they still have a long way to go before rivaling human cognition.