
Logan Bolton

loganbolton

AI & ML interests

Computer Vision, Explainable AI, NLP

Recent Activity

updated a dataset 6 days ago
XAI/vlmsareblind
upvoted a paper 29 days ago
GPT-4o System Card

loganbolton's activity

Reacted to Taylor658's post with 🔥 5 months ago
Researchers from Auburn University and the University of Alberta have explored the limitations of Vision Language Models (VLMs) in their recently published paper, "Vision language models are blind" (arXiv:2407.06581).

Key Findings: 🔍
VLMs, including GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet, struggle with basic visual tasks.
Tasks such as identifying where lines intersect or counting basic shapes are challenging for these models.
The authors noted, "The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like of a person with myopia seeing fine details as blurry, and at worst, like an intelligent person that is blind making educated guesses" (Vision Language Models Are Blind, 2024).

Human-like Myopia? 👓
VLMs may have a blind spot similar to human myopia.
This limitation makes it difficult for VLMs to perceive fine visual details.
Suggests a potential parallel between human and machine vision limitations.

Technical Details: 🔧
The researchers created a new benchmark called BlindTest.
BlindTest consists of simple visual tasks that evaluate VLMs' low-level vision capabilities.
Four VLMs were assessed using BlindTest.
The benchmark revealed many shortcomings in the models' ability to process basic visual information.
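To make the task style concrete, here is a minimal sketch of how ground truth could be computed for a BlindTest-style "do these two line segments intersect?" question. This is not the authors' benchmark code; the geometry helper below is a standard orientation-based segment test, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b-a) x (c-a):
    1 = counter-clockwise, -1 = clockwise, 0 = collinear."""
    val = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return (val > 0) - (val < 0)

def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 properly crosses segment q1-q2.
    (Collinear/touching edge cases are ignored here for brevity.)"""
    return (orientation(p1, p2, q1) != orientation(p1, p2, q2)
            and orientation(q1, q2, p1) != orientation(q1, q2, p2))

# Ground truth for two hypothetical task instances:
crossing = segments_intersect(Point(0, 0), Point(2, 2),
                              Point(0, 2), Point(2, 0))   # crossing diagonals -> True
parallel = segments_intersect(Point(0, 0), Point(1, 0),
                              Point(0, 1), Point(1, 1))   # parallel segments -> False
print(crossing, parallel)
```

Rendering such segments as an image and asking a VLM whether (or where) they cross is exactly the kind of trivially verifiable low-level task the paper reports the models failing.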

Learn More: 🖼️
For a deeper dive into this research, check out the project page: https://vlmsareblind.github.io/