
Logan Bolton

loganbolton

AI & ML interests

Computer Vision, Explainable AI, NLP

Recent Activity

updated a dataset 6 days ago
XAI/vlmsareblind
upvoted a paper 29 days ago
GPT-4o System Card

loganbolton's activity

Reacted to Taylor658's post with 🔥 5 months ago
Researchers from Auburn University and the University of Alberta have explored the limitations of Vision Language Models (VLMs) in their recently published paper, "Vision language models are blind" (arXiv:2407.06581).

Key Findings: 🔍
VLMs, including GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet, struggle with basic visual tasks.
Tasks such as identifying where lines intersect or counting basic shapes are challenging for these models.
The authors noted, "The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like of a person with myopia seeing fine details as blurry, and at worst, like an intelligent person that is blind making educated guesses" (Vision Language Models Are Blind, 2024).

Human-like Myopia? 👓
VLMs may have a blind spot similar to human myopia.
This limitation makes it difficult for VLMs to perceive fine visual details.
Suggests a potential parallel between human and machine vision limitations.

Technical Details: 🔧
The researchers created a new benchmark called BlindTest.
BlindTest consists of simple visual tasks that evaluate VLMs' low-level vision capabilities.
Four VLMs were assessed using BlindTest.
The benchmark revealed many shortcomings in the models' ability to process basic visual information.
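To make the task style concrete, here is a minimal sketch of how ground truth could be computed for a BlindTest-style "do these two line segments intersect?" question. This is not the authors' benchmark code; the geometry helper below is a standard orientation-based segment test, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b-a) x (c-a):
    1 = counter-clockwise, -1 = clockwise, 0 = collinear."""
    val = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return (val > 0) - (val < 0)

def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 properly crosses segment q1-q2.
    (Collinear/touching edge cases are ignored here for brevity.)"""
    return (orientation(p1, p2, q1) != orientation(p1, p2, q2)
            and orientation(q1, q2, p1) != orientation(q1, q2, p2))

# Ground truth for two hypothetical task instances:
crossing = segments_intersect(Point(0, 0), Point(2, 2),
                              Point(0, 2), Point(2, 0))   # crossing diagonals -> True
parallel = segments_intersect(Point(0, 0), Point(1, 0),
                              Point(0, 1), Point(1, 1))   # parallel segments -> False
print(crossing, parallel)
```

Rendering such segments as an image and asking a VLM whether (or where) they cross is exactly the kind of trivially verifiable low-level task the paper reports the models failing.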

Learn More: 🖼️
For a deeper dive into this research, check out the project page: https://vlmsareblind.github.io/