Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750 • Published 12 days ago • 19
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge Paper • 2407.03958 • Published Jul 4, 2024 • 18
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24, 2024 • 53