Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 10 days ago • 39
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published 20 days ago • 44
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 24 days ago • 35
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 24 days ago • 109
Teach Multimodal LLMs to Comprehend Electrocardiographic Images Paper • 2410.19008 • Published Oct 21 • 22
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18 • 35
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks Paper • 2410.12381 • Published Oct 16 • 42
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10 • 26
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14 • 37