Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding • Paper • 2411.18462 • Published 3 days ago • 6
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding • Paper • 2411.18363 • Published 4 days ago • 7
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing • Paper • 2411.16781 • Published 6 days ago • 9
MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts • Paper • 2411.14721 • Published 9 days ago • 3
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity • Paper • 2411.15411 • Published 8 days ago • 7
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models • Paper • 2411.17451 • Published 5 days ago • 9
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs • Paper • 2411.15296 • Published 8 days ago • 19
Star Attention: Efficient LLM Inference over Long Sequences • Paper • 2411.17116 • Published 5 days ago • 41
ShowUI: One Vision-Language-Action Model for GUI Visual Agent • Paper • 2411.17465 • Published 5 days ago • 60
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages • Paper • 2411.16508 • Published 5 days ago • 7
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI • Paper • 2411.14522 • Published 9 days ago • 29
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published 5 days ago • 28
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge • Paper • 2411.16594 • Published 5 days ago • 31
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games • Paper • 2411.13543 • Published 10 days ago • 17
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models • Paper • 2411.14982 • Published 9 days ago • 13