3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes • Paper • 2411.14974 • Published 9 days ago • 12
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models • Paper • 2411.18613 • Published 3 days ago • 35
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning • Paper • 2411.18203 • Published 4 days ago • 22
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • 2411.15124 • Published 8 days ago • 54
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published 6 days ago • 28
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • Paper • 2411.14432 • Published 9 days ago • 19
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs • Paper • 2411.14199 • Published 10 days ago • 25
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions • Paper • 2411.14405 • Published 9 days ago • 52
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 10 days ago • 37
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published 15 days ago • 61
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization • Paper • 2411.11909 • Published 14 days ago • 20
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration • Paper • 2411.10958 • Published 14 days ago • 47
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published 15 days ago • 106
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation • Paper • 2411.08033 • Published 18 days ago • 21
Number it: Temporal Grounding Videos like Flipping Manga • Paper • 2411.10332 • Published 15 days ago • 12
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use • Paper • 2411.10323 • Published 15 days ago • 27