ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Paper • 2401.13311 • Published Jan 24, 2024 • 10
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion Paper • 2401.13388 • Published Jan 24, 2024 • 10
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Paper • 2401.09047 • Published Jan 17, 2024 • 13
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 46
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers Paper • 2401.02072 • Published Jan 4, 2024 • 9
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 30
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text Paper • 2311.07446 • Published Nov 13, 2023 • 28