Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion • Paper • 2412.14462 • Published 26 days ago • 15
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding • Paper • 2409.18111 • Published Sep 26, 2024 • 6
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos • Paper • 2409.19603 • Published Sep 29, 2024 • 19
VideoGUI: A Benchmark for GUI Automation from Instructional Videos • Paper • 2406.10227 • Published Jun 14, 2024 • 9
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM • Paper • 2406.02884 • Published Jun 5, 2024 • 15