Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation Paper • 2412.01316 • Published 10 days ago • 8
Centroid-centered Modeling for Efficient Vision Transformer Pre-training Paper • 2303.04664 • Published Mar 8, 2023
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos Paper • 2402.06119 • Published Feb 9 • 1
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14 • 7