Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published Oct 14 • 27
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published 15 days ago • 17
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published 10 days ago • 41
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published 10 days ago • 55
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 11 days ago • 39
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published 21 days ago • 21
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published 25 days ago • 48
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper • 2401.08740 • Published Jan 16 • 12
LLaVA-Video Collection Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5 • 55
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27 • 25
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24 • 32
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published Sep 17 • 13
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models Paper • 2409.10695 • Published Sep 16 • 2
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11 • 11