- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 15
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 8
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13
Collections including paper arxiv:2412.04432
- Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
  Paper • 2412.05939 • Published • 10
- Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
  Paper • 2412.04432 • Published • 11
- Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
  Paper • 2410.13360 • Published • 8
- Grounding Descriptions in Images informs Zero-Shot Visual Recognition
  Paper • 2412.04429 • Published

- Mind the Time: Temporally-Controlled Multi-Event Video Generation
  Paper • 2412.05263 • Published • 8
- Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
  Paper • 2412.04432 • Published • 11
- MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
  Paper • 2412.05355 • Published • 4

- TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
  Paper • 2412.03069 • Published • 28
- Are Emergent Abilities of Large Language Models a Mirage?
  Paper • 2304.15004 • Published • 6
- Scaling Image Tokenizers with Grouped Spherical Quantization
  Paper • 2412.02632 • Published • 9
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
  Paper • 2410.13848 • Published • 30

- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 12
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
  Paper • 2312.04410 • Published • 14
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
  Paper • 2312.04461 • Published • 57
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
  Paper • 2401.02955 • Published • 20