MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Paper • 2404.05674 • Published Apr 8 • 13
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 57
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 43
Dual-Stream Diffusion Net for Text-to-Video Generation Paper • 2308.08316 • Published Aug 16, 2023 • 23