Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published 28 days ago • 61
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published 28 days ago • 33
DreamCinema: Cinematic Transfer with Free Camera and 3D Character Paper • 2408.12601 • Published 28 days ago • 28
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published 28 days ago • 13
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published about 1 month ago • 54
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published about 1 month ago • 11
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published 25 days ago • 58
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Paper • 2408.14819 • Published 24 days ago • 18
Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation Paper • 2408.15991 • Published 22 days ago • 15
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published 21 days ago • 29
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution Paper • 2310.16834 • Published Oct 25, 2023 • 2
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published 21 days ago • 35
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published 16 days ago • 32
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 16 days ago • 84
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation Paper • 2409.02245 • Published 16 days ago • 9
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 17 days ago • 94
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published 14 days ago • 24
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published 14 days ago • 16
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 10 days ago • 14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 8 days ago • 18
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering Paper • 2409.07441 • Published 8 days ago • 8
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 7 days ago • 14
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published 7 days ago • 10
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published 7 days ago • 8
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published 7 days ago • 9
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 7 days ago • 24
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published 6 days ago • 11
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 7 days ago • 10
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 6 days ago • 38
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 2 days ago • 19
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 2 days ago • 24
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 2 days ago • 11
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 3 days ago • 11
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction Paper • 2409.11211 • Published 3 days ago • 6
Single-Layer Learnable Activation for Implicit Neural Representation (SL²A-INR) Paper • 2409.10836 • Published 3 days ago • 4
Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks Paper • 2409.09323 • Published 6 days ago • 3
Towards Diverse and Efficient Audio Captioning via Diffusion Models Paper • 2409.09401 • Published 6 days ago • 5