Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1 • 10
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 34
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces Paper • 2403.20275 • Published Mar 29 • 8
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes Paper • 2401.05335 • Published Jan 10 • 26
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations Paper • 2401.01885 • Published Jan 3 • 27
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8 • 11
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25 • 13
AudioPaLM: A Large Language Model That Can Speak and Listen Paper • 2306.12925 • Published Jun 22, 2023 • 52
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 18
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20 • 18
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis Paper • 2403.13501 • Published Mar 20 • 9
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models Paper • 2403.13447 • Published Mar 20 • 18
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 62
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image Paper • 2403.13524 • Published Mar 20 • 8
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation Paper • 2403.13745 • Published Mar 20 • 11
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 17