-
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 7 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 37 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6
Collections
Discover the best community collections!
Collections including paper arxiv:2311.12793
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 188 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44 -
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 40
-
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Paper • 2401.10529 • Published • 1 -
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18 -
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 26 -
SVIT: Scaling up Visual Instruction Tuning
Paper • 2307.04087 • Published • 6
-
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18 -
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Paper • 2311.12198 • Published • 22 -
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Paper • 2311.18775 • Published • 6 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 22
-
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
Paper • 2311.11243 • Published • 14 -
Make Pixels Dance: High-Dynamic Video Generation
Paper • 2311.10982 • Published • 68 -
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Paper • 2311.10794 • Published • 24 -
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18