Collections

Collections including paper arxiv:2311.12793

Vision-Language Model

Visual Instruction Tuning

Paper • 2304.08485 • Published Apr 17, 2023 • 13
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 7
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6

Video/Image/Gif/etc.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 188
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 44
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 40

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

Paper • 2401.10529 • Published Jan 19 • 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
SVIT: Scaling up Visual Instruction Tuning

Paper • 2307.04087 • Published Jul 9, 2023 • 6

Lin-Chen/ShareGPT4V-7B

Image-Text-to-Text • Updated Jun 6 • 1.04k • 80
Lin-Chen/ShareGPT4V-13B

Text Generation • Updated Jun 6 • 429 • 33
Lin-Chen/ShareCaptioner

Feature Extraction • Updated Jun 6 • 1.38k • 49
Lin-Chen/ShareGPT4V

Viewer • Updated Jun 6 • 1.35M • 553 • 258

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper • 2311.12198 • Published Nov 20, 2023 • 22
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Paper • 2311.18775 • Published Nov 30, 2023 • 6
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 22

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

Paper • 2311.11243 • Published Nov 19, 2023 • 14
Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 68
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Paper • 2311.10794 • Published Nov 17, 2023 • 24
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18

interesting-finds

stas/tiny-random-llama-2

Text Generation • Updated Nov 14, 2023 • 1.15k • 39
openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12 • 4.06M • 3.69k
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
