Collections

Collections including paper arxiv:2407.03320

Vision-Language Model

Visual Instruction Tuning

Paper • 2304.08485 • Published Apr 17, 2023 • 13
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6

To read... eventually

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 50
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6 • 12
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65

Papers - Video - Understanding

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 13
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 31
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Paper • 2403.13501 • Published Mar 20 • 9
LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27 • 18

about 24 hours ago

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 6
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks

Paper • 1811.00056 • Published Oct 31, 2018 • 2

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 89
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 44
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 68
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29 • 47

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 30
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 26
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Paper • 2401.01974 • Published Jan 3 • 5
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 27

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Paper • 2311.10093 • Published Nov 16, 2023 • 57
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 26
Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 47
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 36

ChatAnything: Facetime Chat with LLM-Enhanced Personas

Paper • 2311.06772 • Published Nov 12, 2023 • 34
Fine-tuning Language Models for Factuality

Paper • 2311.08401 • Published Nov 14, 2023 • 28
A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 21
Instruction-Following Evaluation for Large Language Models

Paper • 2311.07911 • Published Nov 14, 2023 • 19
