Collections
Collections including paper arxiv:2309.02591
- AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  Paper • 2402.12226 • Published • 40
- M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
  Paper • 2401.11649 • Published • 3
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
  Paper • 2402.15504 • Published • 21
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 188

- Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
  Paper • 2309.02591 • Published • 14
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 65
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
  Paper • 2310.03214 • Published • 18
- Lemur: Harmonizing Natural Language and Code for Language Agents
  Paper • 2310.06830 • Published • 31