-
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Paper • 2403.02677 • Published • 16 -
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Paper • 2403.03003 • Published • 9 -
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Paper • 2403.01487 • Published • 14 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44
Collections
Discover the best community collections!
Collections including paper arxiv:2410.05993
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
Sparse Networks from Scratch: Faster Training without Losing Performance
Paper • 1907.04840 • Published • 3 -
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Paper • 1910.02054 • Published • 4 -
A Mixture of h-1 Heads is Better than h Heads
Paper • 2005.06537 • Published • 2
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 14 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 45 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 26 -
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Paper • 2401.01974 • Published • 5 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 27
-
NExT-GPT: Any-to-Any Multimodal LLM
Paper • 2309.05519 • Published • 78 -
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 33 -
apple/DCLM-7B
Updated • 336 • 821 -
Aria: An Open Multimodal Native Mixture-of-Experts Model
Paper • 2410.05993 • Published • 107