joytafty's Collections: multimodal LLM
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40 upvotes
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 45 upvotes
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Paper • 2310.08166 • Published • 1 upvote
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Paper • 2310.00653 • Published • 3 upvotes
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper • 2312.00849 • Published • 8 upvotes
Merlin: Empowering Multimodal LLMs with Foresight Minds
Paper • 2312.00589 • Published • 24 upvotes
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 14 upvotes
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper • 2312.17661 • Published • 13 upvotes
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 upvotes
Long Context Transfer from Language to Vision
Paper • 2406.16852 • Published • 32 upvotes
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 116 upvotes