joytafty's Collections: multimodal LLM
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40 upvotes
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 45 upvotes
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Paper • 2310.08166 • Published • 1 upvote
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Paper • 2310.00653 • Published • 3 upvotes
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper • 2312.00849 • Published • 8 upvotes
Merlin: Empowering Multimodal LLMs with Foresight Minds
Paper • 2312.00589 • Published • 24 upvotes
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 14 upvotes
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper • 2312.17661 • Published • 13 upvotes
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 upvotes
Long Context Transfer from Language to Vision
Paper • 2406.16852 • Published • 32 upvotes
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 116 upvotes