Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published 6 days ago • 19
OtterHD: A High-Resolution Multi-modality Model Paper • 2311.04219 • Published Nov 7, 2023 • 31
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 34
MIMIC-IT: Multi-Modal In-Context Instruction Tuning Paper • 2306.05425 • Published Jun 8, 2023 • 11
Otter: A Multi-Modal Model with In-Context Instruction Tuning Paper • 2305.03726 • Published May 5, 2023 • 6