Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model • Paper 2407.07053 • Published Jul 9
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper 2411.10440 • Published 12 days ago