PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published 23 days ago • 53
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 19 days ago • 65
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published 24 days ago • 8
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023 • 20
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models Paper • 2311.07575 • Published Nov 13, 2023 • 13