Interest - an Exclibur Collection

Interest

updated 17 minutes ago

CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published 20 days ago • 18

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published 20 days ago • 45

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published 20 days ago • 46

Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published 18 days ago • 12

Chimera: Improving Generalist Model with Domain-Specific Experts
Paper • 2412.05983 • Published 18 days ago • 9

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Paper • 2412.06673 • Published 17 days ago • 11

Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published 16 days ago • 17

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Paper • 2412.03548 • Published 22 days ago • 16

Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation
Paper • 2412.07334 • Published 16 days ago • 16

StreamChat: Chatting with Streaming Video
Paper • 2412.08646 • Published 15 days ago • 17

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Paper • 2412.05552 • Published 19 days ago • 4

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Paper • 2412.09585 • Published 14 days ago • 10

Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published 15 days ago • 41

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published 15 days ago • 51

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper • 2412.09596 • Published 14 days ago • 90

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Paper • 2412.02186 • Published 23 days ago • 22

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published 3 days ago • 26