LOGO -- Long cOntext aliGnment via efficient preference Optimization Paper • 2410.18533 • Published 15 days ago • 42
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published 21 days ago • 27
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published 21 days ago • 74
TRACE: Temporal Grounding Video LLM via Causal Event Modeling Paper • 2410.05643 • Published about 1 month ago • 8
TRACE Collection TRACE: Temporal Grounding Video LLM via Causal Event Modeling • 10 items • Updated 7 days ago • 1
MovieSum: An Abstractive Summarization Dataset for Movie Screenplays Paper • 2408.06281 • Published Aug 12 • 9
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22 • 39
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
DynMoE Family Collection DynMoE model checkpoints and paper on Hugging Face • 4 items • Updated Aug 19 • 3
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding Paper • 2405.13382 • Published May 22 • 1
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Paper • 2405.14297 • Published May 23 • 2