LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 12 days ago • 52
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects Paper • 2407.08711 • Published Jul 11 • 5
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 211
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published Jul 2 • 49
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 35
Benchmarking Mental State Representations in Language Models Paper • 2406.17513 • Published Jun 25 • 3
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Paper • 2403.13248 • Published Mar 20 • 76
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17 • 20
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published Jun 18 • 14
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20 • 30
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published Jun 17 • 36
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88