LMMs-Lab

community

https://lmms-lab.framer.ai/

lmmslab

EvolvingLMMs-Lab

AI & ML interests

Feeling and building multimodal intelligence.

Organization Card


[2024-11] 🤯🤯 We introduce Multimodal SAE, the first framework designed to interpret learned features in large-scale multimodal models using Sparse Autoencoders. We use LLaVA-OneVision-72B to analyze and explain the SAE-derived features of LLaVA-NeXT-LLaMA3-8B. We also show that clamping specific features can steer model behavior, reducing hallucinations and avoiding safety-related issues.

GitHub | Paper

[2024-10] 🔥🔥 We present LLaVA-Critic, the first open-source large multimodal model designed as a generalist evaluator of LMM-generated responses across diverse multimodal tasks and scenarios.

GitHub | Blog

[2024-10] 🎬🎬 Introducing LLaVA-Video, a family of open large multimodal models designed specifically for advanced video understanding. We are also open-sourcing LLaVA-Video-178K, a high-quality synthetic dataset for video instruction tuning.

GitHub | Blog

[2024-08] 🤞🤞 We present LLaVA-OneVision, a family of LMMs developed by consolidating insights into data, models, and visual representations.

GitHub | Blog

[2024-06] 🧑‍🎨🧑‍🎨 We release LLaVA-NeXT-Interleave, an LMM that extends its capabilities to real-world settings: multi-image, multi-frame (videos), multi-view (3D), and multi-patch (single-image).

GitHub | Blog

[2024-06] 🚀🚀 We release LongVA, a long language model with state-of-the-art video understanding performance.

GitHub | Blog

Older Updates (2024-06 and earlier)

[2024-06] 🎬🎬 The lmms-eval/v0.2 toolkit now supports video evaluation for models such as LLaVA-NeXT Video and Gemini 1.5 Pro.

GitHub | Blog

[2024-05] 🚀🚀 We release LLaVA-NeXT Video, a model performing on par with Google's Gemini on video understanding tasks.

GitHub | Blog

[2024-05] 🚀🚀 The LLaVA-NeXT model family reaches near GPT-4V performance on multimodal benchmarks, with models up to 110B parameters.

GitHub | Blog

[2024-03] We release lmms-eval, a toolkit for holistic evaluation covering 50+ multimodal datasets and 10+ models.

GitHub | Blog

spaces 3

LiveBench

LLaVA-NeXT-Interleave-Demo

LongVA Demo

models 42

lmms-lab/llama3-llava-next-8b-hf-sae-131k

Updated 2 days ago • 19 • 1

lmms-lab/LLaVA-Video-72B-Qwen2

Text Generation • Updated Oct 25 • 4.22k • 15

lmms-lab/LLaVA-Video-7B-Qwen2

Video-Text-to-Text • Updated Oct 25 • 43.5k • 39

lmms-lab/llava-onevision-qwen2-7b-ov-chat

Text Generation • Updated Oct 23 • 4.99k • 15

lmms-lab/qwen-navit

Image-Text-to-Text • Updated Oct 19 • 3

lmms-lab/llava-onevision-qwen2-72b-ov-chat

Image-Text-to-Text • Updated Oct 9 • 79.6k • 6

lmms-lab/llava-critic-72b

Updated Oct 4 • 119 • 14

lmms-lab/llava-critic-7b

Updated Oct 4 • 9.56k • 10

lmms-lab/LLaVA-Video-7B-Qwen2-Video-Only

Text Generation • Updated Oct 4 • 16.6k • 3

lmms-lab/LLaVA-NeXT-Video-32B-Qwen

Video-Text-to-Text • Updated Oct 4 • 1.09k • 12

datasets 104

lmms-lab/sae-sample-cache-dataset

Viewer • Updated 2 days ago • 46.7k • 17

lmms-lab/MixEval-X-audio2text

Viewer • Updated 5 days ago • 1.47k • 15

lmms-lab/MIA-Bench

Viewer • Updated 5 days ago • 400 • 29

lmms-lab/GoogleDeepMind-NEPTUNE

Viewer • Updated 5 days ago • 8.8k • 97 • 1

lmms-lab/llava-sae-explanations-5k

Viewer • Updated 6 days ago • 9.8k • 50 • 2

lmms-lab/ai2d-no-mask

Viewer • Updated 7 days ago • 3.09k • 19

lmms-lab/ClothoAQA

Viewer • Updated about 1 month ago • 4.55k • 125

lmms-lab/librispeech

Viewer • Updated Oct 27 • 13k • 271 • 4

lmms-lab/common_voice_15

Viewer • Updated Oct 27 • 43.1k • 203

lmms-lab/tedlium

Viewer • Updated Oct 27 • 599 • 72