Tags: Text Generation · Transformers · Safetensors · English · llava · multimodal · conversational · Eval Results · Inference Endpoints
ZhangYuanhan committed
Commit c9f2a86
1 Parent(s): 1bfe401

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -131,7 +131,7 @@ base_model:
 
 ## Model Summary
 
-The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 This model support at most 64 frames.
 
@@ -144,7 +144,7 @@ This model support at most 64 frames.
 
 ### Intended use
 
-The model was trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), having have the ability to interact with images, multi-image and videos, but specific to videos.
+The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), having have the ability to interact with images, multi-image and videos, but specific to videos.
 
 **Feel free to share your generations in the Community tab!**
 
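
For context, the model card this commit edits describes a video-capable model (up to 64 frames, 32K-token context). Below is a minimal, hedged sketch of video inference through the Transformers LLaVA-NeXT-Video classes; the checkpoint id `llava-hf/LLaVA-NeXT-Video-7B-hf` and the input path `sample_video.mp4` are illustrative assumptions, not taken from this commit, and the lmms-lab checkpoints themselves are typically loaded with the original `llava` codebase instead.

```python
# Minimal sketch. Assumptions (not from the commit above): transformers with
# LLaVA-NeXT-Video support, the converted checkpoint
# "llava-hf/LLaVA-NeXT-Video-7B-hf", and a local "sample_video.mp4".
import av
import numpy as np
import torch
from transformers import (
    LlavaNextVideoForConditionalGeneration,
    LlavaNextVideoProcessor,
)

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed converted checkpoint
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = LlavaNextVideoProcessor.from_pretrained(model_id)


def sample_frames(path: str, num_frames: int = 64) -> np.ndarray:
    """Uniformly sample up to `num_frames` RGB frames (the card's stated cap)."""
    container = av.open(path)
    stream = container.streams.video[0]
    total = stream.frames
    keep = set(np.linspace(0, total - 1, num=min(num_frames, total), dtype=int))
    frames = [
        frame.to_ndarray(format="rgb24")
        for i, frame in enumerate(container.decode(stream))
        if i in keep
    ]
    return np.stack(frames)


clip = sample_frames("sample_video.mp4")  # hypothetical input file

# Build a chat-style prompt with a video placeholder, then generate.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what happens in this video."},
            {"type": "video"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=prompt, videos=clip, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The 64-frame cap in `sample_frames` mirrors the card's "This model support at most 64 frames" note; the prompt and generation settings are placeholders to show the call pattern, not recommended values.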