Tags: Text Generation · Transformers · Safetensors · English · llava · multimodal · conversational · Eval Results · Inference Endpoints
ZhangYuanhan committed
Commit c9f2a86
1 Parent(s): 1bfe401

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -131,7 +131,7 @@ base_model:
 
 ## Model Summary
 
-The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 This model support at most 64 frames.
 
@@ -144,7 +144,7 @@ This model support at most 64 frames.
 
 ### Intended use
 
-The model was trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), having have the ability to interact with images, multi-image and videos, but specific to videos.
+The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and [LLaVA-OneVision Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), having have the ability to interact with images, multi-image and videos, but specific to videos.
 
 **Feel free to share your generations in the Community tab!**
 
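
For context, the model card this commit edits describes a video-capable model (up to 64 frames, 32K-token context). Below is a minimal, hedged sketch of video inference through the Transformers LLaVA-NeXT-Video classes; the checkpoint id `llava-hf/LLaVA-NeXT-Video-7B-hf` and the input path `sample_video.mp4` are illustrative assumptions, not taken from this commit, and the lmms-lab checkpoints themselves are typically loaded with the original `llava` codebase instead.

```python
# Minimal sketch. Assumptions (not from the commit above): transformers with
# LLaVA-NeXT-Video support, the converted checkpoint
# "llava-hf/LLaVA-NeXT-Video-7B-hf", and a local "sample_video.mp4".
import av
import numpy as np
import torch
from transformers import (
    LlavaNextVideoForConditionalGeneration,
    LlavaNextVideoProcessor,
)

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed converted checkpoint
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = LlavaNextVideoProcessor.from_pretrained(model_id)


def sample_frames(path: str, num_frames: int = 64) -> np.ndarray:
    """Uniformly sample up to `num_frames` RGB frames (the card's stated cap)."""
    container = av.open(path)
    stream = container.streams.video[0]
    total = stream.frames
    keep = set(np.linspace(0, total - 1, num=min(num_frames, total), dtype=int))
    frames = [
        frame.to_ndarray(format="rgb24")
        for i, frame in enumerate(container.decode(stream))
        if i in keep
    ]
    return np.stack(frames)


clip = sample_frames("sample_video.mp4")  # hypothetical input file

# Build a chat-style prompt with a video placeholder, then generate.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what happens in this video."},
            {"type": "video"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=prompt, videos=clip, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The 64-frame cap in `sample_frames` mirrors the card's "This model support at most 64 frames" note; the prompt and generation settings are placeholders to show the call pattern, not recommended values.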