Tags: Text Generation · Transformers · Safetensors · English · llava · multimodal · conversational · Eval Results · Inference Endpoints
ZhangYuanhan committed
Commit 09f76c2
1 Parent(s): c53413e

Update README.md

Files changed (1):
  1. README.md +4 -1
README.md CHANGED
@@ -1,6 +1,7 @@
 ---
 datasets:
 - lmms-lab/LLaVA-NeXT-Video-SFT-Data
+- lmms-lab/LLaVA-OneVision-Data
 language:
 - en
 library_name: transformers
@@ -112,6 +113,8 @@ model-index:
   value: 70.5
   name: accuracy
   verified: true
+base_model:
+- lmms-lab/llava-onevision-qwen2-7b-si
 ---
 
 
@@ -128,7 +131,7 @@ model-index:
 
 ## Model Summary
 
-The LLaVA-OneVision models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 - **Repository:** [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file)
 - **Point of Contact:** [Yuanhan Zhang](https://zhangyuanhan-ai.github.io/)
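For reference, a sketch of how the affected model-card front matter should read after this commit. It includes only the fields visible in the diff above; the model-index block and any other metadata not touched by the commit are elided, not omitted from the actual README.

```yaml
---
datasets:
- lmms-lab/LLaVA-NeXT-Video-SFT-Data
- lmms-lab/LLaVA-OneVision-Data          # added in this commit
language:
- en
library_name: transformers
# ... model-index and other fields unchanged by this commit are elided ...
base_model:
- lmms-lab/llava-onevision-qwen2-7b-si   # added in this commit
---
```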