---
datasets:
- Lin-Chen/ShareGPT4V
- FreedomIntelligence/ALLaVA-4V
- Vision-Flan/vision-flan_191-task_1k
---

# ConvLLaVA Model Card

## Model details

**Model type:** ConvLLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: `lmsys/vicuna-7b-v1.5`

**Model date:** ConvLLaVA-pretrain-768 was trained in March 2024.

**Paper or resources for more information:** https://github.com/alibaba/conv-llava/

## License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

**Where to send questions or comments about the model:** https://github.com/alibaba/conv-llava/issues

## Intended use

**Primary intended uses:** The primary use of ConvLLaVA is research on large multimodal models and chatbots.

**Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset

- 1.2M ShareGPT4V-PT caption data.
- 100K ShareGPT4V caption data.
- 1.4M ALLaVA caption and instruction data.
- 186K VFLAN multitask data.
- 158K GPT-generated multimodal instruction-following data.
- 500K academic-task-oriented VQA data mixture.
- 40K ShareGPT data.
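
The sizes listed above can be tallied with a short Python sketch. It is illustrative only: the dictionary keys are hypothetical labels (not official dataset identifiers), and the figures are the rounded values from this card. The card does not say which datasets are used in which training stage, so the sum is only the combined size of the listed data, not the number of samples seen in any single stage.

```python
# Rounded sample counts copied from the "Training dataset" list in this card.
# Keys are hypothetical, human-readable labels, not official dataset IDs.
TRAINING_DATA = {
    "ShareGPT4V-PT caption": 1_200_000,
    "ShareGPT4V caption": 100_000,
    "ALLaVA caption and instruction": 1_400_000,
    "VFLAN multitask": 186_000,
    "GPT-generated multimodal instruction-following": 158_000,
    "Academic-task-oriented VQA mixture": 500_000,
    "ShareGPT": 40_000,
}


def total_samples(counts: dict[str, int]) -> int:
    """Sum the per-dataset sample counts (approximate, since inputs are rounded)."""
    return sum(counts.values())


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the combined listed data contributed by each source."""
    total = total_samples(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = total_samples(TRAINING_DATA)
    print(f"Total: {total:,} samples (~{total / 1e6:.2f}M)")
    # List sources from largest to smallest share.
    for name, frac in sorted(shares(TRAINING_DATA).items(), key=lambda kv: -kv[1]):
        print(f"{name:<50} {frac:6.1%}")
```

With the figures above, the listed data sums to 3,584,000 samples (roughly 3.58M), with ALLaVA and ShareGPT4V-PT together accounting for most of it.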

## Paper

https://arxiv.org/abs/2405.15738