---
language:
  - en
license: mit
library_name: transformers
datasets:
  - liuhaotian/LLaVA-Instruct-150K
  - liuhaotian/LLaVA-Pretrain
---

# Model Card for LLaVa-Phi-2-3B

## Model Details

### Model Description

- **Developed by:** LAION, SkunkworksAI & Ontocord
- **Model type:** LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture; a minimal loading sketch is shown below.
- **Finetuned from model:** Phi-2
- **License:** MIT
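
The snippet below is a minimal inference sketch using the transformers library. It assumes the checkpoint is published in a LLaVA-compatible layout so that `LlavaForConditionalGeneration` and `AutoProcessor` can load it; the repo id, image URL, and prompt template are illustrative placeholders rather than confirmed details of this model.

```python
# Minimal inference sketch (assumes a transformers-compatible LLaVA checkpoint;
# the repo id, image URL, and prompt format below are placeholders).
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "marianna13/llava-phi-2-3b"  # hypothetical repo id

# Load the model and its multimodal processor (image processor + tokenizer)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Fetch an example image and build a LLaVA-style prompt with an <image> slot
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

# Preprocess, generate, and decode the answer
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```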

## Model Sources

## Evaluation

### Benchmarks

| Model | Parameters | SQA | GQA | TextVQA | POPE |
|---|---|---|---|---|---|
| LLaVA-1.5 | 7.3B | 68.0 | 62.0 | 58.3 | 85.3 |
| MC-LLaVA-3B | 3B | - | 49.6 | 38.59 | - |
| LLaVA-Phi | 3B | 68.4 | - | 48.6 | 85.0 |
| moondream1 | 1.6B | - | 56.3 | 39.8 | - |
| llava-phi-2-3b | 2.7B | 69.0 | 51.2 | 47.0 | 86.0 |
| llava-phi-2-3b-siglip | 2.7B | 70.15 | 52.56 | 47.99 | 87.00 |