---
pipeline_tag: image-text-to-text
---

### TinyLLaVA
We trained a model with fewer than 1B parameters using the TinyLLaVA approach, employing the same training settings as [TinyLLaVA](https://github.com/DLCV-BUAA/TinyLLaVABench). For the language and vision models, we chose [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384), respectively. The connector is a 2-layer MLP. The training dataset is the same as [LLaVA](https://github.com/haotian-liu/LLaVA)'s. During testing, we found that [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) exhibited significantly faster inference on CPU than [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B).
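The model is three off-the-shelf pieces glued together by the connector, whose only job is shape bookkeeping: project vision patch features into the LLM embedding space. The sketch below illustrates a 2-layer MLP connector in plain NumPy; the dimensions (1152-d SigLIP patch features, a 729-token patch grid, 1536-d LLM hidden size) are illustrative assumptions, not values read from this repository's config.

```python
import numpy as np

# Assumed dimensions (illustrative, not read from the repo):
VISION_DIM, LLM_DIM = 1152, 1536

rng = np.random.default_rng(0)

# 2-layer MLP connector: Linear -> GELU -> Linear
W1 = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b1 = np.zeros(LLM_DIM)
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02
b2 = np.zeros(LLM_DIM)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def connector(patch_features):
    """Project vision patch features into the LLM embedding space."""
    return gelu(patch_features @ W1 + b1) @ W2 + b2

# One image's worth of patch features (assumed 27x27 = 729 patches).
patches = rng.standard_normal((729, VISION_DIM))
tokens = connector(patches)
print(tokens.shape)  # -> (729, 1536)
```

The projected tokens are then concatenated with the text embeddings and fed to the LLM as ordinary input embeddings.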

### Usage

1. Download the generation script "generate_model.py".
2. Run the following command:

```bash
python generate_model.py --model jiajunlong/TinyLLaVA-0.89B --prompt 'you want to ask' --image '/path/to/related/image'
```

or execute the following test code:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from generate_model import *

model = AutoModelForCausalLM.from_pretrained("jiajunlong/TinyLLaVA-0.55B", trust_remote_code=True)
config = model.config
tokenizer = AutoTokenizer.from_pretrained("jiajunlong/TinyLLaVA-0.55B", use_fast=False, model_max_length=config.tokenizer_model_max_length, padding_side=config.tokenizer_padding_side)
prompt = "you want to ask"
image = "/path/to/related/image"
output_text, generation_time = generate(prompt=prompt, image=image, model=model, tokenizer=tokenizer)
print_txt = (
    ...
)
print(print_txt)
```
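`generate` returns the wall-clock generation time alongside the text, which is what backs the CPU-speed comparison above. A minimal, self-contained sketch of such a timing wrapper (both functions here are hypothetical stand-ins, not the repository's actual `generate`):

```python
import time

def timed_generate(generate_fn, **kwargs):
    """Run any generate-style callable and also return its wall-clock latency."""
    start = time.perf_counter()
    output = generate_fn(**kwargs)
    return output, time.perf_counter() - start

# Hypothetical stand-in for the real generate(); sleeps to simulate inference.
def fake_generate(prompt, image=None, model=None, tokenizer=None):
    time.sleep(0.05)
    return f"answer to: {prompt}"

output_text, generation_time = timed_generate(fake_generate, prompt="you want to ask")
print(output_text, f"{generation_time:.2f}s")
```

Swapping `fake_generate` for the real `generate` gives per-checkpoint latency numbers you can compare directly.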

### Result

| model_name | gqa | textvqa | sqa | vqav2 | MME | MMB | MM-VET |
| :----------------------------------------------------------: | ----- | ------- | ----- | ----- | ------- | ----- | ------ |
| [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.3 | 51.7 | 60.3 | 76.9 | 1276.5 | 55.2 | 25.8 |
| [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) | 53.87 | 44.02 | 54.09 | 71.74 | 1118.75 | 37.8 | 20 |
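One way to read the table is to compute how much of TinyLLaVA-1.5B's score the smaller model retains on each benchmark; the snippet below only restates the numbers already shown above.

```python
# Scores copied from the result table above.
scores_1_5b = {"gqa": 60.3, "textvqa": 51.7, "sqa": 60.3, "vqav2": 76.9,
               "MME": 1276.5, "MMB": 55.2, "MM-VET": 25.8}
scores_0_55b = {"gqa": 53.87, "textvqa": 44.02, "sqa": 54.09, "vqav2": 71.74,
                "MME": 1118.75, "MMB": 37.8, "MM-VET": 20.0}

# Per-benchmark retention ratio of the 0.55B model relative to the 1.5B model.
retention = {k: scores_0_55b[k] / scores_1_5b[k] for k in scores_1_5b}
average = sum(retention.values()) / len(retention)

for name, ratio in sorted(retention.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {ratio:.1%}")
print(f"average  {average:.1%}")
```

On these numbers the 0.55B model keeps roughly 84% of the larger model's scores on average, with MMB showing the largest drop.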