jiajunlong committed on
Commit
8320341
1 Parent(s): 801dede

Update README.md

Files changed (1)
  1. README.md +9 -18
README.md CHANGED
@@ -4,30 +4,22 @@ pipeline_tag: image-text-to-text
 ---
 ### TinyLLaVA
 
- We have released two multimodal large models smaller than 1B, and the inference speed of both models on CPU is very fast.
-
- See the list below for the details of each model:
-
- - [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B)
- - [TinyLLaVA-0.89B](https://huggingface.co/jiajunlong/TinyLLaVA-0.89B)
 
 ### Usage
 
- 1. you can download the generate file "generate_model.py"
 2. Run the following command:
 ```bash
 python generate_model.py --model jiajunlong/TinyLLaVA-0.89B --prompt 'you want to ask' --image '/path/to/related/image'
 ```
 or execute the following test code:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from generate_model import *
- model = AutoModelForCausalLM.from_pretrained("jiajunlong/TinyLLaVA-0.89B", trust_remote_code=True)
 config = model.config
- tokenizer = AutoTokenizer.from_pretrained("jiajunlong/TinyLLaVA-0.89B", use_fast=False, model_max_length = config.tokenizer_model_max_length,padding_side = config.tokenizer_padding_side)
 prompt = "you want to ask"
 image = "/path/to/related/image"
 output_text, genertaion_time = generate(prompt=prompt, image=image, model=model, tokenizer=tokenizer)
@@ -43,12 +35,11 @@ print_txt = (
 )
 print(print_txt)
 ```
 ### Result
 
- | model_name | gqa | textvqa | sqa | vqav2 | MME | MMB | MM-VET | GPU | CPU |
- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
- | [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.3 | 51.7 | 60.3 | 76.9 | 1276.5 | 55.2 | 25.8 | 11.9 it/s | 0.35 it/s |
- | [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) | 53.87 | 44.02 | 54.09 | 71.74 | 1118.75 | 37.8 | 20 | 11.0 it/s | 0.67 it/s |
- | [TinyLLaVA-0.89B](https://huggingface.co/jiajunlong/TinyLLaVA-0.89B) | 50.38 | 36.37 | 50.02 | 65.44 | 1056.69 | 26.29 | 15.4 | **14.35 it/s** | **2 it/s** |
 
 ---
 ### TinyLLaVA
 
+ We trained one model with fewer than 1B parameters using the TinyLLaVA approach, employing the same training settings as [TinyLLaVA](https://github.com/DLCV-BUAA/TinyLLaVABench). For the language and vision models, we chose [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384), respectively. The connector is a 2-layer MLP. The training dataset is the same as [LLaVA](https://github.com/haotian-liu/LLaVA)'s. During testing, we found that [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) runs significantly faster on CPU than [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B).
 ### Usage
 
+ 1. Download the generation script "generate_model.py".
 2. Run the following command:
 ```bash
 python generate_model.py --model jiajunlong/TinyLLaVA-0.89B --prompt 'you want to ask' --image '/path/to/related/image'
 ```
 
 or execute the following test code:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from generate_model import *
+ model = AutoModelForCausalLM.from_pretrained("jiajunlong/TinyLLaVA-0.55B", trust_remote_code=True)
 config = model.config
+ tokenizer = AutoTokenizer.from_pretrained("jiajunlong/TinyLLaVA-0.55B", use_fast=False, model_max_length=config.tokenizer_model_max_length, padding_side=config.tokenizer_padding_side)
 prompt = "you want to ask"
 image = "/path/to/related/image"
 output_text, genertaion_time = generate(prompt=prompt, image=image, model=model, tokenizer=tokenizer)

 )
 print(print_txt)
 ```
 
 ### Result
 
+ | model_name | GQA | TextVQA | SQA | VQAv2 | MME | MMB | MM-VET |
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.3 | 51.7 | 60.3 | 76.9 | 1276.5 | 55.2 | 25.8 |
+ | [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) | 53.87 | 44.02 | 54.09 | 71.74 | 1118.75 | 37.8 | 20 |
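
The updated README describes the connector between siglip-so400m-patch14-384 and OpenELM-450M-Instruct as a 2-layer MLP. As a rough sketch of that idea only: the hidden width, GELU activation, and bias terms below are illustrative assumptions, not values read from the repo. The patch count (27×27 = 729 for a 384-pixel image with patch size 14) and the SigLIP feature width of 1152 come from the SigLIP model card.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def connector(image_features, w1, b1, w2, b2):
    """Hypothetical 2-layer MLP connector: projects SigLIP patch
    features into the language model's embedding space."""
    h = gelu(image_features @ w1 + b1)
    return h @ w2 + b2

rng = np.random.default_rng(0)
vision_dim, hidden_dim, text_dim = 1152, 1536, 1536  # assumed widths
w1 = rng.standard_normal((vision_dim, hidden_dim)) * 0.02
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, text_dim)) * 0.02
b2 = np.zeros(text_dim)

patches = rng.standard_normal((729, vision_dim))  # 27x27 patches at 384px, patch 14
tokens = connector(patches, w1, b1, w2, b2)
print(tokens.shape)  # (729, 1536)
```

In the general LLaVA recipe, these projected patch embeddings are concatenated with the text token embeddings before being fed to the language model.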