---
pipeline_tag: image-text-to-text
---

### TinyLLaVA
We trained a model with fewer than 1B parameters using the TinyLLaVA approach, employing the same training settings as [TinyLLaVA](https://github.com/DLCV-BUAA/TinyLLaVABench). For the language and vision models, we chose [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384), respectively. The connector is a 2-layer MLP. The training dataset is the same as [LLaVA](https://github.com/haotian-liu/LLaVA)'s. During testing, we found that [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) exhibited significantly faster inference on CPU than [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B).
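The model is three off-the-shelf pieces glued together by the connector, whose only job is shape bookkeeping: project vision patch features into the LLM embedding space. The sketch below illustrates a 2-layer MLP connector in plain NumPy; the dimensions (1152-d SigLIP patch features, a 729-token patch grid, 1536-d LLM hidden size) are illustrative assumptions, not values read from this repository's config.

```python
import numpy as np

# Assumed dimensions (illustrative, not read from the repo):
VISION_DIM, LLM_DIM = 1152, 1536

rng = np.random.default_rng(0)

# 2-layer MLP connector: Linear -> GELU -> Linear
W1 = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b1 = np.zeros(LLM_DIM)
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02
b2 = np.zeros(LLM_DIM)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def connector(patch_features):
    """Project vision patch features into the LLM embedding space."""
    return gelu(patch_features @ W1 + b1) @ W2 + b2

# One image's worth of patch features (assumed 27x27 = 729 patches).
patches = rng.standard_normal((729, VISION_DIM))
tokens = connector(patches)
print(tokens.shape)  # -> (729, 1536)
```

The projected tokens are then concatenated with the text embeddings and fed to the LLM as ordinary input embeddings.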

### Usage

1. Download the generation script "generate_model.py".
2. Run the following command:

```bash
python generate_model.py --model jiajunlong/TinyLLaVA-0.89B --prompt 'you want to ask' --image '/path/to/related/image'
```

or execute the following test code:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from generate_model import *

model = AutoModelForCausalLM.from_pretrained("jiajunlong/TinyLLaVA-0.55B", trust_remote_code=True)
config = model.config
tokenizer = AutoTokenizer.from_pretrained("jiajunlong/TinyLLaVA-0.55B", use_fast=False, model_max_length=config.tokenizer_model_max_length, padding_side=config.tokenizer_padding_side)
prompt = "you want to ask"
image = "/path/to/related/image"
output_text, generation_time = generate(prompt=prompt, image=image, model=model, tokenizer=tokenizer)
print_txt = (
    ...
)
print(print_txt)
```
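`generate` returns the wall-clock generation time alongside the text, which is what backs the CPU-speed comparison above. A minimal, self-contained sketch of such a timing wrapper (both functions here are hypothetical stand-ins, not the repository's actual `generate`):

```python
import time

def timed_generate(generate_fn, **kwargs):
    """Run any generate-style callable and also return its wall-clock latency."""
    start = time.perf_counter()
    output = generate_fn(**kwargs)
    return output, time.perf_counter() - start

# Hypothetical stand-in for the real generate(); sleeps to simulate inference.
def fake_generate(prompt, image=None, model=None, tokenizer=None):
    time.sleep(0.05)
    return f"answer to: {prompt}"

output_text, generation_time = timed_generate(fake_generate, prompt="you want to ask")
print(output_text, f"{generation_time:.2f}s")
```

Swapping `fake_generate` for the real `generate` gives per-checkpoint latency numbers you can compare directly.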

### Result

| model_name | gqa | textvqa | sqa | vqav2 | MME | MMB | MM-VET |
| :----------------------------------------------------------: | ----- | ------- | ----- | ----- | ------- | ----- | ------ |
| [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.3 | 51.7 | 60.3 | 76.9 | 1276.5 | 55.2 | 25.8 |
| [TinyLLaVA-0.55B](https://huggingface.co/jiajunlong/TinyLLaVA-0.55B) | 53.87 | 44.02 | 54.09 | 71.74 | 1118.75 | 37.8 | 20 |
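One way to read the table is to compute how much of TinyLLaVA-1.5B's score the smaller model retains on each benchmark; the snippet below only restates the numbers already shown above.

```python
# Scores copied from the result table above.
scores_1_5b = {"gqa": 60.3, "textvqa": 51.7, "sqa": 60.3, "vqav2": 76.9,
               "MME": 1276.5, "MMB": 55.2, "MM-VET": 25.8}
scores_0_55b = {"gqa": 53.87, "textvqa": 44.02, "sqa": 54.09, "vqav2": 71.74,
                "MME": 1118.75, "MMB": 37.8, "MM-VET": 20.0}

# Per-benchmark retention ratio of the 0.55B model relative to the 1.5B model.
retention = {k: scores_0_55b[k] / scores_1_5b[k] for k in scores_1_5b}
average = sum(retention.values()) / len(retention)

for name, ratio in sorted(retention.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {ratio:.1%}")
print(f"average  {average:.1%}")
```

On these numbers the 0.55B model keeps roughly 84% of the larger model's scores on average, with MMB showing the largest drop.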