mtensor committed
Commit 860995a • 1 Parent(s): 03212a3
add examples to readme
README.md CHANGED
@@ -38,6 +38,64 @@ Though not the focus of this model, we did evaluate it on standard image underst
| COCO Captions | 141 | 138 | n/a | n/a | 149 | 135 | 138 |
| AI2D | 64.5 | 73.7 | n/a | 62.3 | 81.2 | n/a | n/a |

+## How to Use
+
+You can load the model and perform inference as follows:
+```python
+from transformers import FuyuForCausalLM, AutoTokenizer, FuyuProcessor, FuyuImageProcessor
+from PIL import Image
+
+# load model, tokenizer, and processor
+pretrained_path = "adept/fuyu-8b"
+tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
+
+image_processor = FuyuImageProcessor()
+processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)
+
+model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map="cuda:0")
+
+# test inference
+text_prompt = "Generate a coco-style caption.\n"
+image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
+image_pil = Image.open(image_path)
+
+model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+for k, v in model_inputs.items():
+    model_inputs[k] = v.to("cuda:0")
+
+generation_output = model.generate(**model_inputs, max_new_tokens=8)
+generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-38:]
+assert generation_text == "A bus parked on the side of a road.<s>"
+```
+
+Fuyu can also perform some question answering on natural images:
+```python
+text_prompt = "What color is the bus?\n"
+image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
+image_pil = Image.open(image_path)
+
+model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+for k, v in model_inputs.items():
+    model_inputs[k] = v.to("cuda:0")
+
+generation_output = model.generate(**model_inputs, max_new_tokens=6)
+generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-17:]
+assert generation_text == "The bus is blue.\n"
+
+
+text_prompt = "What is the highest life expectancy at birth of male?\n"
+image_path = "chart.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/chart.png
+image_pil = Image.open(image_path)
+
+model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+for k, v in model_inputs.items():
+    model_inputs[k] = v.to("cuda:0")
+
+generation_output = model.generate(**model_inputs, max_new_tokens=16)
+generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-55:]
+assert generation_text == "The life expectancy at birth of males in 2018 is 80.7.\n"
+```
+

## Uses

### Direct Use
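Note on the examples above: the decoded strings are trimmed with hard-coded character offsets (`[-38:]`, `[-17:]`, `[-55:]`), which must be recomputed whenever a prompt or expected answer changes. Below is a minimal sketch of a more robust variant that slices the generated token ids by position instead; it assumes `processor` and `model` are already set up as in the README snippets, and that the processor output exposes an `input_ids` tensor (as `FuyuProcessor` does in current `transformers` releases).

```python
# Sketch, not part of the commit above: decode only the newly generated tokens.
# Assumes `processor` and `model` are already built as in the README examples.
from PIL import Image

text_prompt = "Generate a coco-style caption.\n"
image_pil = Image.open("bus.png")

model_inputs = processor(text=text_prompt, images=[image_pil])
model_inputs = {k: v.to("cuda:0") for k, v in model_inputs.items()}

# generate() returns prompt + continuation for decoder-only models such as Fuyu,
# so drop the prompt tokens by position rather than slicing the decoded string.
generation_output = model.generate(**model_inputs, max_new_tokens=8)
prompt_len = model_inputs["input_ids"].shape[1]
new_tokens = generation_output[:, prompt_len:]
caption = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(caption)  # per the assert above, expected: "A bus parked on the side of a road."
```

Positional slicing keeps working when the prompt or the answer length changes, whereas the fixed character offsets in the asserts do not.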