merve HF staff committed on
Commit
5d2b324
1 Parent(s): 4d9f0ec

Add prompting

  1. README.md +20 -0
README.md CHANGED
@@ -82,6 +82,26 @@ inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)
  output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
  print(processor.decode(output[0][2:], skip_special_tokens=True))
  ```
+ When prompting with video, 3D, or multi-view input, use one `<image>` token per frame or view:
+
+ ```python
+ # n is the number of frames you downsampled from the input
+
+ image_tokens = "<image>" * n
+ prompt = f"<|im_start|>user {image_tokens}\nWhat are these?<|im_end|><|im_start|>assistant"
+ ```
+
+ When prompting with interleaved images or videos, format the prompt as follows:
+
+ ```python
+ # two interleaved images
+ prompt = "<|im_start|>user <image><image>\nWhat are these?<|im_end|><|im_start|>assistant"
+
+ # two interleaved videos, where n is the total number of frames downsampled from both videos
+ image_tokens = "<image>" * n
+ prompt = f"<|im_start|>user {image_tokens}\nWhat are these?<|im_end|><|im_start|>assistant"
+ ```
+
 
  ### Model optimization
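
The snippets added in this commit share one pattern: repeat `<image>` once per image or frame, then append the question inside a single user turn. A minimal sketch of a helper that builds such a prompt could look like the following. `build_prompt` and its parameter names are hypothetical and are not part of the README or of `transformers`; the sketch also assumes the end-of-turn marker is the ChatML-style `<|im_end|>` token.

```python
def build_prompt(question: str, num_images: int) -> str:
    """Build a single-turn prompt with one <image> placeholder per image or frame.

    Hypothetical helper for illustration only; it mirrors the string layout
    used in the README snippets and does not call any library.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # One placeholder per image, video frame, or view.
    image_tokens = "<image>" * num_images
    return (
        f"<|im_start|>user {image_tokens}\n"
        f"{question}<|im_end|><|im_start|>assistant"
    )


if __name__ == "__main__":
    # Two interleaved images.
    print(build_prompt("What are these?", 2))
    # Two videos from which 8 frames were downsampled in total.
    print(build_prompt("What are these?", 8))
```

The number passed as `num_images` should match the number of images or frames actually handed to the processor, since each placeholder stands for one visual input.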