merve (HF staff) committed on
Commit
f58d665
1 Parent(s): 79207fe

Add interleaved and video/3d prompt

  1. README.md +20 -0
README.md CHANGED
@@ -83,6 +83,26 @@ output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
  print(processor.decode(output[0][2:], skip_special_tokens=True))
  ```

+ When prompting with videos, 3D data, or multi-view data, use a prompt like the following:
+
+ ```python
+ # n is the number of frames you downsampled from the input
+
+ image_tokens = "<image>" * n
+ prompt = f"<|im_start|>user {image_tokens}\nWhat are these?<|im_end|><|im_start|>assistant"
+ ```
+
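The frame-based prompt in this hunk leaves `n` undefined, so a minimal sketch of how it could be derived may help. It assumes frames are sampled at evenly spaced positions, which the README does not specify. `sample_frame_indices` and `build_video_prompt` are hypothetical helper names, not part of `transformers`, and the end-of-turn tag is written here as `<|im_end|>`.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, n: int) -> list[int]:
    """Pick up to n evenly spaced frame indices (assumed sampling strategy)."""
    if n <= 0 or total_frames <= 0:
        raise ValueError("total_frames and n must be positive")
    n = min(n, total_frames)  # cannot sample more frames than the clip has
    step = total_frames / n
    return [int(i * step) for i in range(n)]


def build_video_prompt(n: int, question: str = "What are these?") -> str:
    """Repeat the <image> placeholder once per sampled frame."""
    image_tokens = "<image>" * n
    return f"<|im_start|>user {image_tokens}\n{question}<|im_end|><|im_start|>assistant"


# Hypothetical clip with 120 frames, downsampled to 8
indices = sample_frame_indices(total_frames=120, n=8)
prompt = build_video_prompt(len(indices))
```

Deriving the placeholder count from `len(indices)` rather than a separate constant keeps the number of `<image>` tokens in step with the number of frames actually passed to the processor.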
+ When prompting with interleaved images and videos, use a prompt like the following:
+
+ ```python
+ # two interleaved images
+ prompt = "<|im_start|>user <image><image>\nWhat are these?<|im_end|><|im_start|>assistant"
+
+ # two interleaved videos, where n is the total number of frames downsampled from both videos
+ image_tokens = "<image>" * n
+ prompt = f"<|im_start|>user {image_tokens}\nWhat are these?<|im_end|><|im_start|>assistant"
+ ```
+
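For the interleaved case, the total placeholder count is the sum over all inputs: one per image, and one per sampled frame for each video. The sketch below illustrates that bookkeeping under the same template. `build_interleaved_prompt` is a hypothetical helper, not a library function, and the per-video frame counts are made-up example values.

```python
def build_interleaved_prompt(frames_per_item, question="What are these?"):
    """Build a prompt for several inputs.

    frames_per_item holds one entry per input: 1 for a single image,
    or the number of sampled frames for a video.
    """
    if any(count < 1 for count in frames_per_item):
        raise ValueError("every input must contribute at least one frame")
    n = sum(frames_per_item)  # total number of <image> placeholders
    image_tokens = "<image>" * n
    return f"<|im_start|>user {image_tokens}\n{question}<|im_end|><|im_start|>assistant"


# two interleaved images
two_images = build_interleaved_prompt([1, 1])

# two interleaved videos, 4 sampled frames each (n = 8 in total)
two_videos = build_interleaved_prompt([4, 4])
```

The images or frames handed to the processor should presumably follow the same order as the entries in `frames_per_item`, so that each placeholder lines up with the intended input.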
  ### Model optimization

  #### 4-bit quantization through `bitsandbytes` library