Special tokens for Visual Grounding?
#77
Opened by echooooooooo
The paper says: "We insert normalized (xmin, ymin, xmax, ymax) coordinates directly into the text, demarcated by special tokens." However, I can't find any special tokens for grounding in tokenizer_config.json. Could you give an example of how to use Llama for visual grounding tasks?
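For illustration only, here is a minimal sketch of the scheme the quoted sentence describes. Pixel boxes are scaled by image width and height, written into the text as (xmin, ymin, xmax, ymax), and wrapped in delimiters. The delimiter strings `<box>`/`</box>`, the [0, 1] scale, and the three-decimal precision are all assumptions. They are not the actual Llama tokens or format, which is exactly what this question is asking about.

```python
import re
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]

# HYPOTHETICAL delimiters: placeholders only, since the real grounding
# tokens do not appear in tokenizer_config.json.
BOX_START = "<box>"
BOX_END = "</box>"


def box_to_text(box: Box, width: int, height: int) -> str:
    """Scale a pixel (xmin, ymin, xmax, ymax) box to [0, 1] (assumed scale)
    and wrap it in the placeholder delimiters."""
    xmin, ymin, xmax, ymax = box
    coords = (xmin / width, ymin / height, xmax / width, ymax / height)
    return BOX_START + ", ".join(f"{c:.3f}" for c in coords) + BOX_END


def text_to_box(text: str, width: int, height: int) -> Optional[Box]:
    """Find the first delimited box in model output and map it back to pixels."""
    pattern = re.escape(BOX_START) + r"(.*?)" + re.escape(BOX_END)
    match = re.search(pattern, text)
    if match is None:
        return None
    xmin, ymin, xmax, ymax = (float(v) for v in match.group(1).split(","))
    return (xmin * width, ymin * height, xmax * width, ymax * height)


if __name__ == "__main__":
    # A 640x480 image with a box covering its upper-left region.
    text = box_to_text((64, 48, 320, 240), 640, 480)
    print(text)
    print(text_to_box(f"The cat is at {text}.", 640, 480))
```

Running it prints `<box>0.100, 0.100, 0.500, 0.500</box>`. Parsing that string back recovers the original pixel box, up to the chosen precision.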