Special tokens for Visual Grounding?

#77
by echooooooooo - opened

In the paper, I found this statement: "We insert normalized (xmin, ymin, xmax, ymax) coordinates directly into the text, demarcated by special tokens." However, there are no special tokens for grounding in tokenizer_config.json. Could you give an example of how to use Llama for visual grounding tasks?
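
To make the question concrete, here is a rough sketch of how I read that sentence: pixel boxes are divided by the image size and written inline after the phrase they ground. The `<box>` and `</box>` delimiters, the function names, and the three-decimal formatting are placeholders I made up for illustration. They are not in tokenizer_config.json, and I do not know what the model actually expects.

```python
# Hypothetical delimiters: NOT taken from the tokenizer or the paper.
BOX_OPEN = "<box>"
BOX_CLOSE = "</box>"


def normalize_box(box, width, height):
    """Scale a pixel-space (xmin, ymin, xmax, ymax) box to the [0, 1] range."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def format_grounded_phrase(phrase, box, width, height, precision=3):
    """Append the normalized box to the phrase, wrapped in the placeholder tokens."""
    coords = normalize_box(box, width, height)
    coord_text = ", ".join(f"{c:.{precision}f}" for c in coords)
    return f"{phrase} {BOX_OPEN}({coord_text}){BOX_CLOSE}"


if __name__ == "__main__":
    # A 640x512 image with a box around "the dog" given in pixels.
    print(format_grounded_phrase("the dog", (64, 128, 320, 384), 640, 512))
```

Is this roughly the right shape, and if so, which tokens and coordinate format should replace the placeholders?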
