Special tokens for Visual Grounding?

#77

by echooooooooo - opened 6 days ago

In the paper, I found this sentence: "We insert normalized (xmin, ymin, xmax, ymax) coordinates directly into the text, demarcated by special tokens." However, there are no special tokens for grounding in tokenizer_config.json. Could you give me an example of how to use Llama for visual grounding tasks?
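To make the question concrete, here is a minimal sketch of how I understand the quoted sentence: pixel coordinates are scaled to the [0, 1] range and written into the prompt text. The `<box>` and `</box>` delimiters, the helper names, and the two-decimal formatting are all my own placeholders, not anything from the paper or the tokenizer. The actual special tokens are exactly what I cannot find.

```python
# Hypothetical delimiters: placeholders only, NOT tokens from tokenizer_config.json.
BOX_OPEN = "<box>"
BOX_CLOSE = "</box>"


def normalize_box(box, width, height):
    """Scale a pixel-space (xmin, ymin, xmax, ymax) box to the [0, 1] range."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def format_box(box, width, height, precision=2):
    """Render a normalized box as text wrapped in the placeholder delimiters."""
    norm = normalize_box(box, width, height)
    coords = ", ".join(f"{v:.{precision}f}" for v in norm)
    return f"{BOX_OPEN}({coords}){BOX_CLOSE}"


# Example: a box in a 640x512 image, appended to a text prompt.
prompt = "What is in this region? " + format_box((64, 128, 320, 384), 640, 512)
print(prompt)
```

If the real format uses different delimiters, coordinate ordering, or precision, only `BOX_OPEN`, `BOX_CLOSE`, and `format_box` would need to change.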
