Is the vision tower correctly loaded?

by OBJECT-907 - opened Sep 4

Sep 4

I found the weights of the vision tower in model.safetensors, but the model seems to load the weights from "google/siglip-so400m-patch14-384" instead (at least, it forces me to download them).

county

Sep 18

Hi, have you figured this out? I think other models like VideoLLaVA also do this, but there the visual encoder is not fine-tuned, so it doesn't matter. But I remember that llava-onevision did fine-tune the visual encoder?

OBJECT-907

Sep 18

As far as I understand, the model first loads the original weights when it instantiates the vision tower, and then reloads the fine-tuned weights over them. So it is loaded correctly.
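One way to check this yourself is to compare the vision tower entries in the checkpoint with what the model holds after loading. Below is a minimal sketch, not the project's own code: the helper `compare_vision_tower`, the `"vision_tower"` key marker, and the toy key names are all assumptions made for illustration. It works on plain mappings so it runs without any model download.

```python
from typing import Any, Callable, Dict, List, Mapping


def compare_vision_tower(
    loaded: Mapping[str, Any],
    checkpoint: Mapping[str, Any],
    marker: str = "vision_tower",  # assumed substring of vision tower keys
    equal: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> Dict[str, List[str]]:
    """Classify the checkpoint's vision tower entries against the loaded weights."""
    report: Dict[str, List[str]] = {"match": [], "mismatch": [], "missing": []}
    for name, value in checkpoint.items():
        if marker not in name:
            continue  # skip language model and projector entries
        if name not in loaded:
            report["missing"].append(name)
        elif equal(loaded[name], value):
            report["match"].append(name)
        else:
            report["mismatch"].append(name)
    return report


# Toy stand-ins for real tensors (hypothetical key names and values).
base_weights = {"model.vision_tower.embed.weight": [0.0, 0.0]}
checkpoint = {
    "model.vision_tower.embed.weight": [1.0, 2.0],
    "lm_head.weight": [3.0],
}

# Two-stage load as described above: base weights first, then the checkpoint on top.
loaded = dict(base_weights)
loaded.update(checkpoint)

print(compare_vision_tower(loaded, checkpoint))        # after the reload
print(compare_vision_tower(base_weights, checkpoint))  # if the reload were skipped
```

With a real model, `loaded` would be `model.state_dict()`, `checkpoint` could come from `safetensors.torch.load_file("model.safetensors")`, and `equal` could be `torch.equal`. The key prefixes in the two mappings may differ, and the tensors may need to be cast to the same dtype and device before comparing, so treat any mismatch as something to inspect rather than as proof of a bug.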

county

Sep 18

OK, thanks a lot for the information!
