Is the vision tower correctly loaded?

#4
by OBJECT-907 - opened

I found the weights of the vision tower in model.safetensors, but the model seems to load them from "google/siglip-so400m-patch14-384" instead? (At least, it forces me to download that checkpoint.)

Hi, have you figured this out? I think other models such as VideoLLaVA also do this, but there the visual encoder is not fine-tuned, so it doesn't matter. As far as I remember, though, llava-onevision did fine-tune the visual encoder?

As far as I understand, the model first loads the original weights when it instantiates the vision tower, then loads the fine-tuned weights from the checkpoint over them. So it is loaded correctly.
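If you want to check this yourself, one option is to compare the vision tower tensors in the checkpoint against the base SigLIP tensors and see which ones differ. Below is a minimal sketch of that comparison, not the repository's actual loading code. It works on plain dicts of flattened values so it runs without any model files; the `model.vision_tower.` prefix and all tensor names are hypothetical placeholders, so inspect the real keys in model.safetensors before relying on them.

```python
from typing import Dict, List, Sequence


def changed_tensors(
    base: Dict[str, Sequence[float]],
    checkpoint: Dict[str, Sequence[float]],
    prefix: str = "",
    tol: float = 0.0,
) -> List[str]:
    """Return checkpoint tensor names under `prefix` whose values differ from `base`."""
    changed = []
    for name, values in checkpoint.items():
        if not name.startswith(prefix):
            continue  # not part of the vision tower, skip it
        ref = base.get(name[len(prefix):])
        if ref is None or len(ref) != len(values):
            changed.append(name)  # missing in base or different size
        elif any(abs(a - b) > tol for a, b in zip(ref, values)):
            changed.append(name)  # same shape, different values
    return changed


# Toy data standing in for the two checkpoints (all names are hypothetical).
base = {
    "encoder.layer0.weight": [0.1, 0.2],
    "encoder.layer0.bias": [0.0],
}
checkpoint = {
    "model.vision_tower.encoder.layer0.weight": [0.1, 0.25],  # fine-tuned
    "model.vision_tower.encoder.layer0.bias": [0.0],          # unchanged
    "model.lm_head.weight": [1.0],                            # outside the tower
}

print(changed_tensors(base, checkpoint, prefix="model.vision_tower."))
```

A non-empty result means the checkpoint carries vision tower weights that differ from the base model, so the download of the base model alone does not tell you which weights end up in use. To confirm the final state, run the same comparison against the loaded model's parameters.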

OK, thanks a lot for the information!
