Is the vision tower loaded correctly?
I found the vision tower weights in model.safetensors, but the model seems to load them from "google/siglip-so400m-patch14-384" instead? (At least, it forces me to download that model.)
Hi, have you figured this out? I think other models like VideoLLaVA also do this, but their visual encoder is not fine-tuned, so it doesn't matter there. As I remember, though, llava-onevision did fine-tune the visual encoder?
As far as I understand, the model first loads the original weights when instantiating the vision tower, and then overwrites them with the fine-tuned weights from the checkpoint. So the vision tower is loaded correctly.
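A minimal, hypothetical sketch of the two-stage loading described above, using plain Python dicts in place of PyTorch state dicts. The key names and the `vision_tower.` prefix are assumptions for illustration only, not the actual parameter names in model.safetensors or in the llava-onevision code. The point is that whatever stage 1 loads does not matter for any parameter that stage 2 overwrites.

```python
"""Toy sketch: base weights are loaded first, then fine-tuned checkpoint values override them.

Assumption: plain dicts stand in for PyTorch state dicts; all key names are hypothetical.
"""
from typing import Dict, List, Tuple

StateDict = Dict[str, float]


def instantiate_vision_tower(base_weights: StateDict) -> StateDict:
    # Stage 1: build the tower from the original pretrained weights
    # (in the thread, this is the "google/siglip-so400m-patch14-384" download).
    # Copy so the base weights themselves are never mutated.
    return dict(base_weights)


def load_finetuned(tower: StateDict, checkpoint: StateDict, prefix: str) -> Tuple[StateDict, List[str]]:
    # Stage 2: copy every checkpoint entry under `prefix` onto the matching tower
    # parameter, overwriting the stage-1 value. Entries outside `prefix`
    # (e.g. language-model weights) are ignored here.
    overwritten = set()
    for full_key, value in checkpoint.items():
        if full_key.startswith(prefix):
            key = full_key[len(prefix):]
            if key in tower:
                tower[key] = value
                overwritten.add(key)
    # Tower parameters that still hold their stage-1 (original) values.
    not_overwritten = sorted(k for k in tower if k not in overwritten)
    return tower, not_overwritten


if __name__ == "__main__":
    base = {"encoder.layer0.weight": 0.1, "encoder.layer1.weight": 0.2}
    ckpt = {
        "vision_tower.encoder.layer0.weight": 0.9,
        "vision_tower.encoder.layer1.weight": 0.8,
        "language_model.lm_head.weight": 0.5,
    }
    tower = instantiate_vision_tower(base)
    tower, stale = load_finetuned(tower, ckpt, "vision_tower.")
    print(tower)  # {'encoder.layer0.weight': 0.9, 'encoder.layer1.weight': 0.8}
    print(stale)  # []
```

A non-empty `stale` list would mean some vision tower parameters kept their original SigLIP values, which is the situation the question was worried about.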
OK, thanks a lot for your information!