Is finetuning supported for this model? If so can people give me some pointers?
I found this post: https://discuss.huggingface.co/t/finetune-blip-on-customer-dataset-20893/28446
but that thread is about Salesforce/blip-vqa-base rather than this model, Salesforce/blip-image-captioning-large.
Hi @husjerry,
Fine-tuning for VQA should work the same way as image captioning fine-tuning; I think the only difference is in the way you prompt the model (but I am not sure).
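For what it's worth, here is a minimal sketch of how that prompting difference shows up at inference time with the Hugging Face transformers API (the sample image URL and question are only for illustration):

```python
from PIL import Image
import requests
from transformers import BlipProcessor, BlipForConditionalGeneration, BlipForQuestionAnswering

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Captioning: the image alone (or an optional text prefix) is the input.
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
cap_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
cap_inputs = cap_processor(images=image, return_tensors="pt")
print(cap_processor.decode(cap_model.generate(**cap_inputs)[0], skip_special_tokens=True))

# VQA: the question is tokenized together with the image.
vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
vqa_inputs = vqa_processor(images=image, text="how many cats are there?", return_tensors="pt")
print(vqa_processor.decode(vqa_model.generate(**vqa_inputs)[0], skip_special_tokens=True))
```

In short: for captioning the image alone drives generation, while for VQA the question is tokenized alongside the image.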
You can have a look at the instructions shared in that thread, or refer to the original VQA fine-tuning script: https://github.com/salesforce/BLIP/blob/main/train_vqa.py. You can either use the HF model directly, or use their model and then convert it to the HF version with the conversion script here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
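If you go the HF route for captioning, a minimal training-loop sketch could look like the following. The `load_my_pairs()` helper is hypothetical; it just stands in for whatever yields your (PIL image, caption string) pairs:

```python
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for image, caption in load_my_pairs():  # hypothetical helper yielding (PIL image, str)
    inputs = processor(images=image, text=caption, return_tensors="pt").to(device)
    # Passing input_ids as labels makes the model compute the captioning loss.
    outputs = model(**inputs, labels=inputs.input_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice you would batch the data with a DataLoader and add a learning-rate schedule, but the loss computation above is the core of captioning fine-tuning.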