How could we load the model with low GPU memory?
#4 opened by erjiaxiao
My GPU has 24 GB of memory, which is not enough for the model. How can we load the model with low GPU memory?
Hi,
You can pass a quantization_config to the from_pretrained method so that the weights are loaded in lower precision (such as 4-bit or 8-bit):
from transformers import BitsAndBytesConfig, InstructBlipForConditionalGeneration
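# Quantizing the weights at load time cuts memory use (4-bit weights are roughly 4x smaller than fp16)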
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
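# device_map="auto" dispatches layers across the available GPU(s) and CPU as needed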
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-13b", device_map="auto", quantization_config=quantization_config)
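If 4-bit degrades quality too much for your use case, BitsAndBytesConfig also supports 8-bit loading, which uses more memory but quantizes less aggressively; only the config line changes:

quantization_config = BitsAndBytesConfig(load_in_8bit=True)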
Refer to the blog post for details: https://huggingface.co/blog/4bit-transformers-bitsandbytes
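In case it helps, here is a minimal inference sketch with the quantized model loaded above; the image URL and prompt are placeholders, not part of the original answer:

import requests
from PIL import Image
from transformers import InstructBlipProcessor

# The processor handles both image preprocessing and text tokenization
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-13b")

# Hypothetical placeholder image; substitute any image you want to query
url = "https://example.com/image.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Move inputs to the device the model was dispatched to (assumes the first device holds the embeddings)
inputs = processor(images=image, text="What is shown in this image?", return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0].strip())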
Thank you so much for your help!