How could we load the model with low GPU memory?
#4 opened by erjiaxiao
My GPU has 24 GB of memory, which is not enough for the model. How can we load the model with low GPU memory?
Hi,
You can pass a quantization_config to the from_pretrained method so that the weights are loaded in lower precision (such as 4-bit or 8-bit):
from transformers import BitsAndBytesConfig, InstructBlipForConditionalGeneration
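# Quantizing the weights at load time cuts memory use (4-bit weights are roughly 4x smaller than fp16)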
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
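# device_map="auto" dispatches layers across the available GPU(s) and CPU as needed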
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-13b", device_map="auto", quantization_config=quantization_config)
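If 4-bit degrades quality too much for your use case, BitsAndBytesConfig also supports 8-bit loading, which uses more memory but quantizes less aggressively; only the config line changes:

quantization_config = BitsAndBytesConfig(load_in_8bit=True)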
Refer to the blog post for details: https://huggingface.co/blog/4bit-transformers-bitsandbytes
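In case it helps, here is a minimal inference sketch with the quantized model loaded above; the image URL and prompt are placeholders, not part of the original answer:

import requests
from PIL import Image
from transformers import InstructBlipProcessor

# The processor handles both image preprocessing and text tokenization
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-13b")

# Hypothetical placeholder image; substitute any image you want to query
url = "https://example.com/image.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Move inputs to the device the model was dispatched to (assumes the first device holds the embeddings)
inputs = processor(images=image, text="What is shown in this image?", return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0].strip())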
Thank you so much for your help!