How much GPU memory needed?
Hello HuggingFace community,
I am trying to test the BLOOM model on an AWS instance with an NVIDIA A10G GPU, which has 22 GB of memory.
I ran this code:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")
It automatically downloaded the BLOOM model (72 files), but after that I get a CUDA out of memory error.
Can someone tell me how much GPU memory is needed to run the BLOOM model?
Thanks
The 72 checkpoint files are 329 GB in total, so for inference it might take about 350 GB. (BLOOM has 176B parameters, so in fp16 the weights alone are 176e9 × 2 bytes ≈ 352 GB.)
It works in ~200 GB if you use the load_in_8bit feature from https://github.com/huggingface/transformers/pull/17901
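For reference, a minimal sketch of 8-bit loading (assuming a transformers version that includes that PR, plus the accelerate and bitsandbytes packages installed):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
# device_map="auto" shards the layers across all visible GPUs;
# load_in_8bit=True quantizes the weights to int8 (~1 byte per parameter)
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    load_in_8bit=True,
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))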
@mazib You will need at least 8 x A100 80 GB GPUs for inference in fp16.
Or you can use int8 for inference (see the load_in_8bit example above).
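As a rough sketch of the fp16 multi-GPU setup (again assuming the accelerate package is installed so device_map="auto" can shard the checkpoint across the GPUs):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
# fp16 means ~2 bytes per parameter (~352 GB of weights total);
# device_map="auto" splits those weights across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    torch_dtype=torch.float16,
    device_map="auto",
)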
Thanks for this discussion thread.
I have some (hopefully related) observations here: https://huggingface.co/bigscience/bloom/discussions/118