
How much GPU memory is needed?

#109 · opened by mazib

Hello HuggingFace community,

I am trying to test the BLOOM model on an AWS instance with an NVIDIA A10G GPU, which has 22 GB of memory. I ran this code:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")  # downloads all checkpoint shards, then tries to load them

It automatically downloaded the BLOOM model (72 files), but after that I get a CUDA out-of-memory error.

Can someone tell me how much GPU memory is needed to run the BLOOM model?

Thanks

The 72 checkpoint shards are 329 GB in total, so for inference it might take about 350 GB.
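
As a back-of-the-envelope check, here is a rough sketch of where those numbers come from (assuming BLOOM's published 176B parameter count; byte widths per dtype are standard):

n_params = 176e9  # published BLOOM parameter count

for dtype, bytes_per_param in [("fp16", 2), ("int8", 1)]:
    gb = n_params * bytes_per_param / 1e9
    print(f"{dtype}: ~{gb:.0f} GB of weights")
# fp16: ~352 GB of weights
# int8: ~176 GB of weights
# Activations and the generation KV cache come on top of this.

That lines up with the figures in this thread: ~350 GB for fp16, and ~200 GB for int8 once overhead is included.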

BigScience Workshop org

It works in ~200 GB if you use the load_in_8bit feature from https://github.com/huggingface/transformers/pull/17901.
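
For reference, a minimal sketch of what 8-bit loading might look like (this assumes bitsandbytes and accelerate are installed, enough aggregate GPU memory across the visible devices, and uses AutoModelForCausalLM rather than AutoModel so that generate works):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
# load_in_8bit quantizes the weights to int8 at load time (~176 GB of weights);
# device_map="auto" shards the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))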

@mazib you will need at least 8 x A100 80 GB GPUs for inference in fp16.
Or you can use int8 for inference.
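
A sketch of the fp16 path mentioned above (assuming accelerate is installed and roughly 8 x A100 80 GB GPUs are visible; device_map="auto" is what spreads the model across them):

import torch
from transformers import AutoModelForCausalLM

# torch_dtype=torch.float16 keeps the weights at 2 bytes per parameter (~352 GB);
# device_map="auto" places successive layers on successive GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    torch_dtype=torch.float16,
    device_map="auto",
)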

Thanks for this discussion thread.
I have some (hopefully related) observations here: https://huggingface.co/bigscience/bloom/discussions/118
