
How much GPU memory is needed?

#109 · opened by mazib

Hello HuggingFace community,

I am trying to test the BLOOM model on an AWS instance with an NVIDIA A10G GPU, which has 22 GB of memory. I ran this code:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")  # downloads all checkpoint shards, then tries to load them

It automatically downloaded the BLOOM model (72 files), but after that I get a CUDA out-of-memory error.

Can someone tell me how much GPU memory is needed to run the BLOOM model?

Thanks

The 72 checkpoint shards are 329 GB in total, so for inference it might take about 350 GB.
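
As a back-of-the-envelope check, here is a rough sketch of where those numbers come from (assuming BLOOM's published 176B parameter count; byte widths per dtype are standard):

n_params = 176e9  # published BLOOM parameter count

for dtype, bytes_per_param in [("fp16", 2), ("int8", 1)]:
    gb = n_params * bytes_per_param / 1e9
    print(f"{dtype}: ~{gb:.0f} GB of weights")
# fp16: ~352 GB of weights
# int8: ~176 GB of weights
# Activations and the generation KV cache come on top of this.

That lines up with the figures in this thread: ~350 GB for fp16, and ~200 GB for int8 once overhead is included.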

BigScience Workshop org

It works in ~200 GB if you use the load_in_8bit feature from https://github.com/huggingface/transformers/pull/17901.
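
For reference, a minimal sketch of what 8-bit loading might look like (this assumes bitsandbytes and accelerate are installed, enough aggregate GPU memory across the visible devices, and uses AutoModelForCausalLM rather than AutoModel so that generate works):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
# load_in_8bit quantizes the weights to int8 at load time (~176 GB of weights);
# device_map="auto" shards the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))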

@mazib you will need at least 8 x A100 80 GB GPUs for inference in fp16.
Or you can use int8 for inference.
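
A sketch of the fp16 path mentioned above (assuming accelerate is installed and roughly 8 x A100 80 GB GPUs are visible; device_map="auto" is what spreads the model across them):

import torch
from transformers import AutoModelForCausalLM

# torch_dtype=torch.float16 keeps the weights at 2 bytes per parameter (~352 GB);
# device_map="auto" places successive layers on successive GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    torch_dtype=torch.float16,
    device_map="auto",
)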

Thanks for this discussion thread.
I have some (hopefully related) observations here: https://huggingface.co/bigscience/bloom/discussions/118
