Could not run on Colab

#15
by MosheBeeri - opened

It just consumes all available memory: 51 GB of RAM and 16 GB of GPU RAM.
Any ideas besides using a machine with more RAM?

Of course, even a 13B model needs a V100 32GB to run, so the 40B model must need even more!

Technology Innovation Institute org

The model weights alone are ~80GB, so fast inference would require at least 90-100GB.
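As a rough back-of-the-envelope check (assuming bfloat16 weights, i.e. 2 bytes per parameter):

```python
# Back-of-the-envelope memory estimate, assuming bfloat16 weights (2 bytes/param)
n_params = 40e9
print(f"weights alone: ~{n_params * 2 / 1e9:.0f} GB")  # ~80 GB; KV cache + activations push the total to ~90-100 GB
```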
You can try accelerate with CPU offloading to see if that works for you: https://huggingface.co/docs/accelerate/package_reference/big_modeling
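A minimal sketch of what that could look like, assuming recent transformers/accelerate versions; the `max_memory` caps and offload folder are illustrative assumptions, not tuned values:

```python
# A sketch of CPU/disk offloading via accelerate's big-model utilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate split layers across GPU, CPU RAM and disk;
# max_memory caps each device so whatever doesn't fit gets offloaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "15GiB", "cpu": "48GiB"},  # illustrative limits for a 16GB GPU / 51GB RAM box
    offload_folder="offload",                 # spill the remaining weights to disk
    trust_remote_code=True,                   # Falcon shipped custom modelling code
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Note that offloading to CPU and especially to disk makes generation very slow; it is a way to get the model running at all, not a fast setup.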

The community has also created a 4bit quantised version of the model: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ, which should only require 20GB for the model weights.
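A rough sketch of loading that checkpoint, assuming a transformers version with GPTQ support plus optimum and auto-gptq installed (check TheBloke's model card for the exact setup they recommend):

```python
# A sketch of loading the community 4-bit GPTQ weights (~20GB of VRAM for the weights).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/falcon-40b-instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```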

Otherwise the best bet would be to work with the smaller models: https://huggingface.co/tiiuae/falcon-7b
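For reference, Falcon-7B in half precision is roughly 14GB of weights, so a sketch like the following should just fit on a single 16GB GPU (the dtype and device choices are assumptions, adjust for your hardware):

```python
# Falcon-7B in float16 is ~14GB of weights, so it (just) fits on a 16GB GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```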

FalconLLM changed discussion status to closed
