Could not run on Colab

#15
by MosheBeeri - opened

It just consumes all available memory: 51 GB of RAM and 16 GB of GPU RAM.
Any ideas besides using a machine with more RAM?

Of course, even a 13B model needs a V100 32GB to run, so the 40B model must need even more!

Technology Innovation Institute org

The model weights alone are ~80GB, so fast inference would require at least 90-100GB.
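As a rough back-of-the-envelope check (assuming bfloat16 weights, i.e. 2 bytes per parameter):

```python
# Back-of-the-envelope memory estimate, assuming bfloat16 weights (2 bytes/param)
n_params = 40e9
print(f"weights alone: ~{n_params * 2 / 1e9:.0f} GB")  # ~80 GB; KV cache + activations push the total to ~90-100 GB
```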
You can try accelerate with CPU offloading to see if that works for you: https://huggingface.co/docs/accelerate/package_reference/big_modeling
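A minimal sketch of what that could look like, assuming recent transformers/accelerate versions; the `max_memory` caps and offload folder are illustrative assumptions, not tuned values:

```python
# A sketch of CPU/disk offloading via accelerate's big-model utilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate split layers across GPU, CPU RAM and disk;
# max_memory caps each device so whatever doesn't fit gets offloaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "15GiB", "cpu": "48GiB"},  # illustrative limits for a 16GB GPU / 51GB RAM box
    offload_folder="offload",                 # spill the remaining weights to disk
    trust_remote_code=True,                   # Falcon shipped custom modelling code
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Note that offloading to CPU and especially to disk makes generation very slow; it is a way to get the model running at all, not a fast setup.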

The community has also created a 4bit quantised version of the model: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ, which should only require 20GB for the model weights.
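A rough sketch of loading that checkpoint, assuming a transformers version with GPTQ support plus optimum and auto-gptq installed (check TheBloke's model card for the exact setup they recommend):

```python
# A sketch of loading the community 4-bit GPTQ weights (~20GB of VRAM for the weights).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/falcon-40b-instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```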

Otherwise the best bet would be to work with the smaller models: https://huggingface.co/tiiuae/falcon-7b
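For reference, Falcon-7B in half precision is roughly 14GB of weights, so a sketch like the following should just fit on a single 16GB GPU (the dtype and device choices are assumptions, adjust for your hardware):

```python
# Falcon-7B in float16 is ~14GB of weights, so it (just) fits on a 16GB GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```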

FalconLLM changed discussion status to closed
