How to run inference on a Colab Pro runtime with a 40 GB A100 and 80 GB of RAM?
#17
by
SadeghPouriyan
I want to use this model on Colab Pro. My runtime has a 40 GB A100 and 80 GB of RAM. What is the best practice for running it on this system?
Try our VPTQ project: https://github.com/microsoft/VPTQ. You can run this notebook https://colab.research.google.com/github/microsoft/VPTQ/blob/main/notebooks/vptq_example.ipynb with the 4-bit model https://huggingface.co/VPTQ-community/Llama-3.1-Nemotron-70B-Instruct-HF-v8-k65536-65536-woft.
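For reference, a minimal sketch of what the linked notebook does, assuming `pip install vptq transformers torch` has been run first and a CUDA GPU is available (the `vptq.AutoModelForCausalLM` loader follows the usage shown in the VPTQ README; the prompt and generation settings here are just illustrative):

```python
# Hedged sketch: load and run the 4-bit VPTQ-quantized 70B model on a 40 GB A100.
# Assumes the vptq and transformers packages are installed and a CUDA GPU is present.

MODEL_ID = "VPTQ-community/Llama-3.1-Nemotron-70B-Instruct-HF-v8-k65536-65536-woft"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are inside the function so this file can be imported
    # on machines without a GPU or the vptq package installed.
    import transformers
    import vptq

    tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" lets Accelerate place the quantized weights on the A100,
    # spilling to CPU RAM only if needed.
    model = vptq.AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A 70B model quantized to roughly 4 bits per weight needs on the order of 35 GB for the weights alone, which is why it fits on a single 40 GB A100 while the full-precision model would not.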