How to run inference on a Colab Pro runtime with a 40 GB A100 and 80 GB of RAM?
#17
by
SadeghPouriyan
I want to use this model on Colab Pro. My runtime has a 40 GB A100 and 80 GB of RAM. What is the best practice for running it on this system?
Try our VPTQ project: https://github.com/microsoft/VPTQ. You can run this notebook https://colab.research.google.com/github/microsoft/VPTQ/blob/main/notebooks/vptq_example.ipynb with the 4-bit model https://huggingface.co/VPTQ-community/Llama-3.1-Nemotron-70B-Instruct-HF-v8-k65536-65536-woft.
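For reference, a minimal sketch of what the linked notebook does, assuming `pip install vptq transformers torch` has been run first and a CUDA GPU is available (the `vptq.AutoModelForCausalLM` loader follows the usage shown in the VPTQ README; the prompt and generation settings here are just illustrative):

```python
# Hedged sketch: load and run the 4-bit VPTQ-quantized 70B model on a 40 GB A100.
# Assumes the vptq and transformers packages are installed and a CUDA GPU is present.

MODEL_ID = "VPTQ-community/Llama-3.1-Nemotron-70B-Instruct-HF-v8-k65536-65536-woft"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are inside the function so this file can be imported
    # on machines without a GPU or the vptq package installed.
    import transformers
    import vptq

    tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" lets Accelerate place the quantized weights on the A100,
    # spilling to CPU RAM only if needed.
    model = vptq.AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A 70B model quantized to roughly 4 bits per weight needs on the order of 35 GB for the weights alone, which is why it fits on a single 40 GB A100 while the full-precision model would not.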