Resources needed to run this efficiently?
#7 opened by Daemontatox
I can run the base Llama 3.2 11B on a single Nvidia L4 (24 GB VRAM). What's the recommended amount of VRAM to run this model efficiently?
I think you can run this model with the same resources as the base model. Feel free to contact me if you encounter any issues!
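For reference, a minimal loading sketch for a single 24 GB GPU. The model id below is an assumption based on this repo; adjust it to wherever your checkpoint lives. Note that bf16 weights for an 11B model are roughly 22 GB, so it's a tight fit on an L4:

```python
import torch
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "Xkev/Llama-3.2V-11B-cot"  # assumption: replace with the actual repo id

# bf16 weights (~22 GB for 11B params) just fit on a 24 GB L4
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```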
For anyone trying this out: running on an L4 is very slow (<10 tok/sec). The HF Space is much faster.
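If you want to reproduce the tok/sec figure yourself, here is a rough timing sketch (assuming `model` and `processor` are loaded as in the snippet above; the blank image is just to exercise the vision path):

```python
import time
from PIL import Image

# Blank image as a placeholder input; use a real image for a meaningful run
image = Image.new("RGB", (448, 448), color="white")
prompt = "<|image|>Describe this image in detail."
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

start = time.time()
out = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start

# Count only newly generated tokens, not the prompt
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/sec")
```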
@nudelbrot Of course the one on Hugging Face is faster; it's running on a rotating A100.
@Xkev Thank you, I was asking because when I tried to host it on Hugging Face, it kept saying the L4 has too little memory, and I thought the CoT requires more VRAM.
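If the 24 GB L4 keeps getting flagged as too small, one workaround is a 4-bit load via bitsandbytes, which cuts the weight footprint to roughly a quarter of bf16. A sketch, under the same model-id assumption as above:

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

model_id = "Xkev/Llama-3.2V-11B-cot"  # assumption: replace with the actual repo id

# 4-bit weights, with matmuls computed in bf16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```

Expect some quality and speed trade-off versus bf16, but it leaves much more headroom for the KV cache during long CoT generations.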