Hardware Requirements
#1 opened by Lightchain
Very interesting model. Does anyone have info on what hardware is required to run it?
You will need ~80 GB of memory for inference at 16-bit, roughly half that for 8-bit, and a quarter of it for 4-bit.
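For a quick sanity check, here's a back-of-envelope sketch of where those numbers come from. The parameter count below is an assumption inferred from the ~80 GB at 2 bytes per parameter; it isn't stated anywhere in this thread.

```python
# Rough estimate: weight memory ≈ parameter count × bytes per parameter.
# PARAMS is an assumed value implied by the ~80 GB @ 16-bit figure above.
PARAMS = 40e9  # assumed, not an official spec
BYTES_PER_PARAM = {"16-bit": 2, "8-bit": 1, "4-bit": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision}: ~{gb:.0f} GB for weights (plus KV cache and runtime overhead)")
```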
I just ran it at 16-bit on an A100 SXM with 80 GB of VRAM.
With llama.cpp, this model with Q4_K_M quantization and a 15000-token context fits on a single RTX 3090 or 4090 (24 GB VRAM). Its performance doesn't seem to be affected much, at least based on my limited testing on a set of 50 reasoning puzzles.
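If you'd rather drive llama.cpp from Python than from the CLI, a minimal sketch of the same setup with the llama-cpp-python bindings looks roughly like this. The GGUF filename is a placeholder, and the context size just mirrors the test above.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # placeholder filename for the Q4_K_M quant
    n_ctx=15000,                     # context size used in the test above
    n_gpu_layers=-1,                 # offload all layers to the 24 GB GPU
)

out = llm("Q: What weighs more, a pound of feathers or a pound of bricks?\nA:",
          max_tokens=64)
print(out["choices"][0]["text"])
```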
You can also run it on CPU if you have 32 GB of RAM.
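A CPU-only variant of the sketch above just disables GPU offload; the smaller context here is an assumption to keep the KV cache within ~32 GB of system RAM.

```python
from llama_cpp import Llama

llm_cpu = Llama(
    model_path="model-Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,                      # assumed smaller context to stay within 32 GB RAM
    n_gpu_layers=0,                  # no GPU offload; pure CPU inference
)
```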