Hardware Requirements

#1
by Lightchain - opened

Very interesting model. Does anyone have info on what hardware is required to run it?

You will need ~80 GB of memory for inference at 16-bit. Roughly half that (~40 GB) at 8-bit, and a quarter (~20 GB) at 4-bit.
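If it helps, here is that rule of thumb as a quick Python sketch. It only scales the ~80 GB 16-bit figure linearly with bit width, which is an assumption: real quantized files carry some overhead, and the KV cache for your context adds more on top. The function names and the `headroom_gb` parameter are made up for illustration.

```python
def estimate_memory_gb(bits, baseline_gb=80.0, baseline_bits=16):
    """Scale the ~80 GB 16-bit figure linearly with bit width.

    Assumption: memory is dominated by weights, so it scales with bits per weight.
    """
    if bits <= 0:
        raise ValueError("bits must be positive")
    return baseline_gb * bits / baseline_bits


def fits(bits, budget_gb, headroom_gb=0.0):
    """Return True if the estimate plus optional headroom fits in budget_gb.

    headroom_gb is a hypothetical allowance for KV cache and runtime overhead.
    """
    return estimate_memory_gb(bits) + headroom_gb <= budget_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{estimate_memory_gb(bits):.0f} GB")
```

For example, `fits(4, 24)` is true while `fits(8, 24)` is not, so on a 24 GB card only the 4-bit estimate leaves any room, and whatever is left has to cover the context.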

I just ran it at 16-bit on an A100 SXM with 80 GB of VRAM.

With llama.cpp, this model with Q4_K_M quantization and a 15000-token context fits on a single RTX 3090 or 4090 (24 GB VRAM). Its performance doesn't seem to be affected much, at least based on my limited testing on a set of 50 reasoning puzzles.

You can also run it on CPU if you have 32 GB of RAM.
