4-bit quantized version of https://huggingface.co/ausboss/llama-30b-supercot

Quantized with GPTQ using https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.

This version is quantized without groupsize to save VRAM, so with 24GB of VRAM you can use the full 2048-token max context (or at least get much closer to it than with the groupsize version).
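
As a rough illustration of why dropping groupsize helps, here is a back-of-the-envelope sketch of the two largest VRAM consumers: the quantized weights and the fp16 key/value cache at full context. The figures are assumptions, not measurements from this repo: roughly 32.5B parameters, 60 layers, and hidden size 6656 for LLaMA-30B, a hypothetical groupsize of 128 for comparison, and one fp16 scale plus one 4-bit zero point stored per group. It also treats every parameter as quantized and ignores activations and CUDA overhead, so real usage will be higher.

```python
GIB = 2**30

def quantized_weight_gib(n_params, wbits=4, groupsize=None,
                         scale_bits=16, zero_bits=4):
    """Approximate size of the quantized weights in GiB.

    With a groupsize, each group of weights carries its own scale and
    zero point, which adds (scale_bits + zero_bits) / groupsize bits
    per weight. Without one, that overhead is treated as negligible.
    """
    bits_per_weight = wbits
    if groupsize is not None:
        bits_per_weight += (scale_bits + zero_bits) / groupsize
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers, hidden_size, context_len, bytes_per_value=2):
    """Approximate fp16 key/value cache size in GiB for one sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * hidden_size * context_len * bytes_per_value / GIB

if __name__ == "__main__":
    n_params = 32.5e9  # assumed LLaMA-30B parameter count
    no_group = quantized_weight_gib(n_params)
    grouped = quantized_weight_gib(n_params, groupsize=128)  # hypothetical
    cache = kv_cache_gib(n_layers=60, hidden_size=6656, context_len=2048)
    print(f"weights, no groupsize:  {no_group:.2f} GiB")
    print(f"weights, groupsize 128: {grouped:.2f} GiB")
    print(f"KV cache at 2048 ctx:   {cache:.2f} GiB")
```

Under these assumptions the per-group metadata costs a few hundred MiB, which is small next to the weights but can matter when the weights plus cache already sit close to a 24GB limit.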

Command used to quantize:
python llama.py c:\llama-30b-supercot c4 --wbits 4 --act-order --true-sequential --save_safetensors 4bit.safetensors

| Eval | Score |
| --- | --- |
| WikiText2 | 4.66 |
| PTB | 17.64 |
| C4 | 6.50 |
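
These scores are presumably perplexities (lower is better), since that is what the evaluation pass in GPTQ-for-LLaMa reports for these datasets; treat that reading as an assumption. For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The helper below is a minimal sketch of that formula only, not the evaluation code used to produce the table, and the name `perplexity` is illustrative.

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))

if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    nlls = [math.log(4)] * 100
    print(perplexity(nlls))
```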