Much love for the train.
#1 by [deleted] - opened
Hey, quick follow-up: if you get a chance to do a Windows-compatible GPTQ quant for this, that'd really help me with prompt testing. I think the model is good enough to be worth the effort.
Just leave out --act-order.
CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/PathToVicuna --true-sequential --wbits 4 --groupsize 128 --save_safetensors vicuna-13b-free-4bit-128g.safetensors
I think that's the command (someone feel free to correct me, I don't feel like checking). It only takes about an hour.
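If you want to sanity-check the resulting .safetensors file without loading the whole model onto a GPU, you can read just its header. The safetensors format is an 8-byte little-endian length prefix followed by a JSON header describing each tensor's dtype, shape, and byte offsets. Here's a minimal sketch (the filename and tensor name are made up for illustration):

```python
import json
import struct

def read_safetensors_header(path):
    # safetensors layout: 8-byte little-endian u64 header length,
    # then that many bytes of JSON, then the raw tensor data.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Build a tiny valid file so the reader can be demonstrated end to end;
# a real quant output would be inspected the same way.
header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode("utf-8")
with open("tiny.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)))  # length prefix
    f.write(blob)                          # JSON header
    f.write(b"\x00" * 8)                   # 2 float32 zeros of tensor data

meta = read_safetensors_header("tiny.safetensors")
print(meta["w"]["dtype"], meta["w"]["shape"])
```

Listing the header keys of the real output is a quick way to confirm the quant finished writing and to see which layers it contains.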
[deleted] changed discussion status to closed