Can you quantize the model?
#1 opened by xldistance
Quants are uploading here: https://huggingface.co/models?search=LoneStriker%20Smaugv0.1
Thank you very much.
Thanks!
And what about AWQ or GPTQ?
I mostly do exl2 and sometimes GGUF quants (I started out making quants for my own use, and it's grown a bit since then). You'll have to wait for TheBloke for the GPTQ and AWQ quants, as I haven't set up those pipelines myself. exl2 quants tend to be the fastest for inference if you have a GPU the quantized model will fit on; GGUF quants are the most compatible across the widest range of devices.
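If it helps, here's a minimal sketch of running a GGUF quant with llama-cpp-python. The filename and parameters below are illustrative assumptions, not taken from this repo:

```python
# Minimal sketch: load and run a GGUF quant with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file already
# downloaded locally; the filename below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="smaug-v0.1-Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
    n_ctx=4096,       # context window size
)

output = llm("Q: Why quantize a model? A:", max_tokens=64)
print(output["choices"][0]["text"])
```

With `n_gpu_layers` you can split the model between GPU and CPU if it doesn't fully fit in VRAM, which is part of why GGUF works across so many setups.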