GGUF Quantisations
#3 opened by smcleod
I've had a crack (for the first time) at quantising this model into Q3_K_M and Q4_K_M GGUF variants, in case anyone finds them useful. I've also pushed these to Ollama's model registry.
Disclaimer: I literally only read up on how to quantise models yesterday, so while I think it went to plan, please do let me know if there are any issues!
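For anyone who wants to reproduce this, a minimal sketch of the workflow is below. These aren't my exact commands: the model directory and output filenames are placeholders, and the convert/quantize script names vary between llama.cpp versions, so adjust to match your checkout.

```python
# Minimal sketch of a llama.cpp GGUF quantisation workflow, driven from Python.
# Assumptions: llama.cpp is cloned and built in ./llama.cpp, and the source
# model has been downloaded locally (all paths here are placeholders).
# Script/binary names differ across llama.cpp versions (e.g. convert.py vs
# convert_hf_to_gguf.py, quantize vs llama-quantize).
import subprocess

MODEL_DIR = "./Smaug-Mixtral-v0.1"        # local HF checkout (placeholder)
F16_GGUF = "smaug-mixtral-v0.1-f16.gguf"  # intermediate full-precision GGUF

# 1. Convert the Hugging Face checkpoint to a single F16 GGUF file.
subprocess.run(
    ["python", "llama.cpp/convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantise the F16 GGUF down to each target variant.
for quant in ("Q3_K_M", "Q4_K_M"):
    subprocess.run(
        ["llama.cpp/quantize", F16_GGUF,
         f"smaug-mixtral-v0.1-{quant}.gguf", quant],
        check=True,
    )
```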
It doesn't seem to be working for me; the output is garbage. Did it work for you?
I added some more quants: https://huggingface.co/gobean/Smaug-Mixtral-v0.1-GGUF, using llama.cpp from 4/18/2024.
Mixtral Instruct worked better for me with the Qx_0 quants, so I used those; I'm unsure why the Qx_K_Y variants behave differently. Output tests on few-shot prompts look good, and Q4_0 is fast enough for regular use at ~18 GB of VRAM.
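For what it's worth, a quick way to spot a broken quant is a tiny few-shot smoke test. Here's a minimal sketch using llama-cpp-python; the model filename and prompt are placeholders, not the exact test I ran.

```python
# Quick sanity check of a quant with llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder;
# n_gpu_layers=-1 offloads all layers to the GPU.
from llama_cpp import Llama

llm = Llama(model_path="smaug-mixtral-v0.1-q4_0.gguf", n_gpu_layers=-1)

# A tiny few-shot prompt: if the quant is broken, the completion is
# usually obvious gibberish rather than a plausible continuation.
prompt = (
    "Q: What is 2 + 2?\nA: 4\n"
    "Q: What is the capital of France?\nA: Paris\n"
    "Q: What colour is the sky on a clear day?\nA:"
)
out = llm(prompt, max_tokens=16, stop=["\n"])
print(out["choices"][0]["text"].strip())
```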
smcleod changed discussion status to closed