Alternative quantizations.
#7
by ZeroWw
https://huggingface.co/ZeroWw/Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF
My own (ZeroWw) quantizations: output and embed tensors quantized to f16, all other tensors quantized to q5_k or q6_k.
Result: both f16.q6 and f16.q5 are smaller than the standard q8_0 quantization, and they perform as well as the pure f16.
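For anyone who wants to reproduce this mix: llama.cpp's `llama-quantize` tool accepts `--output-tensor-type` and `--token-embedding-type` overrides, which keep those tensors at f16 while the rest are quantized to the chosen k-quant. Below is a minimal Python sketch; the input/output file names and the local binary path are assumptions for illustration, not taken from the repo.

```python
# Minimal sketch: produce the f16.q5_k / f16.q6_k mixes described above,
# assuming a local llama.cpp build providing the `llama-quantize` binary.
# File names are hypothetical placeholders.
import subprocess

SRC = "Meta-Llama-3-8B-Instruct-abliterated-v3.f16.gguf"  # full-precision GGUF input

for quant in ("q5_k", "q6_k"):
    subprocess.run(
        [
            "./llama-quantize",
            "--output-tensor-type", "f16",    # keep the output tensor at f16
            "--token-embedding-type", "f16",  # keep the embedding tensor at f16
            SRC,
            f"Meta-Llama-3-8B-Instruct-abliterated-v3.f16.{quant}.gguf",
            quant,  # quantization type for all remaining tensors
        ],
        check=True,
    )
```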
@failspy hello! Thanks for the abliterated versions. Could you please also do this one: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k
And Mistral Instruct v0.3?