I know, I know, 6.5bpw is enough for perfection. For some people.
But for those who want the best they can load, here's an 8bpw quant. It makes a difference for me, I think.
I tweaked the exl2 quant parameters a bit because I run 6-8k contexts:
python3 convert.py -i ../models/Smaug-Mixtral_v0.1 -o smaug_mixtral -cf smaug_mixtral -l 4096 -b 8 -hb 8 -ss 4096
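If you want to sanity-check the quant after conversion, something like the sketch below should load it with the ExLlamaV2 Python API. The model directory, 8192-token context, and sampler settings are placeholders for illustration, not part of this repo; adjust them to wherever the quant lives and how much VRAM you have.

```python
# Minimal ExLlamaV2 loading/inference sketch (paths and settings are assumptions).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "smaug_mixtral"   # directory produced by convert.py (hypothetical path)
config.prepare()
config.max_seq_len = 8192            # matches the 6-8k contexts mentioned above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)          # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Quick smoke test: generate a few tokens to confirm the quant loads and runs.
print(generator.generate_simple("Hello, my name is", settings, 64))
```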