I know, I know, 6.5bpw is enough for perfection. For some people.
But for those who want the best they can load, here's an 8bpw quant. It makes a difference for me, I think.
I tweaked the exl2 quant parameters a bit because I run 6-8k contexts:
python3 convert.py -i ../models/Smaug-Mixtral_v0.1 -o smaug_mixtral -cf smaug_mixtral -l 4096 -b 8 -hb 8 -ss 4096
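If you want to sanity-check the quant after conversion, something like the sketch below should load it with the ExLlamaV2 Python API. The model directory, 8192-token context, and sampler settings are placeholders for illustration, not part of this repo; adjust them to wherever the quant lives and how much VRAM you have.

```python
# Minimal ExLlamaV2 loading/inference sketch (paths and settings are assumptions).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "smaug_mixtral"   # directory produced by convert.py (hypothetical path)
config.prepare()
config.max_seq_len = 8192            # matches the 6-8k contexts mentioned above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)          # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Quick smoke test: generate a few tokens to confirm the quant loads and runs.
print(generator.generate_simple("Hello, my name is", settings, 64))
```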