Would you consider EXL2?
I'd love to give this a shot, but I've been severely TabbyAPI-pilled recently and honestly can't go back to using ooba for my text-completion API. I might consider Kobold, but going back and forth between them sounds like a drag.
With that said, if you have the spare time/resources, I'd really appreciate an EXL2 version.
On my RTX 4090 I can run 8x7Bs at 3.5 bpw with full context, provided I enable the 8-bit cache, if that helps at all.
No worries if you can't swing it, of course. Just figured I'd put it out there.
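For context, the 8-bit cache I mean is exllamav2's quantized K/V cache (the same thing TabbyAPI exposes as a cache mode setting), which roughly halves the cache's VRAM footprint and is what lets full context fit in 24 GB. Here's a minimal loading sketch against the raw exllamav2 library; the model path is hypothetical:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache_8bit, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/some-8x7b-3.5bpw-exl2"  # hypothetical path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit K/V cache instead of FP16
model.load_autosplit(cache)                    # fill the GPU as the weights load

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello,", settings, num_tokens=32))
```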
Actually, this is one model I was very excited and happy about. However, when I tried to convert it to GGUF or run it in BF16, the 9B mistral trix model apparently just does not work with mergekit's MoE merging.
That being said, if you have an idea for an MoE model that you'd want in EXL2, I'll merge it together and then talk to LoneStriker about converting it to 3.5 bpw for you.
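If it helps anyone following along, this is roughly how such a merge is set up with mergekit's MoE tooling; everything here (base model, expert repo IDs, routing prompts) is a placeholder, not an actual recipe:

```python
# Sketch of driving a mergekit-moe merge from Python; all repo IDs are placeholders.
import pathlib
import subprocess
import textwrap

moe_config = textwrap.dedent("""\
    base_model: mistralai/Mistral-7B-Instruct-v0.2  # placeholder base
    gate_mode: hidden       # route tokens by hidden-state similarity to the prompts
    dtype: bfloat16
    experts:
      - source_model: example/expert-roleplay       # hypothetical repo IDs
        positive_prompts: ["roleplay", "storytelling"]
      - source_model: example/expert-reasoning
        positive_prompts: ["step-by-step reasoning", "math"]
""")
pathlib.Path("moe.yml").write_text(moe_config)

# mergekit-moe is the CLI entry point that ships with mergekit
subprocess.run(["mergekit-moe", "moe.yml", "./merged-moe", "--cuda"], check=True)
```

The resulting folder is what would then get handed off for EXL2 conversion at whatever bpw fits the target card.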
Ahhh, I see!!
I'll keep that in mind then!