Broken tokenizer?
It sometimes spits out stray numbers and repeats text. Not very good.
Which format? What hardware are you using?
Q5_K_S, tried running it on llama.cpp and Kobold. RTX 3080, Intel CPU, though I'm not sure how that's relevant. How did you get it to quantize? Which scripts did you use? I can't quantize it with the default settings in llama.cpp.
Your GPU's probably too small; I'd suggest a smaller model.
I'm not running it on the GPU, I'm running it on CPU with cuBLAS processing. If I had memory problems I wouldn't be able to run it at all. Just tell me how you got it quantized.
Ah, I see you have 70B models. Just check the README.
Oh... It has a BPE vocab... That's why it didn't convert. I converted it on my own machine and the issue still seems to persist. Must have been inherited from MoMo: https://huggingface.co/moreh/MoMo-72B-lora-1.8.6-DPO/discussions/7
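For anyone else hitting the same conversion failure: a quick way to see which vocab type a checkpoint ships is to look at the tokenizer files in the model folder. This is only a rough sketch, assuming the usual Hugging Face layout (a `tokenizer.model` file for SentencePiece, `vocab.json` plus `merges.txt` for BPE). `guess_vocab_type` is a made-up helper name, not something from llama.cpp, and the labels it returns are just for illustration.

```python
from pathlib import Path


def guess_vocab_type(model_dir):
    """Hypothetical helper: guess the tokenizer type from the files present.

    Assumption: SentencePiece checkpoints ship `tokenizer.model`, while
    BPE checkpoints ship `vocab.json` together with `merges.txt`.
    """
    d = Path(model_dir)
    if (d / "tokenizer.model").is_file():
        return "spm"
    if (d / "vocab.json").is_file() and (d / "merges.txt").is_file():
        return "bpe"
    # Neither layout matched; the repo may use some other tokenizer format.
    return "unknown"
```

Calling `guess_vocab_type("path/to/model")` on the downloaded folder should tell you whether the default conversion settings are likely to apply. It won't say anything about whether the tokenizer itself is broken, only which kind it is.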