Broken tokenizer?

by ChuckMcSneed - opened Feb 7

Discussion

ChuckMcSneed

Feb 7

It spits out numbers and repeats text sometimes. Not very good.

senseable

Owner Feb 7

Which format, and what hardware are you using?

ChuckMcSneed

Feb 7

Q5_K_S. I tried running it on llama.cpp and kobold. RTX 3080, Intel, not sure how that's relevant. How did you get it to quantize? Which scripts did you use? I can't quantize it with the default settings in llama.cpp.

senseable

Owner Feb 7

Your GPU's probably too small, I'd suggest a smaller model.

ChuckMcSneed

Feb 7

I'm not running it on the GPU, I'm running it on the CPU with cuBLAS processing. If I had memory problems I wouldn't be able to run it at all. Just tell me how you got it quantized.

senseable

Owner Feb 7

Ah, I see you have 70B models. Just check the README.

ChuckMcSneed

Feb 8

Oh... It has a BPE vocab... That's why it didn't convert with the default settings. I converted it on my own machine and the issue seems to persist, so it must have been inherited from MoMo: https://huggingface.co/moreh/MoMo-72B-lora-1.8.6-DPO/discussions/7
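
For anyone hitting the same conversion failure, here is a rough sketch of how one might guess which vocab family a downloaded model folder uses before picking converter settings. The function name `guess_vocab_type` and the file-name heuristics are my own assumptions, not anything taken from llama.cpp or this repo, and some models ship tokenizer files this does not recognize, so treat "unknown" as "go read the README".

```python
import json
from pathlib import Path


def guess_vocab_type(model_dir):
    """Best-effort guess of the tokenizer family from files in a model folder.

    Hypothetical helper: returns "spm", "bpe", or "unknown".
    """
    d = Path(model_dir)

    # A SentencePiece model is conventionally shipped as tokenizer.model.
    # Checked first because such repos often also carry a tokenizer.json.
    if (d / "tokenizer.model").is_file():
        return "spm"

    # GPT-2 style BPE tokenizers usually ship vocab.json plus merges.txt.
    if (d / "vocab.json").is_file() and (d / "merges.txt").is_file():
        return "bpe"

    # A tokenizer.json written by the Hugging Face tokenizers library
    # records the model family under model.type (for example "BPE").
    tok_json = d / "tokenizer.json"
    if tok_json.is_file():
        with tok_json.open(encoding="utf-8") as f:
            model = json.load(f).get("model") or {}
        if model.get("type") == "BPE":
            return "bpe"

    return "unknown"
```

Calling `guess_vocab_type("path/to/model")` on the local folder gives a hint only; it does not say anything about whether the resulting quant will behave, since the repetition issue here persisted even after a successful conversion.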

ChuckMcSneed changed discussion status to closed Feb 8
