Q6_K Output is just exclamation marks, e.g., !!!!!!!!!!!!!!!!!!!!

#5
by paolovic - opened

Hi,

Although I am sticking to the prompt template, my output contains only exclamation marks.
I am using Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K.gguf. I merged the split files with llama.cpp to make them compatible with vLLM, like this:

./llama-gguf-split --merge /models/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF/Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K/Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K-00001-of-00002.gguf Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K.gguf

This is my input

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

you are helpful ai assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

how many r in strawberry?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The model's output is

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[... output continues with nothing but exclamation marks ...]

Any advice?

Thank you in advance
Best regards
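
As a side note, the hand-written prompt above can be cross-checked against the tokenizer's own chat template. A minimal sketch, assuming the original nvidia/Llama-3.1-Nemotron-70B-Instruct-HF tokenizer is reachable on the Hub:

# Sketch: rebuild the prompt via the HF chat template and compare it to the
# hand-written string above (assumes network access to the original repo).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")
messages = [
    {"role": "system", "content": "you are helpful ai assistant"},
    {"role": "user", "content": "how many r in strawberry?"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # should match the template shown above token-for-token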

Hmm, everything looks reasonable... feels like a RoPE setting is off or the prompt isn't going through properly. Is there a way to show the full input/output of the model, to make sure you're not accidentally giving the system prompt twice or something silly?

Hi @bartowski ,
sorry for keeping you waiting; yes, I'll provide it asap.
Best regards

Hi @bartowski ,

Here is the logged output:

INFO 2024-10-24 11:38:20,276 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:274 - prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

give me something else than exclamation marks<|eot_id|><|start_header_id|>assistant<|end_header_id|>


INFO 2024-10-24 11:38:20,631 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:20,785 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,068 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,124 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,180 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,236 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,292 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,347 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,403 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
INFO 2024-10-24 11:38:21,459 LLaMA 3.1 70B vllmAPI ipru2wih 123456 llama70b/chat/completions/ vllm.py:193 - !
...
<4000 lines of "!">

Fascinating... is there any debug info on the vLLM side? Seems likely it's a vLLM issue, unfortunately. Any chance you can try without the prompt formatting, in case vLLM is applying it on its own?

Maybe, pure speculation, but giving it the entire pre-formatted prompt like that is translating to:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

give me something else than exclamation marks<|eot_id|><|start_header_id|>assistant<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>

which is making it act up

I'm downloading it locally to double-check in llama.cpp, but I assume it's working there since no one else has reported any issues.
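
One way to rule out double templating is to compare vLLM's plain completions endpoint (which applies no chat template) against its chat endpoint (which applies the template server-side). A hedged sketch; the base URL and served model name below are assumptions about this particular deployment:

# Sketch: send the already-formatted prompt raw, and plain messages via chat,
# then compare the two responses. BASE and MODEL are placeholders.
import requests

BASE = "http://localhost:8000/v1"
MODEL = "Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K"  # hypothetical served model name

raw_prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful AI assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "give me something else than exclamation marks<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# Already-formatted prompt, no template added by the server
r1 = requests.post(f"{BASE}/completions",
                   json={"model": MODEL, "prompt": raw_prompt, "max_tokens": 64})

# Plain messages, server applies the chat template itself
r2 = requests.post(f"{BASE}/chat/completions",
                   json={"model": MODEL, "max_tokens": 64, "messages": [
                       {"role": "system", "content": "You are a helpful AI assistant."},
                       {"role": "user", "content": "give me something else than exclamation marks"},
                   ]})

print(r1.json()["choices"][0]["text"])
print(r2.json()["choices"][0]["message"]["content"])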

I could try, but FYI: Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_S.gguf works.
Same chat template, same everything; nothing changed but the path to the model.

Oh, that's very interesting then... almost like the merge borked it. I tried a smaller model; I'll try Q6_K specifically in case that one alone is messed up, then I'll merge it and try again to see if it's vLLM.

Finally tried it out with the merge, and it still produces perfectly coherent output, so I guess it's not Q6_K and not the merge process :(

Actually, can you give the sha256sum of your merge? Mine is b9ca98d65c7ae0717bbe9e93b9408b0b4d64a856046d5c828fa6237e3d12cd6e

Hi @bartowski ,
thanks for your efforts!
Mine is a4439e189c6c4107792c8dec0713c8c7f90df00fe227dba6ce4b9dfdb0d9b035
It seems that, although the merge went smoothly, my file is corrupted.
I wonder how this could happen.
But anyway, thank you very much!

Wanna try merging it again? Are you able to try with llama.cpp?

I don't want to, but I'll do it anyway :D
The download takes me some hours; I'll keep you updated tomorrow.
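
For reference, a quick way to sanity-check the merged file outside vLLM is to load it with llama-cpp-python. A minimal sketch, assuming that package is installed and the merged GGUF sits in the working directory:

# Sketch: load the merged GGUF with llama-cpp-python and run one chat turn.
# Model path and context size are assumptions; adjust to your setup.
from llama_cpp import Llama

llm = Llama(model_path="Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "you are helpful ai assistant"},
        {"role": "user", "content": "how many r in strawberry?"},
    ],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])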

Maybe check the SHAs of the pre-merge files as well?

7e6bb209e12eadcc680e2339aebd0ad7784b92508fdf5a66bda6d15aa3a2be2e *Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K-00001-of-00002.gguf

Okay, that part looks proper... part 2?

1e42c3eef10e0b51579774abb91f45f20255fbccbfe55494d466b691e59ba402 *Llama-3.1-Nemotron-70B-Instruct-HF-Q6_K-00002-of-00002.gguf

Also proper :') Make sure you don't delete them after merging, in case it somehow goes wrong again.
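
For completeness, a small Python equivalent of sha256sum for checking the split parts and the merged file without shelling out (a sketch; pass whatever paths your local copies use):

# Sketch: chunked SHA-256 so multi-GB GGUF files aren't read into memory at once.
import hashlib
import sys

def sha256sum(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(sha256sum(path), path)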

Thank you @bartowski !

Problem solved: my merged file was corrupted.

paolovic changed discussion status to closed
