Using llama.cpp server, responses always end with <|im_end|>
#2, opened by gilankpam
Hi team,
I run the model with the llama.cpp server using this command:
./server -m models/codeqwen-1_5-7b-chat-q8_0.gguf -c 65536 --host "0.0.0.0" --port "8080" --n-gpu-layers 256
I always get <|im_end|> at the end of the response. Here is a sample output:
User: Hi
Llama: Hi! How can I help you today?<|im_end|>
User: who are you?
Llama: My name is Llama, I am a large language model created by Alibaba Cloud.<|im_end|>
Am I missing something?
No, this is not the right way to use the model. You need to use the ChatML prompt format, and preferably our system prompt as well. Check this command:
./main -m qwen1_5-7b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
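For context, ChatML wraps every turn in <|im_start|> / <|im_end|> markers. With -cml, the prompt that llama.cpp builds looks roughly like the sketch below (the system message here is only an illustration, not the exact contents of prompts/chat-with-qwen.txt):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi<|im_end|>
<|im_start|>assistant

The model ends its turn by emitting <|im_end|>; when the prompt follows this layout and the client treats <|im_end|> as a stop sequence, the marker is cut off instead of being printed in the reply.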
For reference, here is a simple doc: https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html
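If you want to keep using ./server, one option is to query the OpenAI-compatible chat endpoint so the server applies the chat template and stops at <|im_end|> itself. This is just a sketch, assuming a reasonably recent llama.cpp build that exposes /v1/chat/completions, with the host and port from the command above:

curl http://0.0.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hi"}
    ]
  }'

Alternatively, if you send a raw ChatML prompt to the plain /completion endpoint yourself, you can pass "stop": ["<|im_end|>"] in the request body so the marker is not included in the generated text.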