Using llama.cpp server, responses always end with <|im_end|>
#2, opened by gilankpam
Hi team,
I run the model with the llama.cpp server using this command:
./server -m models/codeqwen-1_5-7b-chat-q8_0.gguf -c 65536 --host "0.0.0.0" --port "8080" --n-gpu-layers 256
I always get <|im_end|> at the end of the response. Here is a sample output:
User: Hi
Llama: Hi! How can I help you today?<|im_end|>
User: who are you?
Llama: My name is Llama, I am a large language model created by Alibaba Cloud.<|im_end|>
Am I missing something?
No, this is not the right way to use the model. You need to use the ChatML prompt format, and preferably our system prompt as well. Check this command:
./main -m qwen1_5-7b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
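For context, ChatML wraps every turn in <|im_start|> / <|im_end|> markers. With -cml, the prompt that llama.cpp builds looks roughly like the sketch below (the system message here is only an illustration, not the exact contents of prompts/chat-with-qwen.txt):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi<|im_end|>
<|im_start|>assistant

The model ends its turn by emitting <|im_end|>; when the prompt follows this layout and the client treats <|im_end|> as a stop sequence, the marker is cut off instead of being printed in the reply.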
For reference, here is a simple doc: https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html
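If you want to keep using ./server, one option is to query the OpenAI-compatible chat endpoint so the server applies the chat template and stops at <|im_end|> itself. This is just a sketch, assuming a reasonably recent llama.cpp build that exposes /v1/chat/completions, with the host and port from the command above:

curl http://0.0.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hi"}
    ]
  }'

Alternatively, if you send a raw ChatML prompt to the plain /completion endpoint yourself, you can pass "stop": ["<|im_end|>"] in the request body so the marker is not included in the generated text.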