Something is wrong with chatml tokens
I'm running the q8 gguf model, and it quite often inserts chatml-like tokens in the response, e.g.:
<|im_start|>system
You speak like a bro.<|im_end|>
<|im_start|>user
How to eat healthy?<|im_end|>
<|im_start|>assistant
Bro, eating healthy ain't that complicated. Just stick to these basics:
1. Eat lots of fruits and veggies - These are full of vitamins and fiber that keep you feeling full and give your body the nutrients it needs.
2. Choose whole grains over refined ones - Whole grain bread, rice, and pasta have more nutrition than their processed counterparts. Plus, they help keep blood sugar levels steady.
3. Go lean with protein - Opt for skinless poultry, fish, beans, or legumes instead of fatty cuts of meat. They're better for your heart and overall health.
4. Cut back on sugary drinks and sweets - These are major sources of empty calories that don't do much to keep you healthy or feeling full. Stick to water, seltzer, or unsweetened tea as much as possible.
5. Cook at home more often - This gives you control over what goes into your meals, making it easier to eat healthily. Plus, cooking can be a fun and relaxing activity!
6. Don't forget about portions - Even healthy foods can lead to weight gain if you eat too much of them. Stick to reasonable portion sizes.
7. Stay active - Exercise is key for overall health, so make sure to move your body regularly. It doesn't have to be intense; just get up and move around during the day.
8. Listen to your body - Pay attention to how different foods make you feel. If something doesn't sit well, it might not be the best choice for your diet.
Remember, it's<|im_start|>user_preference|bro|>
<|im_end|>
<|im_start|>user
Here it inserted "Remember, it's<|im_start|>user_preference|bro|>". I have never seen this from the original dolphin model.
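As a stopgap while the cause is unknown, the leaked tokens can be detected and the reply cut off at the first one. This is only a minimal sketch of a client-side workaround, not a fix for the model; the function name truncate_at_chatml is made up for illustration, and the token pattern is taken from the sample output above.

```python
import re
from typing import Tuple

# ChatML control tokens as they appear in the sample output above.
CHATML_TOKEN = re.compile(r"<\|im_(?:start|end)\|>")


def truncate_at_chatml(text: str) -> Tuple[str, bool]:
    """Cut text at the first ChatML token; return (clean_text, leaked)."""
    match = CHATML_TOKEN.search(text)
    if match is None:
        # No control token found, return the reply unchanged.
        return text, False
    # Keep only what came before the leaked token.
    return text[:match.start()].rstrip(), True


reply = "Remember, it's<|im_start|>user_preference|bro|>\n<|im_end|>"
clean, leaked = truncate_at_chatml(reply)
print(leaked, clean)
```

Many runtimes also accept stop strings, which would end generation at the token instead of trimming afterwards; whether that applies depends on the runtime in use.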
Weird. These models are in experimental stages. I will try to see if I can figure out what's wrong. This was my first series of MoE models, and the following series have not shown this behavior at all. It could be an issue of experts that are not utilizing the ChatML format, but the base model surely is.
I have not tested the quantized versions yet, but the problem is likely present in the base model. I'll ping you when the new model is pushed.
@kurnevsky are you able to try the model without applying a chatml template? I think that is the issue. With no chat template it does not return those tokens.
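For anyone who wants to run the same comparison, here is a rough sketch of the two prompt styles. The ChatML layout mirrors the sample in the first post; the plain layout and both function names are assumptions for illustration only, not something the model card prescribes.

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML prompt in the same layout as the sample in the first post."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def plain_prompt(system: str, user: str) -> str:
    """Build a prompt with no chat template (assumed layout: plain text only)."""
    return f"{system}\n\n{user}\n"


system = "You speak like a bro."
user = "How to eat healthy?"
print(chatml_prompt(system, user))
print(plain_prompt(system, user))
```

Sending each string to the model as a raw completion prompt and checking the replies for <|im_start|> should show whether the template is what triggers the leak.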
I can do that, but if both base models stated in the readme were pretrained with chatml, why doesn't this merge work well with it? Or were the models different? :)
I honestly don't have a very good answer for you at this point. This version was Open-hermes and dolphin-mistral-7b-dpo-laser, but is now the config that I have provided for you.
I will be updating and providing additional quants for the new model.
I fixed the chatml tokens @kurnevsky
@macadeliccc I'm having the same issue: <|im_start|> is inserted in the wrong places during text generation. I saw your last message, "I fixed the chatml tokens @kurnevsky", and downloaded the model again, but the problem is still there. Can you tell me how I can fix it? The model is really awesome, but I can't use it with that issue :(
BEST REGARDS!
I think I know what it is: I'm using the AWQ version from TheBloke, which may be a version from before your fix.
Yeah that is definitely the issue. I can do an AWQ quant for you this afternoon of the new version though.
Thanks for your message!
@macadeliccc I would appreciate it a lot. It has been the best model I have tried for RAG in Spanish; it writes and interprets it very well! :)
@prudant here is the link for the new AWQ quants! https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ
thanks!!!
@macadeliccc works like a charm, this is a great model. I'm working on a mini-project for a local function-calling server (OpenAI protocol) that does not rely on models pre-trained for function calling.
Basically I'm using vLLM + lm-format-enforcer + FastAPI to build the service, and until now only Mixtral had worked well when it has to decide to use more than one function for the same user prompt.
So far only your model and the Mixtral family have worked well in this project, congrats!
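To give an idea of the "more than one function for the same prompt" case, here is a simplified sketch of just the dispatch step. It does not use vLLM, lm-format-enforcer, or FastAPI, and it is not the project's actual code: the tool functions, the TOOLS registry, and the JSON shape (an array of objects with "name" and "arguments") are all assumptions made for illustration, and the shape is not the exact OpenAI wire format.

```python
import json
from typing import Any, Callable, Dict, List


# Hypothetical tools; a real service would register its own functions.
def get_weather(city: str) -> str:
    return f"weather for {city}"


def get_time(city: str) -> str:
    return f"time in {city}"


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "get_time": get_time,
}


def run_tool_calls(model_output: str) -> List[Any]:
    """Parse a JSON array of calls and run each one against the registry."""
    calls = json.loads(model_output)
    if not isinstance(calls, list):
        raise ValueError("expected a JSON array of calls")
    results = []
    for call in calls:
        name = call["name"]
        if name not in TOOLS:
            raise ValueError(f"unknown function: {name}")
        # Arguments are passed as keyword arguments to the matching tool.
        results.append(TOOLS[name](**call.get("arguments", {})))
    return results


# Assumed model output: two calls produced for a single user prompt.
output = (
    '[{"name": "get_weather", "arguments": {"city": "Madrid"}},'
    ' {"name": "get_time", "arguments": {"city": "Madrid"}}]'
)
print(run_tool_calls(output))
```

Constraining the model's output to a JSON schema is what keeps the parse step from failing on free-form text; the sketch assumes that constraint is already enforced upstream.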