ChatML template does not work properly
The model hallucinates a lot. I am using the GGUF model in LM Studio and selected the ChatML prompt template, based on the model card.
@WasamiKirua I'm not sure about the quality of the quantized versions; I'd recommend loading the model weights in bfloat16.
Here's a Colab notebook with a chat interface; you can use it to interact with the chat model.
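Why might bfloat16 weights behave better than a heavily quantized GGUF file? The toy below compares the rounding error of bfloat16 against a simple symmetric absmax integer quantizer at several bit widths. It is a sketch under stated assumptions: the weights are random stand-ins drawn from N(0, 0.02), and `absmax_quantize` is a hypothetical, deliberately simple scheme, not the k-quant or imatrix formats GGUF actually uses. It only illustrates the general trend that fewer bits mean larger weight error. (For the real model, loading in bfloat16 with transformers is typically done via `AutoModelForCausalLM.from_pretrained(..., torch_dtype=torch.bfloat16)`.)

```python
import random
import struct


def to_bfloat16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-half-to-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of a float32; round the dropped half to nearest even
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def absmax_quantize(values, bits):
    """Hypothetical toy scheme: symmetric absmax quantization to signed ints, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return list(values)
    return [round(v / scale) * scale for v in values]


def mean_abs_error(original, approx):
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


random.seed(0)
# Assumed stand-in for one block of weights: 256 values drawn from N(0, 0.02)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]

errors = {"bf16": mean_abs_error(weights, [to_bfloat16(w) for w in weights])}
for bits in (8, 6, 4, 2):
    errors[f"int{bits}"] = mean_abs_error(weights, absmax_quantize(weights, bits))

for name, err in errors.items():
    print(f"{name:>5}: mean abs error = {err:.2e}")
```

Real quantizers use per-block scales, importance matrices, and mixed precision, so actual errors differ; the ordering, not the magnitudes, is the point here.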
You are using ChatML; use the Phi-2 template for chatting. The author mentioned this for the original model: https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base/discussions/5#66cbf507ed5c5babdef42cd1
If you use the wrong chat template with GGUF, the model hallucinates.
Yeah, you have to use the chat template supported by the model's tokenizer. In this case, it's ChatML, the same template as in the model card:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How to explain Internet for a medieval knight?<|im_end|>
<|im_start|>assistant
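The ChatML layout above can be produced programmatically. This is a minimal sketch assuming the common ChatML convention of a newline after each `<|im_end|>` and a trailing open assistant turn; `format_chatml` is a hypothetical helper. The template stored with the model's tokenizer remains authoritative: in transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders it directly.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    parts = []
    for message in messages:
        # Each turn: role on the start line, content, then the end marker
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
]
prompt = format_chatml(messages)
print(prompt)
```

When running a raw prompt like this, generation is usually expected to stop at `<|im_end|>`, so that string is typically configured as a stop sequence in the runtime.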
Also check which GGUF quant you are using; lower quants tend to hallucinate a lot. Also set temperature, top-p, and top-k to appropriate values.
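To make the three sampling knobs concrete, here is a sketch of how temperature, top-k, and top-p (nucleus) filtering typically combine when picking the next token. It assumes one common ordering (temperature, then top-k, then top-p over the renormalized survivors); runtimes such as llama.cpp and transformers differ in ordering and details, and the thread does not state which values are "right", so none are prescribed here. `sample_next_token` and the example logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Pick a token index from raw logits: temperature, then top-k, then top-p filtering."""
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    # Softmax with max subtraction for numerical stability
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices from most to least likely
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely tokens
    mass = sum(probs[i] for i in order)
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass  # renormalized over the top-k survivors
            if cumulative >= top_p:
                break  # smallest prefix whose mass reaches top_p
        order = kept
        mass = sum(probs[i] for i in order)

    # Sample from the surviving tokens in proportion to their probability
    r = rng.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng))
```

Lower temperature sharpens the distribution, while top-k and top-p cut off the unlikely tail; all three reduce the chance of sampling a low-probability (often off-topic) token.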
The SFT dataset only contained English instructions, can you try English prompts? This looks like Italian.
With English text it's fine. Minitron is multilingual; I don't know what percentage of the training data is Italian, but in my experience Mistral NeMo generates Italian text pretty well. Perhaps, since you fine-tuned it on OpenHermes, it's a case of catastrophic forgetting? But I'm pretty sure OpenHermes also contains Italian text. Honestly, no idea. Anyway, did you use Axolotl to fine-tune? If so, could I see your configuration YAML?
The chat model is still able to generate Italian text. Catastrophic forgetting mainly occurs with full-parameter fine-tuning. Since I used LoRA, there's little chance of that happening.
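The LoRA argument rests on how few weights LoRA actually trains. In the standard LoRA formulation, the base weight W stays frozen and only a low-rank update (alpha / r) * B @ A is learned, with B initialized to zeros. The sketch below counts trainable parameters for one square layer and shows that zero-initialized update. The layer size, rank, and alpha are hypothetical; the thread does not state the rank, alpha, or target modules used for this model, and `lora_trainable_params` and `lora_delta` are illustrative helpers.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable params LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


def lora_delta(B, A, alpha, rank):
    """Compute the update (alpha / r) * B @ A that LoRA adds to the frozen weight W."""
    scale = alpha / rank
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Hypothetical layer size and rank, chosen only for illustration
d_out = d_in = 4096
rank = 16
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full fine-tuning: {full:,} params; LoRA r={rank}: {lora:,} params ({lora / full:.2%})")

# B starts at zero, so before training the adapted layer equals the base layer
A = [[0.01 * (i + j) for j in range(3)] for i in range(2)]  # r=2, d_in=3
B = [[0.0, 0.0] for _ in range(4)]                          # d_out=4, r=2
print(lora_delta(B, A, alpha=16, rank=2))
```

Because W itself is never modified and the update is low-rank, the fine-tuned model stays close to the base model, which is the intuition behind the reduced risk of forgetting.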
Here's an example:
Instruction (in Italian): "Quali sono le destinazioni turistiche più apprezzate in Italia?" ("What are the most popular tourist destinations in Italy?")
Response (generated in Italian, translated here): "The most popular tourist destinations in Italy depend on a variety of factors, including history, culture, nature, and gastronomy. Some of the most popular tourist destinations in Italy include: Rome, with its rich history and ancient monuments; Venice, famous for its gondolas and canals; Florence, with its Renaissance artistic heritage; Milan, with its modern architecture and fashion; Naples, with its history and its cuisine; and the Amalfi Coast, with its spectacular panoramic views. There are also many other popular tourist destinations in Italy, such as Pompeii, Sardinia, Sicily, Piedmont, Tuscany, and Liguria. In every region of Italy, there are many unique and interesting tourist destinations to discover."
Same for me, sadly. Using imatrix quants (Q6), the model always responds off the mark; it's very dumb, especially in E/RP...
Hi @Varkoyote, can you try the same prompts with the unquantized bfloat16 version?
Here's a Colab notebook with a chat interface.
https://huggingface.co/rasyosef/Mistral-NeMo-Minitron-8B-Chat/blob/main/Mistral_NeMo_Minitron_8B_chatbot.ipynb