---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
---

GGUF quants for https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1

> Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 7B Gemma is the third model in the series, and is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). You can reproduce the training of this model via the recipe provided in the [Alignment Handbook](https://github.com/huggingface/alignment-handbook).

There are a few things to consider when using this model:

* Special tokens `<|im_start|>` and `<|im_end|>` are not properly mapped as overrides of `` and `` (issue in the GGUF)
* Repeat penalty must be `1.0` (i.e. disabled), just like with the base model
* The model was not trained with system instructions (i.e. don't add the `system` part of the ChatML template)
* Generation must stop on the special token `` instead of ``, otherwise the model goes on forever

Here's a setup that seems to work quite well for chatting with the model. The Q4_K quant is very fast and gives ~90 t/s on a 3090 when fully offloaded:

```
./main -ins -r "" --color -e --in-prefix "user\n" --in-suffix "\nassistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
```

| Layers | Context | [Template](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/blob/19186e70e5679c47aaef473ae2fd56e20765088d/tokenizer_config.json#L59) |
| --- | --- | --- |
| | | \<\|im_start\|\>user<br>{prompt}\<\|im_end\|\><br>\<\|im_start\|\>assistant<br>{response} |
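
For anyone building prompts outside of `./main`, the template above can be rendered with a small helper. This is only a sketch: `build_prompt` and the `(role, text)` turn format are hypothetical names chosen for illustration, not part of llama.cpp or the original model repository. It rejects `system` turns because the model was not trained with them. Whether the literal `<|im_start|>` and `<|im_end|>` strings are tokenized as special tokens depends on the GGUF conversion, so treat the output as the intended text layout rather than a guarantee of correct tokenization.

```python
# Hypothetical helper: renders chat turns into the template shown in the table.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_prompt(turns):
    """Render (role, text) pairs and leave a final assistant turn open."""
    parts = []
    for role, text in turns:
        if role not in ("user", "assistant"):
            # The model was not trained with system instructions, so refuse them.
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{IM_START}{role}\n{text}{IM_END}\n")
    # End with an open assistant header so the model writes the response.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("user", "Why is the sky blue?")]))
```

Earlier turns of a conversation can be passed as additional `("user", ...)` and `("assistant", ...)` pairs; the helper always appends the open assistant header last.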