dranger003 committed
Commit 18b44ec · verified · 1 Parent(s): 076a183

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -7,6 +7,17 @@ GGUF quants for https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1
 
  > Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 7B Gemma is the third model in the series, and is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). You can reproduce the training of this model via the recipe provided in the [Alignment Handbook](https://github.com/huggingface/alignment-handbook).
 
+ There are a few things to consider when using this model:
+ * The special tokens `<|im_start|>` and `<|im_end|>` are not properly mapped as overrides of `<start_of_turn>` and `<end_of_turn>` (an issue in the GGUF)
+ * Repeat penalty must be `1.0` (i.e. disabled), just like with the base model
+ * The model was not trained with system instructions (i.e. don't add the `system` part of the ChatML template)
+ * Generation must stop on the special token `<end_of_turn>` instead of `<eos>`; otherwise the model goes on forever
+
+ Here's a setup that seems to work quite well for chatting with the model. The Q4_K quant is very fast, giving ~90 t/s on a 3090 when fully offloaded:
+ ```
+ ./main -ins -r "<end_of_turn>" --color -s 0 -t 1 -tb 1 -e --in-prefix "<start_of_turn>user\n" --in-suffix "<end_of_turn>\n<start_of_turn>assistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
+ ```
+
  | Layers | Context | [Template](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/blob/19186e70e5679c47aaef473ae2fd56e20765088d/tokenizer_config.json#L59) |
  | --- | --- | --- |
  | <pre>28</pre> | <pre>8192</pre> | <pre>\<\|im_start\|\>user<br>{prompt}\<\|im_end\|\><br>\<\|im_start\|\>assistant<br>{response}</pre> |
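
For anyone driving the model from a script rather than through `./main`, the notes added in this commit can be sketched in Python. This is only an illustration and not part of the README change: `build_prompt` and `truncate_at_stop` are hypothetical helper names. The turn layout is inferred from the `--in-prefix` and `--in-suffix` values in the command, and the way a finished assistant turn is closed (`<end_of_turn>` followed by a newline) is an assumption.

```python
# Token strings taken from the ./main flags in the README change.
START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"


def build_prompt(turns):
    """Format a conversation roughly the way the --in-prefix/--in-suffix flags do.

    `turns` is a list of (user_text, assistant_text) pairs. Pass None as the
    assistant_text of the last pair to leave that turn open for generation.
    No system block is emitted, because the model was not trained with one.
    """
    parts = []
    for user_text, assistant_text in turns:
        # --in-prefix + user input + first half of --in-suffix
        parts.append(f"{START_OF_TURN}user\n{user_text}{END_OF_TURN}\n")
        # second half of --in-suffix
        parts.append(f"{START_OF_TURN}assistant\n")
        if assistant_text is not None:
            # Assumption: a completed reply is closed like a user turn.
            parts.append(f"{assistant_text}{END_OF_TURN}\n")
    return "".join(parts)


def truncate_at_stop(generated, stop=END_OF_TURN):
    """Cut raw model output at the first stop token, if one is present."""
    index = generated.find(stop)
    return generated if index == -1 else generated[:index]


if __name__ == "__main__":
    print(build_prompt([("Hello!", None)]))
    print(truncate_at_stop("Hi there!<end_of_turn>leftover text"))
```

Whatever runtime consumes the prompt would still need the sampling settings from the notes, in particular a repeat penalty of `1.0` and a stop condition on `<end_of_turn>`.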