dranger003
committed on
Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ There are few things to consider when using this model:
 
 Here's a setup that seems to work quite well to chat with the model. The Q4_K is very fast and gives ~90 t/s on a 3090 full offloaded:
 
 ```
-./main -ins -r "<end_of_turn>" --color -
+./main -ins -r "<end_of_turn>" --color -e --in-prefix "<start_of_turn>user\n" --in-suffix "<end_of_turn>\n<start_of_turn>assistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
 ```
 
 | Layers | Context | [Template](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/blob/19186e70e5679c47aaef473ae2fd56e20765088d/tokenizer_config.json#L59) |
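
For quick reference, the `--in-prefix`/`--in-suffix` values in the command above assemble each chat turn roughly as shown below. This is a sketch derived from those flags only; the authoritative chat template is the tokenizer_config.json linked in the table header.

```
<start_of_turn>user
{your message}<end_of_turn>
<start_of_turn>assistant
{model reply}<end_of_turn>
```

Here `{your message}` and `{model reply}` are placeholders. The reverse prompt `-r "<end_of_turn>"` stops generation at the end of the model's turn, and `-e` is presumably passed so the `\n` escape sequences in the prefix and suffix expand to real newlines.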