dranger003 committed
Commit 2ce05e8 · verified · 1 Parent(s): 18b44ec

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED

@@ -15,7 +15,7 @@ There are few things to consider when using this model:
 
 Here's a setup that seems to work quite well to chat with the model. The Q4_K is very fast and gives ~90 t/s on a 3090 full offloaded:
 ```
-./main -ins -r "<end_of_turn>" --color -s 0 -t 1 -tb 1 -e --in-prefix "<start_of_turn>user\n" --in-suffix "<end_of_turn>\n<start_of_turn>assistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
+./main -ins -r "<end_of_turn>" --color -e --in-prefix "<start_of_turn>user\n" --in-suffix "<end_of_turn>\n<start_of_turn>assistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
 
 | Layers | Context | [Template](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/blob/19186e70e5679c47aaef473ae2fd56e20765088d/tokenizer_config.json#L59) |
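For reference, the updated invocation can be sketched as a small script with the template strings pulled out into variables. The flag annotations are assumptions based on llama.cpp's `main` example (the removed `-s 0 -t 1 -tb 1` flags set the seed, generation threads, and batch threads); exact behavior depends on your llama.cpp build, and the command is printed rather than executed here since it needs a local `./main` binary and the GGUF model file.

```shell
#!/bin/sh
# Sketch of the updated command from the diff. Assumed flag meanings
# (per llama.cpp's main example):
#   -ins       interactive instruct mode
#   -r STOP    stop generation when STOP is emitted
#   -e         process escape sequences (\n) in the prefix/suffix strings
#   -c 0       take the context size from the model
#   -ngl 29    offload 29 layers to the GPU
STOP="<end_of_turn>"
PREFIX="<start_of_turn>user\n"
SUFFIX="<end_of_turn>\n<start_of_turn>assistant\n"
MODEL="ggml-zephyr-7b-gemma-v0.1-q4_k.gguf"

# Print the full command instead of running it:
echo ./main -ins -r "$STOP" --color -e \
  --in-prefix "$PREFIX" --in-suffix "$SUFFIX" \
  -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m "$MODEL"
```

Keeping the stop token and chat-template markers in variables makes it easy to adapt the same script to other Gemma-template models by changing only the strings at the top.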