dranger003
committed on
Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ There are few things to consider when using this model:
 
 Here's a setup that seems to work quite well to chat with the model. The Q4_K is very fast and gives ~90 t/s on a 3090 full offloaded:
 
 ```
-./main -ins -r "<end_of_turn>" --color -
+./main -ins -r "<end_of_turn>" --color -e --in-prefix "<start_of_turn>user\n" --in-suffix "<end_of_turn>\n<start_of_turn>assistant\n" -c 0 --temp 0.7 --repeat-penalty 1.0 -ngl 29 -m ggml-zephyr-7b-gemma-v0.1-q4_k.gguf
 ```
 
 | Layers | Context | [Template](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1/blob/19186e70e5679c47aaef473ae2fd56e20765088d/tokenizer_config.json#L59) |
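
For quick reference, the `--in-prefix`/`--in-suffix` values in the command above assemble each chat turn roughly as shown below. This is a sketch derived from those flags only; the authoritative chat template is the tokenizer_config.json linked in the table header.

```
<start_of_turn>user
{your message}<end_of_turn>
<start_of_turn>assistant
{model reply}<end_of_turn>
```

Here `{your message}` and `{model reply}` are placeholders. The reverse prompt `-r "<end_of_turn>"` stops generation at the end of the model's turn, and `-e` is presumably passed so the `\n` escape sequences in the prefix and suffix expand to real newlines.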