Update README.md
We have tested (and thus recommend) running this model on vLLM. We suggest serving it with vLLM's OpenAI-compatible API server, using the following commands:

```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server --model lightblue/Karasu-Mixtral-8x22B-v0.1 --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --max-model-len 1024
```

which is how we ran it on a 4 x A100 (80GB) machine.
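
Once the server is up, it exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can query it. Below is a minimal sketch using the official `openai` Python package (v1+); the base URL, the placeholder API key, and the sampling parameters are assumptions rather than part of the instructions above, and prompt plus completion must stay within the `--max-model-len 1024` limit set when launching the server.

```python
from openai import OpenAI

# Point the client at the local vLLM server (default port 8000 is an assumption;
# the API key is a placeholder unless the server was started with --api-key).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="lightblue/Karasu-Mixtral-8x22B-v0.1",
    messages=[{"role": "user", "content": "Hello! Please introduce yourself."}],
    max_tokens=256,   # keep prompt + completion under the 1024-token context limit
    temperature=0.7,
)

print(response.choices[0].message.content)
```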