bofenghuang committed • Commit 7397478 • Parent(s): ca0c8d6

Update README.md

README.md CHANGED
@@ -93,8 +93,7 @@ def chat(
         top_k=top_k,
         repetition_penalty=repetition_penalty,
         max_new_tokens=max_new_tokens,
-
-        pad_token_id=tokenizer.pad_token_id,
+        pad_token_id=tokenizer.eos_token_id,
         **kwargs,
     ),
     streamer=streamer,
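The only change in this hunk is the padding token: `tokenizer.pad_token_id` becomes `tokenizer.eos_token_id`. Llama/Mistral-style tokenizers often ship without a pad token, so `tokenizer.pad_token_id` is `None` and `generate()` complains when it needs one; reusing the EOS token id is the usual workaround. A minimal sketch of the pattern (not part of the commit), assuming a standard Hugging Face `transformers` setup with this repo's model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bofenghuang/vigostral-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Parle-moi de toi-même.", return_tensors="pt")
# tokenizer.pad_token_id is None for many Llama/Mistral tokenizers, so
# pass the EOS token id explicitly to avoid the padding warning.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```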
@@ -144,6 +143,47 @@ You can also use the Google Colab Notebook provided below.

This hunk adds a new section to README.md between the Colab badge

<a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

and the "## Limitations" heading. The added content:
### Inference using the unquantized model with vLLM

Set up an OpenAI-compatible server with the following commands:

```bash
# Install vLLM (this may take 5-10 minutes)
pip install vllm

# Start the server for the Vigostral-Chat model
python -m vllm.entrypoints.openai.api_server --model bofenghuang/vigostral-7b-chat

# List the served models to check the server is up
curl http://localhost:8000/v1/models
```
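Since the server speaks the OpenAI HTTP protocol, you can also hit the REST endpoint directly, without the `openai` client. A minimal sketch (not part of the commit), assuming the server above is running on localhost:8000:

```python
import requests

# vLLM exposes OpenAI-compatible routes; /v1/chat/completions accepts the
# same JSON payload as OpenAI's chat API.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "bofenghuang/vigostral-7b-chat",
        "messages": [{"role": "user", "content": "Parle-moi de toi-même."}],
        "max_tokens": 256,
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```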
Query the model using the `openai` Python package. (The snippet below uses the pre-1.0 `openai` client API; `openai.ChatCompletion` was removed in `openai>=1.0`.)

```python
import openai

# Point the client at vLLM's API server instead of api.openai.com.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

# Use the first model served
models = openai.Model.list()
model = models["data"][0]["id"]

# Chat completion API
chat_completion = openai.ChatCompletion.create(
    model=model,
    messages=[
        {"role": "user", "content": "Parle-moi de toi-même."},  # "Tell me about yourself."
    ],
    max_tokens=1024,
    temperature=0.7,
)
print("Chat completion results:", chat_completion)
```
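Not in the commit, but worth noting: the same endpoint supports token streaming. A small sketch, again assuming the pre-1.0 `openai` client and the local server started above (the prompt is an arbitrary illustration):

```python
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

# stream=True yields incremental chunks instead of one final message.
for chunk in openai.ChatCompletion.create(
    model="bofenghuang/vigostral-7b-chat",
    messages=[{"role": "user", "content": "Raconte-moi une histoire courte."}],
    max_tokens=256,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()
```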
(The "## Limitations" section below is unchanged context following the insertion.)

## Limitations

Vigogne is still under development, and many limitations remain to be addressed. Please note that the model may generate harmful or biased content, incorrect information, or generally unhelpful answers.