Update README.md
README.md CHANGED
@@ -35,7 +35,6 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
 
 ## Prompt template: custom
 
-```
 The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
 
 Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
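The context lines above say the template is built by concatenating tokens and that <|end_of_turn|> extends the base vocabulary. As a rough illustration of what "an added token" means in transformers terms (the checkpoint name below is a placeholder, not taken from this repo):

```python
# Minimal sketch: registering an end-of-turn token as an additional special
# token so it maps to a single id. "your-base-model" is a placeholder name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-base-model")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|end_of_turn|>"]})
print(tokenizer.convert_tokens_to_ids("<|end_of_turn|>"))  # id of the new token
```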
@@ -66,8 +65,6 @@ Hint: In BPE, tokenize(A) + tokenize(B) does not always equal tokenize(A + B
 
 Due to the custom tokenisation, GGMLs will not be provided.
 
-```
-
 ## Provided files
 
 Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
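The "Hint" quoted in the hunk header above is the reason the template cannot be written as plain text: with BPE, token-level concatenation and string-level concatenation can disagree. A quick way to check this yourself (again with a placeholder checkpoint name):

```python
# Sketch: compare tokenize(A) + tokenize(B) against tokenize(A + B).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-base-model")
a, b = "Assistant GPT4:", " Hello"
concat_ids = tokenizer.encode(a, add_special_tokens=False) + \
             tokenizer.encode(b, add_special_tokens=False)
joined_ids = tokenizer.encode(a + b, add_special_tokens=False)
print(concat_ids == joined_ids)  # often False for BPE vocabularies
```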
@@ -153,13 +150,7 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 """
 
 prompt = "Tell me about AI"
-prompt_template=f'''The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
-
-Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
 
-Here is an example of single-round conversation template:
-
-```python
 def tokenize_single_input(tokenizer, prompt):
     # OpenChat V2
     human_prefix = "User:"
@@ -175,39 +166,12 @@ def tokenize_single_input(tokenizer, prompt):
 
     return [_tokenize_special(bos_token)] + _tokenize(human_prefix) + _tokenize(prompt) + [_tokenize_special(eot_token)] + \
            _tokenize(prefix)
-```
-
-To explore conditional language models, you can also set prefix = "Assistant GPT3:" to mimic ChatGPT behavior (this may cause performance degradation).
-
-Hint: In BPE, tokenize(A) + tokenize(B) does not always equal tokenize(A + B).
-
-Due to the custom tokenisation, GGMLs will not be provided.
-
-'''
 
 print("\n\n*** Generate:")
 
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+input_ids = tokenize_single_input(tokenizer, prompt)
 output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
 print(tokenizer.decode(output[0]))
-
-# Inference can also be done using transformers' pipeline
-
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-
-print(pipe(prompt_template)[0]['generated_text'])
 ```
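Note that `tokenize_single_input` returns a plain Python list of token ids, while `model.generate` expects a batched tensor, so in practice a wrapping step like the following is needed. This is a sketch assuming torch is installed and the model lives on CUDA; it is not part of the commit:

```python
# Sketch: feed the token list from tokenize_single_input to model.generate.
import torch

tokens = tokenize_single_input(tokenizer, "Tell me about AI")
input_ids = torch.tensor([tokens]).cuda()  # add a batch dimension
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```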
 
 ## Compatibility