TheBloke committed on
Commit 0371551
1 Parent(s): b73ae73

Update README.md

Files changed (1)
  1. README.md +1 -37
README.md CHANGED
@@ -35,7 +35,6 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
 
 ## Prompt template: custom
 
-```
 The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
 
 Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
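
A quick way to confirm that the <|end_of_turn|> token described above is present in a given checkpoint is to look it up in the tokenizer's vocabulary. A minimal sketch, assuming a placeholder model path (not something this diff specifies):

```python
# Sketch: check that the end-of-turn token described above exists in the
# tokenizer's vocabulary. The model path below is a placeholder assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_v2_w")  # placeholder path

eot_id = tokenizer.convert_tokens_to_ids("<|end_of_turn|>")
print(eot_id)  # a real id (not the unk id) if the token was added to the vocabulary
```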
@@ -66,8 +65,6 @@ Hint: In BPE, tokenize(A) + tokenize(B) does not always equals to tokenize(A + B
 
 Due to the custom tokenisation, GGMLs will not be provided.
 
-```
-
 ## Provided files
 
 Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
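
The caveat quoted in the hunk header above, that tokenize(A) + tokenize(B) does not always equal tokenize(A + B), is the reason the template is assembled from token ids rather than from one concatenated string. A minimal sketch of the effect, using gpt2 purely as a stand-in BPE tokenizer:

```python
# Sketch: BPE merges can cross a string boundary, so tokenizing two pieces
# separately may give different ids than tokenizing their concatenation.
# "gpt2" is used here only as a convenient public BPE tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

a, b = "Assistant GPT", "4:"
print(tok.encode(a) + tok.encode(b))  # pieces tokenized separately
print(tok.encode(a + b))              # concatenation tokenized as one string
# The two id lists are not guaranteed to match, which is the caveat above.
```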
@@ -153,13 +150,7 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 """
 
 prompt = "Tell me about AI"
-prompt_template=f'''The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
-
-Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
 
-Here is an example of single-round conversation template:
-
-```python
 def tokenize_single_input(tokenizer, prompt):
     # OpenChat V2
     human_prefix = "User:"
@@ -175,39 +166,12 @@ def tokenize_single_input(tokenizer, prompt):
 
     return [_tokenize_special(bos_token)] + _tokenize(human_prefix) + _tokenize(prompt) + [_tokenize_special(eot_token)] + \
            _tokenize(prefix)
-```
-
-To explore conditional language models, you can also set prefix = "Assistant GPT3:" to mimic ChatGPT behavior (this may cause performance degradation).
-
-Hint: In BPE, tokenize(A) + tokenize(B) does not always equals to tokenize(A + B).
-
-Due to the custom tokenisation, GGMLs will not be provided.
-
-'''
 
 print("\n\n*** Generate:")
 
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+input_ids = tokenize_single_input(tokenizer, prompt)
 output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
 print(tokenizer.decode(output[0]))
-
-# Inference can also be done using transformers' pipeline
-
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-
-print(pipe(prompt_template)[0]['generated_text'])
 ```
 
 ## Compatibility
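
The one added line swaps the old plain-text prompt_template for the README's token-level helper. The diff elides the helper's internals, and the helper returns a list of ids rather than a tensor, so the following minimal sketch shows one way the corrected call can be wired into model.generate; the _tokenize and _tokenize_special bodies, the bos token, and the "Assistant GPT4:" prefix are assumptions inferred from the surrounding text, and tokenizer and model are the objects created earlier in the README.

```python
# Sketch: wiring the corrected line into model.generate. The helper bodies
# below are assumptions; the diff shows only the function's first and last lines.
import torch

def tokenize_single_input(tokenizer, prompt):
    # OpenChat V2 single-round template (prefixes per the hunks above;
    # "Assistant GPT4:" is inferred from the "Assistant GPT3:" remark)
    human_prefix = "User:"
    prefix = "Assistant GPT4:"
    bos_token, eot_token = "<s>", "<|end_of_turn|>"  # "<s>" assumed for a LLaMA-based model

    def _tokenize(text):
        # assumed helper: encode plain text without adding special tokens
        return tokenizer.encode(text, add_special_tokens=False)

    def _tokenize_special(token):
        # assumed helper: map one special-token string to its id
        return tokenizer.convert_tokens_to_ids(token)

    return [_tokenize_special(bos_token)] + _tokenize(human_prefix) + _tokenize(prompt) + \
        [_tokenize_special(eot_token)] + _tokenize(prefix)

# model.generate expects a batched tensor of ids, so wrap the list before calling:
ids = tokenize_single_input(tokenizer, "Tell me about AI")
input_ids = torch.tensor([ids]).cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```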
 