TheBloke committed on
Commit 0371551
1 Parent(s): b73ae73

Update README.md

Files changed (1)
  1. README.md +1 -37
README.md CHANGED
@@ -35,7 +35,6 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
 
 ## Prompt template: custom
 
-```
 The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
 
 Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
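
A quick way to confirm that the <|end_of_turn|> token described above is present in a given checkpoint is to look it up in the tokenizer's vocabulary. A minimal sketch, assuming a placeholder model path (not something this diff specifies):

```python
# Sketch: check that the end-of-turn token described above exists in the
# tokenizer's vocabulary. The model path below is a placeholder assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_v2_w")  # placeholder path

eot_id = tokenizer.convert_tokens_to_ids("<|end_of_turn|>")
print(eot_id)  # a real id (not the unk id) if the token was added to the vocabulary
```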
@@ -66,8 +65,6 @@ Hint: In BPE, tokenize(A) + tokenize(B) does not always equals to tokenize(A + B
 
 Due to the custom tokenisation, GGMLs will not be provided.
 
-```
-
 ## Provided files
 
 Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
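
The caveat quoted in the hunk header above, that tokenize(A) + tokenize(B) does not always equal tokenize(A + B), is the reason the template is assembled from token ids rather than from one concatenated string. A minimal sketch of the effect, using gpt2 purely as a stand-in BPE tokenizer:

```python
# Sketch: BPE merges can cross a string boundary, so tokenizing two pieces
# separately may give different ids than tokenizing their concatenation.
# "gpt2" is used here only as a convenient public BPE tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

a, b = "Assistant GPT", "4:"
print(tok.encode(a) + tok.encode(b))  # pieces tokenized separately
print(tok.encode(a + b))              # concatenation tokenized as one string
# The two id lists are not guaranteed to match, which is the caveat above.
```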
@@ -153,13 +150,7 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 """
 
 prompt = "Tell me about AI"
-prompt_template=f'''The conversation template involves concatenating tokens, and cannot be expressed in plain-text.
-
-Besides base model vocabulary, an end-of-turn token <|end_of_turn|> is added.
 
-Here is an example of single-round conversation template:
-
-```python
 def tokenize_single_input(tokenizer, prompt):
     # OpenChat V2
     human_prefix = "User:"
@@ -175,39 +166,12 @@ def tokenize_single_input(tokenizer, prompt):
 
     return [_tokenize_special(bos_token)] + _tokenize(human_prefix) + _tokenize(prompt) + [_tokenize_special(eot_token)] + \
            _tokenize(prefix)
-```
-
-To explore conditional language models, you can also set prefix = "Assistant GPT3:" to mimic ChatGPT behavior (this may cause performance degradation).
-
-Hint: In BPE, tokenize(A) + tokenize(B) does not always equals to tokenize(A + B).
-
-Due to the custom tokenisation, GGMLs will not be provided.
-
-'''
 
 print("\n\n*** Generate:")
 
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+input_ids = tokenize_single_input(tokenizer, prompt)
 output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
 print(tokenizer.decode(output[0]))
-
-# Inference can also be done using transformers' pipeline
-
-# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
-logging.set_verbosity(logging.CRITICAL)
-
-print("*** Pipeline:")
-pipe = pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    repetition_penalty=1.15
-)
-
-print(pipe(prompt_template)[0]['generated_text'])
 ```
 
 ## Compatibility
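
The one added line swaps the old plain-text prompt_template for the README's token-level helper. The diff elides the helper's internals, and the helper returns a list of ids rather than a tensor, so the following minimal sketch shows one way the corrected call can be wired into model.generate; the _tokenize and _tokenize_special bodies, the bos token, and the "Assistant GPT4:" prefix are assumptions inferred from the surrounding text, and tokenizer and model are the objects created earlier in the README.

```python
# Sketch: wiring the corrected line into model.generate. The helper bodies
# below are assumptions; the diff shows only the function's first and last lines.
import torch

def tokenize_single_input(tokenizer, prompt):
    # OpenChat V2 single-round template (prefixes per the hunks above;
    # "Assistant GPT4:" is inferred from the "Assistant GPT3:" remark)
    human_prefix = "User:"
    prefix = "Assistant GPT4:"
    bos_token, eot_token = "<s>", "<|end_of_turn|>"  # "<s>" assumed for a LLaMA-based model

    def _tokenize(text):
        # assumed helper: encode plain text without adding special tokens
        return tokenizer.encode(text, add_special_tokens=False)

    def _tokenize_special(token):
        # assumed helper: map one special-token string to its id
        return tokenizer.convert_tokens_to_ids(token)

    return [_tokenize_special(bos_token)] + _tokenize(human_prefix) + _tokenize(prompt) + \
        [_tokenize_special(eot_token)] + _tokenize(prefix)

# model.generate expects a batched tensor of ids, so wrap the list before calling:
ids = tokenize_single_input(tokenizer, "Tell me about AI")
input_ids = torch.tensor([ids]).cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```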
 