indiejoseph committed
Commit: 0b9c99a
Parent: 9ae02ea

Update README.md

Files changed (1): README.md (+41, -0)

README.md CHANGED
@@ -1,3 +1,44 @@
---
license: cc-by-nc-sa-4.0
---

### Usage

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

model_name = "..."  # set this to the model id of this repository

# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16,
# )

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    # quantization_config=bnb_config,  # uncomment this and the bnb_config block above to use 4-bit quantization
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # The chat template definition can be found in generation_config.json
    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to('cuda:0')
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, temperature=temperature, num_return_sequences=1, do_sample=True, top_k=50, top_p=0.95, num_beams=3, repetition_penalty=1.18)
    # Decode only the newly generated tokens, i.e. everything after the prompt
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)

    return response

messages = [{"role": "user", "content": "邊個係香港特首?"}]  # "Who is the Chief Executive of Hong Kong?"

# The chat template includes a default system message, but you can define your own, e.g.:
# messages = [
#     {"role": "system", "content": "你叫做櫻子,你要同用家北原伊織進行對話,你同北原伊織係情侶關係。"},  # "Your name is Sakurako; you are chatting with the user Kitahara Iori, and the two of you are a couple."
#     {"role": "user", "content": "櫻子,今日你會去邊度玩呀?"}  # "Sakurako, where will you go to have fun today?"
# ]

print(chat(messages))
```
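
For reference, this is what the loading step looks like with the commented-out 4-bit path enabled. It is a minimal sketch that simply uncomments the `bnb_config` from the snippet above, and it assumes the `bitsandbytes` package is installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

model_name = "..."  # as above, set this to the model id of this repository

# 4-bit NF4 quantization with double quantization, computing in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    quantization_config=bnb_config,
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
```

This trades some output quality for a much smaller memory footprint; the rest of the snippet (`chat`, `generate`) is unchanged.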
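
Because `apply_chat_template` takes the full message history, a follow-up turn can be issued by appending the model's reply and calling `chat` again. A hypothetical continuation (the follow-up question below is illustrative; for multi-turn use you may prefer `skip_special_tokens=True` in the decode step so stray special tokens do not end up in the history):

```python
reply = chat(messages)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "上一任又係邊個?"})  # hypothetical follow-up: "And who was the previous one?"
print(chat(messages))
```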