jeromecondere
/

Meta-Llama-3-8B-for-bank

jeromecondere committed on Aug 17, 2024

Commit e9f520b (verified) · 1 Parent(s): 8713894

Update README.md

Files changed (1)

README.md +27 -4

README.md CHANGED

@@ -47,7 +47,7 @@ This model is designed for financial service tasks such as:
 ### Fine-tuning
-This model has been fine-tuned with a dataset specifically created to implement a chatbot,
 ## Limitations
@@ -61,10 +61,33 @@ This model has been fine-tuned with a dataset specifically created to implement
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 # Load tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("jeromecondere/Meta-Llama-3-8B-for-bank")
-#merge it first with llama3-8b-instrucions
-model = AutoModelForCausalLM.from_pretrained("jeromecondere/Meta-Llama-3-8B-for-bank").to("cuda")
 # Example of usage
 name = 'Walter Sensei'

 ### Fine-tuning
+This model has been fine-tuned with a dataset specifically created to implement a bank chatbot.
 ## Limitations
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
+base_model = 'meta-llama/Meta-Llama-3-8B'
+new_model = "jeromecondere/Meta-Llama-3-8B-for-bank"
 # Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(new_model, use_fast=False)
+tokenizer.pad_token = tokenizer.eos_token
+tokenizer.padding_side = "right"
+# Quantization configuration for LoRA
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=True,
+)
+# Load base model
+model = AutoModelForCausalLM.from_pretrained(
+    base_model,
+    quantization_config=bnb_config,
+    device_map={"": 0},
+    token=token
+)
+model = PeftModel.from_pretrained(model, new_model)
+model = model.merge_and_unload()
 # Example of usage
 name = 'Walter Sensei'