	---
	library_name: transformers
	tags: []
	---

	# Model Card for Meta-Llama-3-8B-for-bank

	This model, Meta-Llama-3-8B-for-bank, is a fine-tuned version of `meta-llama/Meta-Llama-3-8B-Instruct`. Only the LoRA adapters are published here, so the base model must be loaded separately.
	This is a first, naive version.

	## Model Details

	### Model Description

	- Model Name: Meta-Llama-3-8B-for-bank
	- Base Model: `meta-llama/Meta-Llama-3-8B-Instruct`
	- Fine-tuning Data: Custom bank chat examples
	- Dataset: jeromecondere/bank-chat
	- Version: 1.0
	- License: Free
	- Language: English

	### Model Type

	- Architecture: LLaMA-3
	- Type: Instruction-based language model

	### Model Usage

	This model is designed for financial service tasks such as:

	- Balance Inquiry:
	  - Example: "Can you provide the current balance for my account?"
	- Stock List Retrieval:
	  - Example: "Can you provide me with a list of my stocks?"
	- Stock Purchase:
	  - Example: "I'd like to buy stocks worth 1,000.00 in Tesla."
	- Deposit Transactions:
	  - Example: "I'd like to deposit 500.00 into my account."
	- Withdrawal Transactions:
	  - Example: "I'd like to withdraw 200.00 from my account."
	- Transaction History:
	  - Example: "I would like to view my transactions. Can you provide it?"
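
The example queries above can be wrapped into the chat-message format that the How to Use snippet passes to `tokenizer.apply_chat_template`. The following is a minimal sketch and not part of the original card: `EXAMPLE_QUERIES` and `build_messages` are hypothetical names introduced for illustration, and the system prompt wording is copied from the usage example.

```python
# Hypothetical names for illustration; the queries are the examples listed above.
EXAMPLE_QUERIES = {
    "balance_inquiry": "Can you provide the current balance for my account?",
    "stock_list": "Can you provide me with a list of my stocks?",
    "stock_purchase": "I'd like to buy stocks worth 1,000.00 in Tesla.",
    "deposit": "I'd like to deposit 500.00 into my account.",
    "withdrawal": "I'd like to withdraw 200.00 from my account.",
    "transaction_history": "I would like to view my transactions. Can you provide it?",
}


def build_messages(name, query):
    """Return a system + user message list in the chat format used in How to Use."""
    return [
        # System prompt wording taken from the usage example in this card.
        {"role": "system", "content": f"Hi {name}, I'm your assistant how can I help you"},
        {"role": "user", "content": query},
    ]
```

A call such as `build_messages("Walter Sensei", EXAMPLE_QUERIES["deposit"])` yields a two-element list that can be handed to the tokenizer's chat template in place of the hand-written `messages` list.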

	### Inputs and Outputs

	- Inputs: Natural language queries related to financial services.
	- Outputs: Textual responses or actions based on the input query.
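
For a decoder-only model such as this one, `generate` returns the prompt tokens followed by the newly generated tokens, so decoding the full sequence repeats the input query before the response. A minimal sketch of isolating the response, assuming plain lists of token ids; the helper name `response_token_ids` is hypothetical, and with tensors the equivalent slice would be `outputs[0][input_ids.shape[-1]:]`.

```python
def response_token_ids(output_ids, prompt_length):
    """Drop the first `prompt_length` ids, which echo the prompt, and keep the rest."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids for illustration only; real ids come from the tokenizer and the model.
prompt_ids = [101, 7, 42]
generated = prompt_ids + [9, 15, 2]
print(response_token_ids(generated, len(prompt_ids)))  # [9, 15, 2]
```

The remaining ids can then be decoded with `tokenizer.decode(..., skip_special_tokens=True)` to obtain only the model's reply.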

	### Fine-tuning

	This model was fine-tuned on `jeromecondere/bank-chat`, a dataset created specifically for building a bank chatbot.


	## Limitations

	- Misinterpretation Risks: This is the first version of the model, so overly complex queries may produce inconsistent results.



	## How to Use

	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

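	# Note: the Model Description lists `meta-llama/Meta-Llama-3-8B-Instruct` as the base model;
	# the checkpoint loaded here should be the one the adapters were trained on.
	base_model = 'meta-llama/Meta-Llama-3-8B'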
	new_model = "jeromecondere/Meta-Llama-3-8B-for-bank"

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained(new_model, use_fast=False)
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "right"

	# Quantization configuration for LoRA
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.float16,
	    bnb_4bit_use_double_quant=True,
	)

	# Load base model
	model = AutoModelForCausalLM.from_pretrained(
	    base_model,
	    quantization_config=bnb_config,
	    device_map={"": 0},
	    token=True,  # use the token saved by `huggingface-cli login`; an access-token string also works
	)

	# Attach the LoRA adapters, then merge them into the base weights
	model = PeftModel.from_pretrained(model, new_model)
	model = model.merge_and_unload()


	# Example of usage
	name = 'Walter Sensei'
	company = 'Amazon Inc.'
	stock_value = 42.24
	messages = [
	    {'role': 'system', 'content': f'Hi {name}, I\'m your assistant how can I help you'},
	    {"role": "user", "content": "yo, can you just give me the balance of my account?"}
	]

	# Prepare the message using the chat template
	res1 = tokenizer.apply_chat_template(messages, tokenize=False)
	print(res1+'\n\n')

	# Prepare the messages for the model
	input_ids = tokenizer.apply_chat_template(messages, truncation=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

	# Inference
	outputs = model.generate(
	    input_ids=input_ids,
	    max_new_tokens=100,
	    do_sample=True,
	    temperature=0.1,
	    top_k=50,
	    top_p=0.95
	)
	print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
	```