---
library_name: transformers
tags: []
---

# Model Card for Meta-Llama-3-8B-for-bank

This model, **Meta-Llama-3-8B-for-bank**, is a LoRA fine-tune of the `meta-llama/Meta-Llama-3-8B-Instruct` model; this repository contains only the LoRA adapters, not the full base-model weights.

## Model Details

### Model Description

- **Model Name**: Meta-Llama-3-8B-for-bank
- **Base Model**: `meta-llama/Meta-Llama-3-8B-Instruct`
- **Fine-tuning Data**: Custom bank chat examples
- **Dataset**: jeromecondere/bank-chat
- **Version**: 1.0
- **License**: Meta Llama 3 Community License (inherited from the base model)
- **Language**: English

### Model Type

- **Architecture**: LLaMA-3
- **Type**: Instruction-tuned language model

### Model Usage

This model is designed for financial-service tasks such as the following (see the prompt-formatting sketch after this list):

- **Balance Inquiry**:
  - *Example*: "Can you provide the current balance for my account?"
- **Stock List Retrieval**:
  - *Example*: "Can you provide me with a list of my stocks?"
- **Stock Purchase**:
  - *Example*: "I'd like to buy stocks worth $1,000.00 in Tesla."
- **Deposit Transactions**:
  - *Example*: "I'd like to deposit $500.00 into my account."
- **Withdrawal Transactions**:
  - *Example*: "I'd like to withdraw $200.00 from my account."
- **Transaction History**:
  - *Example*: "I would like to view my transactions. Can you provide it?"

### Inputs and Outputs

- **Inputs**: Natural language queries related to financial services.
- **Outputs**: Textual responses or actions based on the input query.

### Fine-tuning

The model was fine-tuned on `jeromecondere/bank-chat`, a dataset created specifically for a banking chatbot.
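
To inspect the training data, the dataset can be pulled from the Hub with the `datasets` library (a minimal sketch; the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the bank-chat fine-tuning dataset from the Hugging Face Hub
dataset = load_dataset("jeromecondere/bank-chat", split="train")  # split name assumed
print(dataset[0])  # inspect one chat example
```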


## Limitations

- **Misinterpretation Risks**: This is the first version of the model; overly complex or ambiguous queries may yield inconsistent results.



## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = 'meta-llama/Meta-Llama-3-8B-Instruct'  # base checkpoint named in the model description above
new_model = "jeromecondere/Meta-Llama-3-8B-for-bank"
token = "hf_..."  # your Hugging Face access token (Llama 3 checkpoints are gated)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(new_model, use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 4-bit quantization configuration used when loading the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=token
)

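# Attach the fine-tuned LoRA adapters and merge them into the base weights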
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()


# Example usage
name = 'Walter Sensei'
company = 'Amazon Inc.'   # example values for other query types (unused in this query)
stock_value = 42.24
messages = [
    {'role': 'system', 'content': f"Hi {name}, I'm your assistant. How can I help you?"},
    {'role': 'user', 'content': "Yo, can you just give me the balance of my account?"}
]

# Render the chat template as plain text (for inspection)
res1 = tokenizer.apply_chat_template(messages, tokenize=False)
print(res1 + '\n\n')

# Prepare the messages for the model
input_ids = tokenizer.apply_chat_template(messages, truncation=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

# Inference
outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
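
Because `merge_and_unload()` folds the adapter weights into the base model, the merged model can optionally be saved and later reloaded without `peft` (a sketch; the output directory name is illustrative):

```python
# Save the merged model so it can be reloaded with plain transformers
merged_dir = "Meta-Llama-3-8B-for-bank-merged"  # illustrative path
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```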