---
license: cc-by-nc-4.0
library_name: transformers
tags:
- trl
- dpo
- conversational
language:
- nl
datasets:
- BramVanroy/ultrachat_200k_dutch
pipeline_tag: text-generation
inference: false
---

# Qwen1.5-7B-Dutch-Chat

## Model description

This DPO-aligned model is the merged version of the adapter model [robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo](https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo).

DPO finetuning was performed on the Dutch [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) dataset.

See [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) for all information about the base model.
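
The merge step itself is not documented in this card; the sketch below shows how such an adapter merge is typically done with PEFT's `merge_and_unload`. The adapter repo name comes from above, the output path is illustrative.

```python
import torch
from peft import AutoPeftModelForCausalLM

# Load the base model together with the DPO adapter weights
peft_model = AutoPeftModelForCausalLM.from_pretrained(
    "robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo",
    torch_dtype = torch.bfloat16)

# Fold the LoRA adapter into the base weights and save the merged model
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("Qwen1.5-7B-Dutch-Chat")
```
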
## Model usage

Below is a basic example of how to use the finetuned model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda'
model_name = 'robinsmits/Qwen1.5-7B-Dutch-Chat'

# Load the model in bfloat16 and let accelerate place it on the available devices
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}]

# Apply the chat template and append the generation prompt for the assistant turn
encoded_ids = tokenizer.apply_chat_template(messages,
                                            add_generation_prompt = True,
                                            return_tensors = "pt")

generated_ids = model.generate(input_ids = encoded_ids.to(device),
                               max_new_tokens = 256,
                               do_sample = True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

Below is the chat template together with the generated output.

```
<|im_start|>system
Je bent een behulpzame AI assistent<|im_end|>
<|im_start|>user
Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?<|im_end|>
<|im_start|>assistant
Hallo! Appels zijn zo'n lekkere fruitsoort. Ze zijn zoet en knapperig, en je kunt ze koken, roosteren of zelfs in smoothies doen. Er zijn heel veel verschillende soorten appels, zoals de Fuji, Granny Smith en Gala. De appels die je meestal in de winkel koopt, komen van bomen die in het oosten van Noord-Amerika groeien.<|im_end|>
```
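
Note that the prompt above starts with a Dutch system turn. If you want to set the system prompt explicitly, pass it as the first message; a small, illustrative variation on the usage example (reusing the `tokenizer` from above):

```python
# Provide the system turn explicitly instead of relying on the template default
messages = [
    {"role": "system", "content": "Je bent een behulpzame AI assistent"},
    {"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}
]
encoded_ids = tokenizer.apply_chat_template(messages,
                                            add_generation_prompt = True,
                                            return_tensors = "pt")
```
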
## Intended uses & limitations

More information needed

## Training and evaluation data

The model achieves the following results on the evaluation set:
- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528
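
For context: in DPO the implicit reward of a completion is the $\beta$-scaled log-probability ratio between the finetuned policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward gap. In standard notation (with $y_w$ the chosen and $y_l$ the rejected completion, and $\sigma$ the logistic sigmoid):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

`Rewards/chosen` and `Rewards/rejected` above are averages of $r_\theta$ over the evaluation set, and `Rewards/margins` is the average gap between the two.
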
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
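
As a rough sketch, these hyperparameters map onto a TRL `DPOTrainer` setup as shown below. This is illustrative only: argument names follow TRL 0.7-era APIs, and the `beta` value, PEFT config and dataset wiring are assumptions, not taken from this card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir = "Qwen1.5-7B-Dutch-Chat-Dpo",
    learning_rate = 1e-05,
    per_device_train_batch_size = 1,
    per_device_eval_batch_size = 2,
    gradient_accumulation_steps = 32,
    lr_scheduler_type = "cosine",
    warmup_ratio = 0.05,
    num_train_epochs = 1,
    seed = 42,
    bf16 = True)

# model, tokenizer, peft_config, train_dataset and eval_dataset are assumed
# to be prepared elsewhere; ref_model = None lets TRL use the frozen base
# weights as the reference model when training a PEFT adapter.
trainer = DPOTrainer(
    model,
    ref_model = None,
    args = training_args,
    beta = 0.1,  # assumption: a common DPO value, not stated in this card
    train_dataset = train_dataset,
    eval_dataset = eval_dataset,
    tokenizer = tokenizer,
    peft_config = peft_config)
trainer.train()
```
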
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503 | 0.1 | 30 | 0.4684 | -0.0439 | -0.6295 | 0.8919 | 0.5856 | -837.9513 | -769.8103 | -0.9335 | -0.8894 |
| 0.4178 | 0.2 | 60 | 0.3568 | -0.3713 | -1.4769 | 0.9015 | 1.1056 | -854.9000 | -776.3594 | -0.8768 | -0.8276 |
| 0.3264 | 0.29 | 90 | 0.3143 | -0.4893 | -1.8730 | 0.9151 | 1.3837 | -862.8228 | -778.7191 | -0.8428 | -0.7929 |
| 0.2999 | 0.39 | 120 | 0.2885 | -0.6832 | -2.3118 | 0.9151 | 1.6286 | -871.5981 | -782.5971 | -0.8260 | -0.7730 |
| 0.3454 | 0.49 | 150 | 0.2749 | -0.7239 | -2.4904 | 0.9189 | 1.7664 | -875.1693 | -783.4113 | -0.8235 | -0.7678 |
| 0.3354 | 0.59 | 180 | 0.2685 | -0.6775 | -2.4859 | 0.9170 | 1.8084 | -875.0795 | -782.4824 | -0.8130 | -0.7574 |
| 0.2848 | 0.68 | 210 | 0.2652 | -0.7157 | -2.5692 | 0.9131 | 1.8535 | -876.7465 | -783.2466 | -0.8157 | -0.7586 |
| 0.3437 | 0.78 | 240 | 0.2621 | -0.7233 | -2.6091 | 0.9151 | 1.8857 | -877.5430 | -783.3994 | -0.8138 | -0.7561 |
| 0.2655 | 0.88 | 270 | 0.2611 | -0.7183 | -2.6154 | 0.9151 | 1.8971 | -877.6708 | -783.2995 | -0.8106 | -0.7524 |
| 0.3442 | 0.98 | 300 | 0.2610 | -0.7248 | -2.6224 | 0.9170 | 1.8976 | -877.8102 | -783.4282 | -0.8110 | -0.7528 |

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2