---
license: cc-by-nc-4.0
library_name: transformers
tags:
- trl
- dpo
- conversational
language:
- nl
datasets:
- BramVanroy/ultrachat_200k_dutch
pipeline_tag: text-generation
inference: false
---
# Qwen1.5-7B-Dutch-Chat
## Model description
This DPO-aligned model is the merged version of the adapter model [robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo](https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo).
DPO fine-tuning was performed on the Dutch [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) dataset.
See [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) for all information about the base model.
## Model usage
The following is a basic example of how to use the fine-tuned model.
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'robinsmits/Qwen1.5-7B-Dutch-Chat'

# Load the merged model in bfloat16; device_map = "auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}]
encoded_ids = tokenizer.apply_chat_template(messages,
                                            add_generation_prompt = True,
                                            return_tensors = "pt")

# Sample up to 256 new tokens; the decoded text contains both the prompt and the response.
generated_ids = model.generate(input_ids = encoded_ids.to(model.device),
                               max_new_tokens = 256,
                               do_sample = True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
Below is the chat template with the generated output.
```
<|im_start|>system
Je bent een behulpzame AI assistent<|im_end|>
<|im_start|>user
Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?<|im_end|>
<|im_start|>assistant
Hallo! Appels zijn zo'n lekkere fruitsoort. Ze zijn zoet en knapperig, en je kunt ze koken, roosteren of zelfs in smoothies doen. Er zijn heel veel verschillende soorten appels, zoals de Fuji, Granny Smith en Gala. De appels die je meestal in de winkel koopt, komen van bomen die in het oosten van Noord-Amerika groeien.<|im_end|>
```
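To make the prompt layout above easier to follow, here is a minimal sketch that renders the same ChatML-style structure in plain Python. The helper `render_chatml` and the constant `DEFAULT_SYSTEM_PROMPT` are hypothetical names used for illustration only, and the exact newline placement is an assumption inferred from the output above. For real use, `tokenizer.apply_chat_template` remains the reliable way to build prompts.

```python
# Illustrative sketch only: mimics the ChatML-style layout shown in the output above.
# `render_chatml` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, not part of transformers.
DEFAULT_SYSTEM_PROMPT = "Je bent een behulpzame AI assistent"


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    # Assumption: a default system message is prepended when none is given,
    # as the generated output above suggests.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": DEFAULT_SYSTEM_PROMPT}] + list(messages)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hoi hoe gaat het ermee?"}])
print(prompt)
```

The printed prompt ends with an open assistant turn, which is why the generated text in the output above starts right after `<|im_start|>assistant`.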
## Intended uses & limitations
More information needed
## Training and evaluation data
The model achieves the following results on the evaluation set:
- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528
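As a quick sanity check on the metrics above: in TRL's DPO logging, the reward margin is the chosen reward minus the rejected reward, averaged over the evaluation set. The short sketch below recomputes the margin from the reported means; the variable names are illustrative.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.7248
rewards_rejected = -2.6224
reported_margin = 1.8976

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))

# The recomputed value should agree with the reported one up to rounding.
matches_reported = abs(margin - reported_margin) < 1e-4
print(matches_reported)
```

A positive margin means the model assigns a higher implicit reward to the chosen responses than to the rejected ones, which is consistent with the reported accuracy of 0.9170.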
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
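To show how these values fit together, the sketch below recomputes the effective batch size and traces a cosine schedule with linear warmup in plain Python. Two values are assumptions rather than facts from this card: `num_devices = 1` (consistent with the reported total of 32) and `total_steps = 306` (estimated from the training results table in this card, where step 300 corresponds to epoch 0.98). The function `lr_at` is a hypothetical helper that approximates the usual warmup-plus-cosine shape, not the trainer's exact code.

```python
import math

# Hyperparameters from the list above.
learning_rate = 1e-05
train_batch_size = 1
gradient_accumulation_steps = 32
warmup_ratio = 0.05

# Assumption: a single device, which is consistent with the reported total of 32.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: roughly 306 optimizer steps in one epoch, estimated from the
# results table (step 300 at epoch 0.98); the exact count is not stated.
total_steps = 306
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate climbs to its 1e-05 peak within the first few percent of training and then decays smoothly toward zero by the final step.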
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503 | 0.1 | 30 | 0.4684 | -0.0439 | -0.6295 | 0.8919 | 0.5856 | -837.9513 | -769.8103 | -0.9335 | -0.8894 |
| 0.4178 | 0.2 | 60 | 0.3568 | -0.3713 | -1.4769 | 0.9015 | 1.1056 | -854.9000 | -776.3594 | -0.8768 | -0.8276 |
| 0.3264 | 0.29 | 90 | 0.3143 | -0.4893 | -1.8730 | 0.9151 | 1.3837 | -862.8228 | -778.7191 | -0.8428 | -0.7929 |
| 0.2999 | 0.39 | 120 | 0.2885 | -0.6832 | -2.3118 | 0.9151 | 1.6286 | -871.5981 | -782.5971 | -0.8260 | -0.7730 |
| 0.3454 | 0.49 | 150 | 0.2749 | -0.7239 | -2.4904 | 0.9189 | 1.7664 | -875.1693 | -783.4113 | -0.8235 | -0.7678 |
| 0.3354 | 0.59 | 180 | 0.2685 | -0.6775 | -2.4859 | 0.9170 | 1.8084 | -875.0795 | -782.4824 | -0.8130 | -0.7574 |
| 0.2848 | 0.68 | 210 | 0.2652 | -0.7157 | -2.5692 | 0.9131 | 1.8535 | -876.7465 | -783.2466 | -0.8157 | -0.7586 |
| 0.3437 | 0.78 | 240 | 0.2621 | -0.7233 | -2.6091 | 0.9151 | 1.8857 | -877.5430 | -783.3994 | -0.8138 | -0.7561 |
| 0.2655 | 0.88 | 270 | 0.2611 | -0.7183 | -2.6154 | 0.9151 | 1.8971 | -877.6708 | -783.2995 | -0.8106 | -0.7524 |
| 0.3442 | 0.98 | 300 | 0.2610 | -0.7248 | -2.6224 | 0.9170 | 1.8976 | -877.8102 | -783.4282 | -0.8110 | -0.7528 |
### Framework versions
- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2