---
license: cc-by-nc-4.0
library_name: transformers
tags:
- trl
- dpo
- conversational
language:
- nl
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
pipeline_tag: text-generation
inference: false
---

# Qwen1.5-7B-Dutch-Chat

## Model description

This DPO-aligned model is the merged version of the adapter model [robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo](https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo).

DPO fine-tuning was performed on the Dutch [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) dataset.

See [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) for all information about the base model.
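Merging folds the adapter's update into the base weights, so the published model loads like a regular `transformers` checkpoint without PEFT. The sketch below assumes the adapter is a LoRA adapter (the card does not state the adapter type) and uses made-up toy matrices. It illustrates that, for a linear layer `W` with LoRA factors `A` (r x d_in) and `B` (d_out x r) and scaling `alpha / r`, the merged weight `W + (alpha / r) * B @ A` produces the same output as running the base layer and the adapter side by side. With PEFT, the corresponding operation on a real model is `PeftModel.merge_and_unload()`.

```python
# Toy LoRA merge check in pure Python (assumes a LoRA adapter; all values are illustrative).

def matmul(X, Y):
    """Multiply matrix X (n x k) by matrix Y (k x m), both given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    """Multiply matrix M (n x k) by vector v (length k)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def add(X, Y, scale=1.0):
    """Return X + scale * Y elementwise, for matrices or vectors."""
    if isinstance(X[0], list):
        return [add(x, y, scale) for x, y in zip(X, Y)]
    return [x + scale * y for x, y in zip(X, Y)]

# Frozen base weight W with d_out=3, d_in=4 (toy numbers)
W = [[0.5, -1.0, 0.0, 2.0],
     [1.5, 0.25, -0.5, 0.0],
     [0.0, 1.0, 1.0, -1.0]]
# Hypothetical LoRA factors with rank r=2: A is r x d_in, B is d_out x r
A = [[1.0, 0.0, -1.0, 0.5],
     [0.0, 2.0, 0.5, 0.0]]
B = [[0.5, 0.0],
     [-1.0, 1.0],
     [0.0, 0.25]]
alpha, r = 4, 2
scaling = alpha / r  # LoRA scales the low-rank update by alpha / r

x = [1.0, -2.0, 0.5, 3.0]

# Unmerged: base path plus adapter path B(Ax), as during training with an adapter
unmerged = add(matvec(W, x), matvec(B, matvec(A, x)), scaling)

# Merged: fold the low-rank update into a single weight matrix once
W_merged = add(W, matmul(B, A), scaling)
merged = matvec(W_merged, x)

print(unmerged)
print(merged)
```

Because the merged layer is a single matrix multiply, inference needs no extra adapter computation, at the cost of no longer being able to detach the adapter from the weights.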


## Model usage

A basic example of how to use the fine-tuned model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'robinsmits/Qwen1.5-7B-Dutch-Chat'

# device_map="auto" places the weights on the available GPU(s), or on CPU if none is found
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}]

# Format the conversation with the model's chat template and open an assistant turn
encoded_ids = tokenizer.apply_chat_template(messages,
                                            add_generation_prompt = True,
                                            return_tensors = "pt")

# Move the inputs to the device that holds the model instead of hard-coding 'cuda'
generated_ids = model.generate(input_ids = encoded_ids.to(model.device),
                               max_new_tokens = 256,
                               do_sample = True)

# Decode prompt + completion (special tokens included) to show the full chat template
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

Below is the chat template together with the generated output. Note that the template inserts a default Dutch system prompt ("Je bent een behulpzame AI assistent", i.e. "You are a helpful AI assistant") even though `messages` contains only a user turn.

```
<|im_start|>system
Je bent een behulpzame AI assistent<|im_end|>
<|im_start|>user
Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?<|im_end|>
<|im_start|>assistant
Hallo! Appels zijn zo'n lekkere fruitsoort. Ze zijn zoet en knapperig, en je kunt ze koken, roosteren of zelfs in smoothies doen. Er zijn heel veel verschillende soorten appels, zoals de Fuji, Granny Smith en Gala. De appels die je meestal in de winkel koopt, komen van bomen die in het oosten van Noord-Amerika groeien.<|im_end|>
```
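The template above follows the ChatML layout: each turn is wrapped as `<|im_start|>{role}` plus a newline, the content, and `<|im_end|>`, and `add_generation_prompt=True` appends an open `<|im_start|>assistant` turn for the model to complete. The pure-Python sketch below approximates that layout for illustration and extracts the assistant reply from a decoded string. The helper names `to_chatml` and `extract_reply` are hypothetical, and the default system prompt is copied from the output above; the tokenizer's own `apply_chat_template` remains the authoritative formatter.

```python
# Hypothetical helpers approximating the ChatML layout shown in the example output.
DEFAULT_SYSTEM = "Je bent een behulpzame AI assistent"  # copied from the example output

def to_chatml(messages, add_generation_prompt=True, default_system=DEFAULT_SYSTEM):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    if not messages or messages[0]["role"] != "system":
        # Prepend a default system turn when the conversation has none
        messages = [{"role": "system", "content": default_system}] + list(messages)
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        prompt += "<|im_start|>assistant\n"
    return prompt

def extract_reply(decoded):
    """Return the text of the last assistant turn in a decoded ChatML string."""
    reply = decoded.rsplit("<|im_start|>assistant\n", 1)[-1]
    return reply.split("<|im_end|>", 1)[0].strip()

messages = [{"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}]
prompt = to_chatml(messages)
print(prompt)

# Simulated decoded output: prompt + completion + end-of-turn token
decoded = prompt + "Hallo! Appels zijn zo'n lekkere fruitsoort.<|im_end|>"
print(extract_reply(decoded))
```

With the real model, a simpler way to get only the reply is to decode the new tokens alone, e.g. `tokenizer.decode(generated_ids[0, encoded_ids.shape[-1]:], skip_special_tokens=True)`.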

## Intended uses & limitations

More information needed

## Training and evaluation data

The training notebook is available here: [Qwen1_5_7B_Dutch_Chat_DPO](https://github.com/RobinSmits/Dutch-LLMs/blob/main/Qwen1_5_7B_Dutch_Chat_DPO.ipynb)

The model achieves the following results on the evaluation set:

- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528
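In DPO the reported "rewards" are implicit rewards: beta times the difference between the policy's and the reference model's log-probability of a completion. Rewards/margins is the mean of (chosen reward - rejected reward), so it equals Rewards/chosen - Rewards/rejected; the numbers above agree: -0.7248 - (-2.6224) = 1.8976. Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one. The sketch below computes these quantities with the standard sigmoid DPO loss, -log(sigmoid(margin)). It is an illustration rather than the training code: `beta = 0.1` is an assumption (TRL's default; the card does not report it), and the log-probabilities are made-up toy numbers.

```python
import math

BETA = 0.1  # assumption: TRL's default beta; the card does not report the value used

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Sigmoid DPO loss and reward statistics for a batch of preference pairs.

    Each argument is a list of summed log-probabilities, one entry per pair.
    """
    # Implicit rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # -log(sigmoid(m)) == log(1 + exp(-m))
    losses = [math.log1p(math.exp(-m)) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }

# Toy log-probabilities for three preference pairs (illustrative values only)
metrics = dpo_metrics(
    policy_chosen=[-780.0, -790.0, -760.0],
    policy_rejected=[-880.0, -870.0, -830.0],
    ref_chosen=[-775.0, -785.0, -770.0],
    ref_rejected=[-860.0, -850.0, -845.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the card's evaluation numbers: margin = chosen - rejected
print(round(-0.7248 - (-2.6224), 4))
```

In the toy batch, the third pair has a negative margin, so the accuracy is 2/3 and that pair contributes the largest share of the loss.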

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
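The total batch size is train_batch_size x gradient_accumulation_steps x number of devices, so 1 x 32 x 1 = 32 implies a single device. With a cosine schedule and a warmup ratio of 0.05, the learning rate ramps up linearly over the first 5% of optimizer steps and then follows a half-cosine decay towards zero. The sketch below mirrors the formula of the `transformers` cosine scheduler (`get_cosine_schedule_with_warmup`) and the `Trainer`'s rounding of the warmup ratio with `math.ceil`. The total of 306 optimizer steps is an assumption estimated from the results table (step 300 is reported at epoch 0.98), not a logged value, and `effective_batch_size` / `lr_at` are hypothetical helper names.

```python
import math

LEARNING_RATE = 1e-05
WARMUP_RATIO = 0.05
TOTAL_STEPS = 306  # assumption: estimated from the table (step 300 ~ epoch 0.98)

def effective_batch_size(per_device=1, grad_accum=32, num_devices=1):
    """Number of preference pairs consumed per optimizer step."""
    return per_device * grad_accum * num_devices

def lr_at(step, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO, base_lr=LEARNING_RATE):
    """Linear warmup followed by a half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 16 steps for 306 total
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print("effective batch size:", effective_batch_size())
for step in (0, 8, 16, 30, 150, 300, 306):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Because only about 16 steps are spent warming up, the peak learning rate of 1e-05 is reached early and most of the single epoch runs on the decaying part of the cosine curve.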

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503 | 0.1 | 30 | 0.4684 | -0.0439 | -0.6295 | 0.8919 | 0.5856 | -837.9513 | -769.8103 | -0.9335 | -0.8894 |
| 0.4178 | 0.2 | 60 | 0.3568 | -0.3713 | -1.4769 | 0.9015 | 1.1056 | -854.9000 | -776.3594 | -0.8768 | -0.8276 |
| 0.3264 | 0.29 | 90 | 0.3143 | -0.4893 | -1.8730 | 0.9151 | 1.3837 | -862.8228 | -778.7191 | -0.8428 | -0.7929 |
| 0.2999 | 0.39 | 120 | 0.2885 | -0.6832 | -2.3118 | 0.9151 | 1.6286 | -871.5981 | -782.5971 | -0.8260 | -0.7730 |
| 0.3454 | 0.49 | 150 | 0.2749 | -0.7239 | -2.4904 | 0.9189 | 1.7664 | -875.1693 | -783.4113 | -0.8235 | -0.7678 |
| 0.3354 | 0.59 | 180 | 0.2685 | -0.6775 | -2.4859 | 0.9170 | 1.8084 | -875.0795 | -782.4824 | -0.8130 | -0.7574 |
| 0.2848 | 0.68 | 210 | 0.2652 | -0.7157 | -2.5692 | 0.9131 | 1.8535 | -876.7465 | -783.2466 | -0.8157 | -0.7586 |
| 0.3437 | 0.78 | 240 | 0.2621 | -0.7233 | -2.6091 | 0.9151 | 1.8857 | -877.5430 | -783.3994 | -0.8138 | -0.7561 |
| 0.2655 | 0.88 | 270 | 0.2611 | -0.7183 | -2.6154 | 0.9151 | 1.8971 | -877.6708 | -783.2995 | -0.8106 | -0.7524 |
| 0.3442 | 0.98 | 300 | 0.2610 | -0.7248 | -2.6224 | 0.9170 | 1.8976 | -877.8102 | -783.4282 | -0.8110 | -0.7528 |


### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2