---
license: cc-by-nc-4.0
library_name: transformers
tags:
- trl
- dpo
- conversational
language:
- nl
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
pipeline_tag: text-generation
inference: false
---

# Qwen1.5-7B-Dutch-Chat

## Model description

This DPO-aligned model is the merged version of the adapter model [robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo](https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo).

DPO fine-tuning was performed on the Dutch [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) dataset.

See [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) for all information about the base model.

## Model usage

A basic example of how to use the fine-tuned model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'robinsmits/Qwen1.5-7B-Dutch-Chat'

# Load the merged DPO model in bfloat16; device_map = "auto" places it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?"}]

# Apply the chat template and append the assistant header so the model starts its reply.
encoded_ids = tokenizer.apply_chat_template(messages,
                                            add_generation_prompt = True,
                                            return_tensors = "pt")

# Send the input to the model's own device instead of hard-coding 'cuda'.
generated_ids = model.generate(input_ids = encoded_ids.to(model.device),
                               max_new_tokens = 256,
                               do_sample = True)

# Decode the full sequence (prompt + reply), including the special chat tokens.
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

Below is the chat template together with the generated output. The template adds a default Dutch system prompt because the example conversation contains only a user message.

```
<|im_start|>system
Je bent een behulpzame AI assistent<|im_end|>
<|im_start|>user
Hoi hoe gaat het ermee? Wat kun je me vertellen over appels?<|im_end|>
<|im_start|>assistant
Hallo! Appels zijn zo'n lekkere fruitsoort. Ze zijn zoet en knapperig, en je kunt ze koken, roosteren of zelfs in smoothies doen. Er zijn heel veel verschillende soorten appels, zoals de Fuji, Granny Smith en Gala. De appels die je meestal in de winkel koopt, komen van bomen die in het oosten van Noord-Amerika groeien.<|im_end|>
```

## Intended uses & limitations

More information needed

## Training and evaluation data

The training notebook is available at the following link: [Qwen1_5_7B_Dutch_Chat_DPO](https://github.com/RobinSmits/Dutch-LLMs/blob/main/Qwen1_5_7B_Dutch_Chat_DPO.ipynb)

The model achieves the following results on the evaluation set:

- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503 | 0.1 | 30 | 0.4684 | -0.0439 | -0.6295 | 0.8919 | 0.5856 | -837.9513 | -769.8103 | -0.9335 | -0.8894 |
| 0.4178 | 0.2 | 60 | 0.3568 | -0.3713 | -1.4769 | 0.9015 | 1.1056 | -854.9000 | -776.3594 | -0.8768 | -0.8276 |
| 0.3264 | 0.29 | 90 | 0.3143 | -0.4893 | -1.8730 | 0.9151 | 1.3837 | -862.8228 | -778.7191 | -0.8428 | -0.7929 |
| 0.2999 | 0.39 | 120 | 0.2885 | -0.6832 | -2.3118 | 0.9151 | 1.6286 | -871.5981 | -782.5971 | -0.8260 | -0.7730 |
| 0.3454 | 0.49 | 150 | 0.2749 | -0.7239 | -2.4904 | 0.9189 | 1.7664 | -875.1693 | -783.4113 | -0.8235 | -0.7678 |
| 0.3354 | 0.59 | 180 | 0.2685 | -0.6775 | -2.4859 | 0.9170 | 1.8084 | -875.0795 | -782.4824 | -0.8130 | -0.7574 |
| 0.2848 | 0.68 | 210 | 0.2652 | -0.7157 | -2.5692 | 0.9131 | 1.8535 | -876.7465 | -783.2466 | -0.8157 | -0.7586 |
| 0.3437 | 0.78 | 240 | 0.2621 | -0.7233 | -2.6091 | 0.9151 | 1.8857 | -877.5430 | -783.3994 | -0.8138 | -0.7561 |
| 0.2655 | 0.88 | 270 | 0.2611 | -0.7183 | -2.6154 | 0.9151 | 1.8971 | -877.6708 | -783.2995 | -0.8106 | -0.7524 |
| 0.3442 | 0.98 | 300 | 0.2610 | -0.7248 | -2.6224 | 0.9170 | 1.8976 | -877.8102 | -783.4282 | -0.8110 | -0.7528 |

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2
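The reward columns in the training results can be checked against each other with a few lines of code. Rewards/margins is the mean of (chosen reward minus rejected reward). Because a mean is linear, it should equal Rewards/chosen minus Rewards/rejected up to rounding. Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected one. The Python check below copies the reward columns from the training results table and uses a helper, `margin_consistent`, whose name is our own. It also recomputes total_train_batch_size from the listed hyperparameters, assuming training on a single device.

```python
# (step, Rewards/chosen, Rewards/rejected, Rewards/margins), copied from the training results table.
ROWS = [
    (30, -0.0439, -0.6295, 0.5856),
    (60, -0.3713, -1.4769, 1.1056),
    (90, -0.4893, -1.8730, 1.3837),
    (120, -0.6832, -2.3118, 1.6286),
    (150, -0.7239, -2.4904, 1.7664),
    (180, -0.6775, -2.4859, 1.8084),
    (210, -0.7157, -2.5692, 1.8535),
    (240, -0.7233, -2.6091, 1.8857),
    (270, -0.7183, -2.6154, 1.8971),
    (300, -0.7248, -2.6224, 1.8976),
]

# Every logged value is rounded to 4 decimals (error <= 0.00005 each), so the difference
# of two rounded rewards may deviate from the rounded margin by up to 0.00015.
TOLERANCE = 1.5e-4 + 1e-9  # small extra slack for floating-point error


def margin_consistent(chosen: float, rejected: float, margin: float) -> bool:
    """True if margin equals chosen - rejected within the rounding tolerance."""
    return abs((chosen - rejected) - margin) <= TOLERANCE


for step, chosen, rejected, margin in ROWS:
    ok = margin_consistent(chosen, rejected, margin)
    print(f"step {step:>3}: chosen - rejected = {chosen - rejected:.4f}, "
          f"logged margin = {margin:.4f}, consistent = {ok}")

# total_train_batch_size = train_batch_size * gradient_accumulation_steps (single device assumed)
train_batch_size = 1
gradient_accumulation_steps = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print("total_train_batch_size =", total_train_batch_size)
```

Every row passes. At steps 150 and 240 the recomputed difference is 0.0001 larger than the logged margin, which is within the rounding tolerance.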
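The sketch below shows how the loss, Rewards/*, and Logps/* columns in the training results are typically related in sigmoid-loss DPO as logged by TRL. The implicit reward of a completion is beta times the difference between its log-probability under the trained policy and its log-probability under the frozen reference model. The per-pair loss is -log sigmoid(reward_chosen - reward_rejected), and the Logps/* columns should correspond to the policy log-probabilities that enter this reward. This is a plain-Python illustration, not the training code from the linked notebook. The card does not report beta, so 0.1 is only a placeholder, and the log-probabilities in the usage example are made up.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_stats(policy_chosen_logp: float, policy_rejected_logp: float,
                   ref_chosen_logp: float, ref_rejected_logp: float,
                   beta: float = 0.1) -> dict:
    """Per-pair DPO quantities (sigmoid loss). beta=0.1 is a placeholder, not the card's value."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. frozen reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        "rewards/accuracies": float(reward_chosen > reward_rejected),
        # The loss shrinks as the chosen reward pulls ahead of the rejected reward.
        "loss": -log_sigmoid(margin),
    }


def summarize(pairs, beta: float = 0.1) -> dict:
    """Average the per-pair stats over a set of pairs, roughly as the logged columns do."""
    stats = [dpo_pair_stats(*pair, beta=beta) for pair in pairs]
    return {key: sum(s[key] for s in stats) / len(stats) for key in stats[0]}


# Hypothetical summed log-probs: (policy chosen, policy rejected, ref chosen, ref rejected).
example_pairs = [
    (-100.0, -120.0, -105.0, -110.0),
    (-80.0, -95.0, -82.0, -90.0),
]
print(summarize(example_pairs))
```

When the policy and the reference model agree, the margin is 0 and the loss equals log 2, about 0.693. The final validation loss of 0.2610 is well below that value, which is consistent with the large positive Rewards/margins.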
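The output under Model usage shows the ChatML-style layout produced by the chat template. This layout includes a default Dutch system prompt, which is inserted when the conversation does not start with one. The helpers below sketch that layout in plain Python and show how to extract the assistant's answer from the decoded text, which otherwise still contains the special tokens. `to_chatml` and `extract_last_reply` are hypothetical names and not part of transformers. The exact whitespace of the real template is an assumption here, so for actual inference you should rely on `tokenizer.apply_chat_template`.

```python
# Default system prompt as it appears in the generated output above.
DEFAULT_SYSTEM = "Je bent een behulpzame AI assistent"
ASSISTANT_HEADER = "<|im_start|>assistant\n"


def to_chatml(messages, add_generation_prompt=True, default_system=DEFAULT_SYSTEM):
    """Hypothetical helper: render messages in the ChatML layout shown above."""
    # Prepend the default system message when the conversation has none.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": default_system}] + list(messages)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append(ASSISTANT_HEADER)  # the model continues from here
    return "".join(parts)


def extract_last_reply(decoded: str) -> str:
    """Hypothetical helper: text after the last assistant header, up to <|im_end|>."""
    start = decoded.rfind(ASSISTANT_HEADER)
    if start == -1:
        raise ValueError("no assistant turn found")
    reply = decoded[start + len(ASSISTANT_HEADER):]
    # If generation stopped at max_new_tokens there is no <|im_end|>; keep everything then.
    return reply.split("<|im_end|>", 1)[0].strip()


prompt = to_chatml([{"role": "user", "content": "Hoi hoe gaat het ermee?"}])
print(prompt)
print(extract_last_reply(prompt + "Hallo! Met mij gaat het goed.<|im_end|>"))
```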
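The training hyperparameters combine lr_scheduler_type: cosine with lr_scheduler_warmup_ratio: 0.05. The sketch below reproduces, to our understanding, the shape of the cosine-with-warmup schedule used by the transformers Trainer. The learning rate ramps linearly from 0 to the peak of 1e-05 over ceil(0.05 times the total steps) steps, then follows a half-cosine decay to zero. The card does not state the total number of optimizer steps. The results table reaches step 300 at epoch 0.98, so `TOTAL_STEPS = 306` below is only a hypothetical estimate of the right order.

```python
import math

PEAK_LR = 1e-5        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 306     # hypothetical: the card does not report the exact number of optimizer steps


def lr_at_step(step: int, total_steps: int = TOTAL_STEPS,
               peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 8, 16, 100, 200, 300, 306):
    print(f"step {step:>3}: lr = {lr_at_step(step):.3e}")
```

Under these assumptions the peak is reached at step 16, and by step 300 the learning rate has decayed to about 1e-08.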