---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
- name: mistral-dpo
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# mistral-dpo
This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on an unspecified preference dataset.
It achieves the following results on the evaluation set (a sketch of how these DPO metrics are derived follows the list):
- Loss: 0.6944
- Rewards/chosen: 0.2782
- Rewards/rejected: 0.0543
- Rewards/accuracies: 0.5385
- Rewards/margins: 0.2239
- Logps/rejected: -187.8588
- Logps/chosen: -166.3796
- Logits/rejected: -2.4215
- Logits/chosen: -2.4790
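
For readers unfamiliar with these metrics, the minimal sketch below shows how TRL's `DPOTrainer` derives the reward statistics from policy and reference log-probabilities summed over response tokens. The `beta` value is an assumption; it is not recorded in this card.

```python
import torch
import torch.nn.functional as F

# Assumption: beta = 0.1 (the card does not record the value used).
beta = 0.1

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards are beta-scaled log-ratios of policy vs. reference.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracies = (rewards_chosen > rewards_rejected).float()
    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins)
    return {
        "loss": loss.mean(),
        "rewards/chosen": rewards_chosen.mean(),
        "rewards/rejected": rewards_rejected.mean(),
        "rewards/accuracies": accuracies.mean(),
        "rewards/margins": margins.mean(),
    }
```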
## Model description
More information needed
## Intended uses & limitations
More information needed
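
Pending a fuller description, below is a minimal inference sketch, assuming the LoRA adapter in this repository is applied on top of the GPTQ base model. The adapter identifier is a placeholder, and loading the GPTQ base requires `auto-gptq`/`optimum` to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "path/to/this-adapter"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```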
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
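
A minimal configuration sketch mirroring these hyperparameters is shown below, assuming the TRL `DPOTrainer` API from this release era. The preference dataset, LoRA config, and `beta` are placeholders that the card does not record.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

training_args = TrainingArguments(
    output_dir="mistral-dpo",
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    learning_rate=2e-4,              # learning_rate: 0.0002
    lr_scheduler_type="linear",
    warmup_steps=2,                  # lr_scheduler_warmup_steps: 2
    max_steps=250,                   # training_steps: 250
    seed=42,
    fp16=True,                       # mixed_precision_training: Native AMP
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # with PEFT, the frozen base serves as the implicit reference
    args=training_args,
    beta=0.1,                        # assumption: beta is not recorded in this card
    train_dataset=train_dataset,     # placeholder: preference dataset not specified
    eval_dataset=eval_dataset,       # placeholder
    tokenizer=tokenizer,
    peft_config=peft_config,         # placeholder: LoRA config not recorded in this card
)
trainer.train()
```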
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7027 | 0.0 | 10 | 0.6989 | 0.0816 | 0.0881 | 0.5577 | -0.0065 | -187.5204 | -168.3459 | -2.4271 | -2.4774 |
| 0.6833 | 0.0 | 20 | 0.7017 | -0.0375 | -0.0327 | 0.5288 | -0.0048 | -188.7280 | -169.5362 | -2.4376 | -2.4828 |
| 0.867 | 0.0 | 30 | 0.7193 | -0.3147 | -0.3086 | 0.5385 | -0.0061 | -191.4871 | -172.3083 | -2.4532 | -2.4942 |
| 0.8962 | 0.0 | 40 | 0.7068 | -0.2076 | -0.2208 | 0.5577 | 0.0132 | -190.6093 | -171.2371 | -2.4597 | -2.5054 |
| 0.7467 | 0.0 | 50 | 0.7008 | 0.1918 | 0.1648 | 0.5577 | 0.0270 | -186.7531 | -167.2434 | -2.4630 | -2.5116 |
| 0.7335 | 0.0 | 60 | 0.6972 | 0.3949 | 0.3373 | 0.5385 | 0.0576 | -185.0280 | -165.2124 | -2.4666 | -2.5130 |
| 0.587 | 0.01 | 70 | 0.7116 | 0.6763 | 0.6193 | 0.4904 | 0.0570 | -182.2083 | -162.3980 | -2.4675 | -2.5126 |
| 0.675 | 0.01 | 80 | 0.7330 | 0.8676 | 0.8385 | 0.5096 | 0.0291 | -180.0161 | -160.4852 | -2.4726 | -2.5171 |
| 0.6117 | 0.01 | 90 | 0.7454 | 0.9576 | 0.9300 | 0.5192 | 0.0276 | -179.1016 | -159.5854 | -2.4757 | -2.5229 |
| 0.5697 | 0.01 | 100 | 0.7715 | 0.9933 | 0.9991 | 0.5 | -0.0059 | -178.4101 | -159.2286 | -2.4736 | -2.5233 |
| 1.1319 | 0.01 | 110 | 0.7652 | 0.9034 | 0.8862 | 0.4904 | 0.0172 | -179.5398 | -160.1275 | -2.4696 | -2.5215 |
| 0.5912 | 0.01 | 120 | 0.7476 | 0.7562 | 0.7007 | 0.5096 | 0.0555 | -181.3943 | -161.5994 | -2.4661 | -2.5186 |
| 0.702 | 0.01 | 130 | 0.7400 | 0.7400 | 0.6590 | 0.5192 | 0.0810 | -181.8113 | -161.7616 | -2.4642 | -2.5211 |
| 0.5566 | 0.01 | 140 | 0.7332 | 0.6338 | 0.5293 | 0.5288 | 0.1044 | -183.1082 | -162.8238 | -2.4650 | -2.5222 |
| 0.7823 | 0.01 | 150 | 0.7327 | 0.5429 | 0.4408 | 0.5385 | 0.1022 | -183.9939 | -163.7323 | -2.4645 | -2.5191 |
| 0.7549 | 0.01 | 160 | 0.7282 | 0.3954 | 0.2907 | 0.5481 | 0.1047 | -185.4949 | -165.2079 | -2.4612 | -2.5138 |
| 0.6506 | 0.01 | 170 | 0.7262 | 0.3748 | 0.2716 | 0.5192 | 0.1031 | -185.6850 | -165.4137 | -2.4579 | -2.5102 |
| 0.559 | 0.01 | 180 | 0.7320 | 0.4578 | 0.3604 | 0.5096 | 0.0974 | -184.7973 | -164.5831 | -2.4589 | -2.5109 |
| 0.9496 | 0.02 | 190 | 0.7150 | 0.4227 | 0.2889 | 0.5192 | 0.1339 | -185.5128 | -164.9340 | -2.4480 | -2.5007 |
| 0.7996 | 0.02 | 200 | 0.7034 | 0.4051 | 0.2378 | 0.5288 | 0.1673 | -186.0234 | -165.1101 | -2.4391 | -2.4926 |
| 0.5733 | 0.02 | 210 | 0.6977 | 0.3946 | 0.2110 | 0.5288 | 0.1836 | -186.2916 | -165.2155 | -2.4327 | -2.4875 |
| 0.5796 | 0.02 | 220 | 0.6981 | 0.3933 | 0.1983 | 0.5288 | 0.1949 | -186.4181 | -165.2286 | -2.4260 | -2.4824 |
| 0.6435 | 0.02 | 230 | 0.6976 | 0.3726 | 0.1714 | 0.5288 | 0.2012 | -186.6871 | -165.4354 | -2.4237 | -2.4807 |
| 0.5993 | 0.02 | 240 | 0.6958 | 0.3088 | 0.0929 | 0.5385 | 0.2159 | -187.4724 | -166.0730 | -2.4222 | -2.4799 |
| 0.9077 | 0.02 | 250 | 0.6944 | 0.2782 | 0.0543 | 0.5385 | 0.2239 | -187.8588 | -166.3796 | -2.4215 | -2.4790 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.37.0
- PyTorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.1