---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
- name: mistral-dpo
  results: []
---

# mistral-dpo

This model is a DPO fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.6944
- Rewards/chosen: 0.2782
- Rewards/rejected: 0.0543
- Rewards/accuracies: 0.5385
- Rewards/margins: 0.2239
- Logps/rejected: -187.8588
- Logps/chosen: -166.3796
- Logits/rejected: -2.4215
- Logits/chosen: -2.4790
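As a quick sanity check on the numbers above (not part of the original card): in TRL's DPO convention, `Rewards/chosen` and `Rewards/rejected` are already scaled by the DPO `beta`, so the per-example loss is simply `-log(sigmoid(margin))` where `margin = rewards_chosen - rewards_rejected`. A minimal sketch:

```python
import math

def dpo_loss(margin: float) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)),
    where margin = reward_chosen - reward_rejected (beta already folded in)."""
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Final evaluation numbers reported above.
reward_chosen, reward_rejected = 0.2782, 0.0543
margin = reward_chosen - reward_rejected
print(round(margin, 4))   # 0.2239, matching Rewards/margins above
print(dpo_loss(margin))   # about 0.587
```

Note the loss of the mean margin (~0.587) is below the reported mean eval loss (0.6944): `-log(sigmoid)` is convex, so averaging losses over examples gives a larger value than evaluating at the average margin.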

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
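For readers reconstructing the schedule (a sketch, not from the original card): with a linear scheduler, 2 warmup steps, and 250 total steps, the learning rate ramps to 2e-4 almost immediately and then decays linearly to zero. The helper below mirrors the formula used by `transformers.get_linear_schedule_with_warmup`:

```python
def linear_lr(step: int, base_lr: float = 2e-4, warmup: int = 2, total: int = 250) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup:
        return base_lr * step / warmup  # ramp 0 -> base_lr over `warmup` steps
    # decay linearly to 0 at `total` steps
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

print(linear_lr(0))    # 0.0
print(linear_lr(2))    # 0.0002 (peak, right after warmup)
print(linear_lr(250))  # 0.0 at the final step
```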

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7027        | 0.0   | 10   | 0.6989          | 0.0816         | 0.0881           | 0.5577             | -0.0065         | -187.5204      | -168.3459    | -2.4271         | -2.4774       |
| 0.6833        | 0.0   | 20   | 0.7017          | -0.0375        | -0.0327          | 0.5288             | -0.0048         | -188.7280      | -169.5362    | -2.4376         | -2.4828       |
| 0.867         | 0.0   | 30   | 0.7193          | -0.3147        | -0.3086          | 0.5385             | -0.0061         | -191.4871      | -172.3083    | -2.4532         | -2.4942       |
| 0.8962        | 0.0   | 40   | 0.7068          | -0.2076        | -0.2208          | 0.5577             | 0.0132          | -190.6093      | -171.2371    | -2.4597         | -2.5054       |
| 0.7467        | 0.0   | 50   | 0.7008          | 0.1918         | 0.1648           | 0.5577             | 0.0270          | -186.7531      | -167.2434    | -2.4630         | -2.5116       |
| 0.7335        | 0.0   | 60   | 0.6972          | 0.3949         | 0.3373           | 0.5385             | 0.0576          | -185.0280      | -165.2124    | -2.4666         | -2.5130       |
| 0.587         | 0.01  | 70   | 0.7116          | 0.6763         | 0.6193           | 0.4904             | 0.0570          | -182.2083      | -162.3980    | -2.4675         | -2.5126       |
| 0.675         | 0.01  | 80   | 0.7330          | 0.8676         | 0.8385           | 0.5096             | 0.0291          | -180.0161      | -160.4852    | -2.4726         | -2.5171       |
| 0.6117        | 0.01  | 90   | 0.7454          | 0.9576         | 0.9300           | 0.5192             | 0.0276          | -179.1016      | -159.5854    | -2.4757         | -2.5229       |
| 0.5697        | 0.01  | 100  | 0.7715          | 0.9933         | 0.9991           | 0.5                | -0.0059         | -178.4101      | -159.2286    | -2.4736         | -2.5233       |
| 1.1319        | 0.01  | 110  | 0.7652          | 0.9034         | 0.8862           | 0.4904             | 0.0172          | -179.5398      | -160.1275    | -2.4696         | -2.5215       |
| 0.5912        | 0.01  | 120  | 0.7476          | 0.7562         | 0.7007           | 0.5096             | 0.0555          | -181.3943      | -161.5994    | -2.4661         | -2.5186       |
| 0.702         | 0.01  | 130  | 0.7400          | 0.7400         | 0.6590           | 0.5192             | 0.0810          | -181.8113      | -161.7616    | -2.4642         | -2.5211       |
| 0.5566        | 0.01  | 140  | 0.7332          | 0.6338         | 0.5293           | 0.5288             | 0.1044          | -183.1082      | -162.8238    | -2.4650         | -2.5222       |
| 0.7823        | 0.01  | 150  | 0.7327          | 0.5429         | 0.4408           | 0.5385             | 0.1022          | -183.9939      | -163.7323    | -2.4645         | -2.5191       |
| 0.7549        | 0.01  | 160  | 0.7282          | 0.3954         | 0.2907           | 0.5481             | 0.1047          | -185.4949      | -165.2079    | -2.4612         | -2.5138       |
| 0.6506        | 0.01  | 170  | 0.7262          | 0.3748         | 0.2716           | 0.5192             | 0.1031          | -185.6850      | -165.4137    | -2.4579         | -2.5102       |
| 0.559         | 0.01  | 180  | 0.7320          | 0.4578         | 0.3604           | 0.5096             | 0.0974          | -184.7973      | -164.5831    | -2.4589         | -2.5109       |
| 0.9496        | 0.02  | 190  | 0.7150          | 0.4227         | 0.2889           | 0.5192             | 0.1339          | -185.5128      | -164.9340    | -2.4480         | -2.5007       |
| 0.7996        | 0.02  | 200  | 0.7034          | 0.4051         | 0.2378           | 0.5288             | 0.1673          | -186.0234      | -165.1101    | -2.4391         | -2.4926       |
| 0.5733        | 0.02  | 210  | 0.6977          | 0.3946         | 0.2110           | 0.5288             | 0.1836          | -186.2916      | -165.2155    | -2.4327         | -2.4875       |
| 0.5796        | 0.02  | 220  | 0.6981          | 0.3933         | 0.1983           | 0.5288             | 0.1949          | -186.4181      | -165.2286    | -2.4260         | -2.4824       |
| 0.6435        | 0.02  | 230  | 0.6976          | 0.3726         | 0.1714           | 0.5288             | 0.2012          | -186.6871      | -165.4354    | -2.4237         | -2.4807       |
| 0.5993        | 0.02  | 240  | 0.6958          | 0.3088         | 0.0929           | 0.5385             | 0.2159          | -187.4724      | -166.0730    | -2.4222         | -2.4799       |
| 0.9077        | 0.02  | 250  | 0.6944          | 0.2782         | 0.0543           | 0.5385             | 0.2239          | -187.8588      | -166.3796    | -2.4215         | -2.4790       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.37.0
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.1