---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - alignment-handbook
  - trl
  - orpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-sft-full-orpo
    results: []
---


# zephyr-7b-sft-full-orpo

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3771
  • Rewards/chosen: -0.1391
  • Rewards/rejected: -0.1930
  • Rewards/accuracies: 0.6528
  • Rewards/margins: 0.0539
  • Logps/rejected: -3.8602
  • Logps/chosen: -2.7813
  • Logits/rejected: -2.8670
  • Logits/chosen: -2.8498
  • Nll Loss: 1.3532
  • Log Odds Ratio: -1.0480
  • Log Odds Chosen: 1.2201
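
For reference, below is a minimal loading and generation sketch using the `transformers` library. The repository id `statking/zephyr-7b-sft-full-orpo` is assumed from the model name above and may differ, and whether the tokenizer ships a chat template is not stated in this card, so the sketch falls back to a plain prompt if none is present.

```python
# Minimal inference sketch. Assumption: the checkpoint is published as
# "statking/zephyr-7b-sft-full-orpo"; adjust the repo id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "statking/zephyr-7b-sft-full-orpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 7B model in bf16 fits on a single 24-40 GB GPU
    device_map="auto",
)

prompt = "Explain odds ratio preference optimization (ORPO) in one paragraph."
if tokenizer.chat_template is not None:
    # If the tokenizer bundles a chat template (typical for zephyr-style recipes),
    # format the prompt as a single user turn.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
else:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```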

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch using these values follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
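
As a rough illustration of how the values above map onto a training setup, the sketch below uses `trl`'s `ORPOConfig`/`ORPOTrainer`, consistent with the `trl`/`orpo` tags on this card. It is an assumption-laden reconstruction, not the exact launch script: details such as the ORPO `beta`, sequence lengths, mixed-precision settings, and the chat-template preprocessing of the prompt/chosen/rejected columns are not recorded here.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
# Assumptions: trl's ORPOTrainer was used, and a per-device batch size of 8
# across 4 GPUs with gradient accumulation 2 gives the reported total train
# batch size of 64 (and total eval batch size of 32).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = ORPOConfig(
    output_dir="zephyr-7b-sft-full-orpo",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=3,
    seed=42,
    bf16=True,  # assumption: bf16 mixed precision, not stated in the card
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train_prefs"],  # assumption: the standard prefs splits
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```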

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.5668 | 0.1049 | 100 | 0.5843 | -0.0456 | -0.0529 | 0.6151 | 0.0073 | -1.0580 | -0.9113 | -3.3148 | -3.3082 | 0.5516 | -0.6530 | 0.2184 |
| 0.5676 | 0.2098 | 200 | 0.5726 | -0.0441 | -0.0532 | 0.625 | 0.0092 | -1.0644 | -0.8811 | -3.0026 | -2.9992 | 0.5359 | -0.6474 | 0.2850 |
| 0.5819 | 0.3146 | 300 | 0.5552 | -0.0439 | -0.0531 | 0.6290 | 0.0092 | -1.0620 | -0.8770 | -3.1424 | -3.1391 | 0.5202 | -0.6464 | 0.2830 |
| 0.5738 | 0.4195 | 400 | 0.5411 | -0.0422 | -0.0517 | 0.6290 | 0.0096 | -1.0346 | -0.8434 | -3.1026 | -3.1020 | 0.5047 | -0.6522 | 0.2961 |
| 0.5478 | 0.5244 | 500 | 0.5319 | -0.0421 | -0.0525 | 0.6290 | 0.0105 | -1.0509 | -0.8415 | -3.0260 | -3.0286 | 0.4970 | -0.6382 | 0.3327 |
| 0.5146 | 0.6293 | 600 | 0.5240 | -0.0408 | -0.0508 | 0.6230 | 0.0100 | -1.0165 | -0.8165 | -3.1325 | -3.1275 | 0.4883 | -0.6418 | 0.3121 |
| 0.5298 | 0.7341 | 700 | 0.5188 | -0.0413 | -0.0541 | 0.6429 | 0.0128 | -1.0827 | -0.8267 | -3.0761 | -3.0755 | 0.4842 | -0.6219 | 0.3869 |
| 0.5181 | 0.8390 | 800 | 0.5141 | -0.0410 | -0.0524 | 0.6329 | 0.0114 | -1.0475 | -0.8198 | -3.1382 | -3.1394 | 0.4803 | -0.6322 | 0.3506 |
| 0.5239 | 0.9439 | 900 | 0.5086 | -0.0402 | -0.0506 | 0.6310 | 0.0104 | -1.0129 | -0.8045 | -3.1191 | -3.1171 | 0.4748 | -0.6328 | 0.3268 |
| 0.2888 | 1.0488 | 1000 | 0.5400 | -0.0436 | -0.0556 | 0.6429 | 0.0120 | -1.1128 | -0.8724 | -3.0171 | -3.0190 | 0.5058 | -0.6318 | 0.3794 |
| 0.29 | 1.1536 | 1100 | 0.5385 | -0.0437 | -0.0574 | 0.6468 | 0.0138 | -1.1487 | -0.8736 | -3.0027 | -3.0029 | 0.5042 | -0.6256 | 0.4247 |
| 0.2826 | 1.2585 | 1200 | 0.5428 | -0.0443 | -0.0581 | 0.6429 | 0.0139 | -1.1626 | -0.8854 | -2.9620 | -2.9583 | 0.5084 | -0.6254 | 0.4215 |
| 0.2796 | 1.3634 | 1300 | 0.5393 | -0.0441 | -0.0589 | 0.6468 | 0.0147 | -1.1771 | -0.8825 | -2.9256 | -2.9285 | 0.5060 | -0.6208 | 0.4508 |
| 0.2784 | 1.4683 | 1400 | 0.5365 | -0.0444 | -0.0589 | 0.6528 | 0.0145 | -1.1784 | -0.8885 | -2.9583 | -2.9594 | 0.5037 | -0.6236 | 0.4410 |
| 0.2873 | 1.5732 | 1500 | 0.5330 | -0.0436 | -0.0579 | 0.6448 | 0.0143 | -1.1584 | -0.8718 | -2.9664 | -2.9657 | 0.5004 | -0.6226 | 0.4364 |
| 0.276 | 1.6780 | 1600 | 0.5367 | -0.0442 | -0.0594 | 0.6409 | 0.0152 | -1.1879 | -0.8833 | -2.9358 | -2.9324 | 0.5041 | -0.6160 | 0.4570 |
| 0.2715 | 1.7829 | 1700 | 0.5349 | -0.0436 | -0.0580 | 0.6448 | 0.0145 | -1.1603 | -0.8710 | -3.0209 | -3.0194 | 0.5024 | -0.6272 | 0.4425 |
| 0.2717 | 1.8878 | 1800 | 0.5341 | -0.0450 | -0.0616 | 0.6548 | 0.0166 | -1.2325 | -0.8997 | -2.9579 | -2.9563 | 0.5023 | -0.6184 | 0.4824 |
| 0.2857 | 1.9927 | 1900 | 0.5408 | -0.0454 | -0.0620 | 0.6548 | 0.0166 | -1.2409 | -0.9088 | -3.0279 | -3.0350 | 0.5091 | -0.6193 | 0.4892 |
| 0.1137 | 2.0975 | 2000 | 0.6877 | -0.0620 | -0.0838 | 0.6706 | 0.0218 | -1.6761 | -1.2408 | -2.8815 | -2.8704 | 0.6539 | -0.6273 | 0.5767 |
| 0.1192 | 2.2024 | 2100 | 0.7577 | -0.0706 | -0.0981 | 0.6726 | 0.0275 | -1.9620 | -1.4122 | -2.8433 | -2.8372 | 0.7199 | -0.6210 | 0.6958 |
| 0.1178 | 2.3073 | 2200 | 1.1762 | -0.1205 | -0.1717 | 0.6528 | 0.0512 | -3.4342 | -2.4108 | -2.9107 | -2.8878 | 1.1197 | -0.7778 | 1.1628 |
| 0.1184 | 2.4122 | 2300 | 1.8520 | -0.1935 | -0.2541 | 0.6369 | 0.0606 | -5.0812 | -3.8696 | -2.9226 | -2.9102 | 1.7542 | -1.0562 | 1.3233 |
| 0.1172 | 2.5170 | 2400 | 1.0193 | -0.1001 | -0.1434 | 0.6409 | 0.0432 | -2.8671 | -2.0024 | -2.8710 | -2.8561 | 0.9736 | -0.8145 | 1.0075 |
| 0.1109 | 2.6219 | 2500 | 1.2050 | -0.1209 | -0.1677 | 0.6329 | 0.0468 | -3.3547 | -2.4183 | -2.8571 | -2.8457 | 1.1724 | -0.9768 | 1.0766 |
| 0.1238 | 2.7268 | 2600 | 2.6922 | -0.3036 | -0.3822 | 0.5873 | 0.0786 | -7.6444 | -6.0725 | -2.9967 | -2.9805 | 2.6498 | -1.6934 | 1.6674 |
| 0.1192 | 2.8317 | 2700 | 1.2391 | -0.1189 | -0.1634 | 0.625 | 0.0445 | -3.2671 | -2.3779 | -2.8836 | -2.8662 | 1.1910 | -0.9507 | 1.0201 |
| 0.1191 | 2.9365 | 2800 | 1.0214 | -0.0976 | -0.1394 | 0.6270 | 0.0418 | -2.7882 | -1.9523 | -2.8221 | -2.8059 | 0.9673 | -0.8558 | 0.9869 |
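
For readers interpreting the `Nll Loss`, `Log Odds Ratio`, and `Log Odds Chosen` columns, the generic ORPO objective of Hong et al. (2024) is reproduced below as a reading aid. The weighting λ used for this run is not recorded in this card, so this is the textbook form rather than the exact configured loss; the signs in the table are consistent with `Log Odds Chosen` tracking the inner log odds ratio and `Log Odds Ratio` its log-sigmoid.

```latex
% Generic ORPO objective (Hong et al., 2024); the weight \lambda for this run
% is not stated in the card. "Nll Loss" corresponds to the supervised NLL term.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \mathcal{L}_{\mathrm{NLL}}(y_w \mid x)
      \;+\; \lambda \left(-\log \sigma\!\left(
          \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}
        \right)\right)
    \right],
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}.
```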

### Framework versions

  • Transformers 4.41.0.dev0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1