phi-2-apo

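This model is a fine-tuned version of rasyosef/phi-2-sft-openhermes-128k-v2-merged on an unknown dataset. At the last logged evaluation (step 2250, epoch 1.8367), it achieves the following results on the evaluation set: a validation loss of 0.3695, a reward accuracy of 0.9350, and a reward margin of 4.6772 (rewards of 1.5931 for chosen and -3.0842 for rejected responses).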

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.3669	0.2041	250	0.3828	1.5010	-2.9712	0.9450	4.4722	-172.0644	-254.5310	-0.4930	0.0860
0.3514	0.4082	500	0.3786	1.5375	-2.9788	0.9400	4.5163	-172.1404	-254.1665	-0.4834	0.0968
0.3539	0.6122	750	0.3756	1.5549	-3.0097	0.9400	4.5647	-172.4500	-253.9920	-0.4690	0.1096
0.3562	0.8163	1000	0.3736	1.5759	-3.0081	0.9450	4.5840	-172.4332	-253.7824	-0.4558	0.1220
0.3437	1.0204	1250	0.3720	1.5665	-3.0805	0.9350	4.6470	-173.1577	-253.8766	-0.4445	0.1325
0.3503	1.2245	1500	0.3710	1.5889	-3.0515	0.9400	4.6404	-172.8680	-253.6525	-0.4406	0.1347
0.3427	1.4286	1750	0.3697	1.5903	-3.0719	0.9450	4.6622	-173.0719	-253.6384	-0.4355	0.1387
0.3353	1.6327	2000	0.3699	1.5881	-3.0875	0.9400	4.6756	-173.2272	-253.6602	-0.4333	0.1412
0.3441	1.8367	2250	0.3695	1.5931	-3.0842	0.9350	4.6772	-173.1941	-253.6105	-0.4322	0.1424
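
The reward columns in the table can be cross-checked against each other. The sketch below assumes the usual convention in preference-optimization logging, where Rewards/margins is Rewards/chosen minus Rewards/rejected; the card itself does not state this. The helper names `margin_gaps` and `best_step` are hypothetical and exist only for this illustration. The rows are copied from the table above.

```python
# Rows copied from the results table:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (250, 0.3828, 1.5010, -2.9712, 4.4722),
    (500, 0.3786, 1.5375, -2.9788, 4.5163),
    (750, 0.3756, 1.5549, -3.0097, 4.5647),
    (1000, 0.3736, 1.5759, -3.0081, 4.5840),
    (1250, 0.3720, 1.5665, -3.0805, 4.6470),
    (1500, 0.3710, 1.5889, -3.0515, 4.6404),
    (1750, 0.3697, 1.5903, -3.0719, 4.6622),
    (2000, 0.3699, 1.5881, -3.0875, 4.6756),
    (2250, 0.3695, 1.5931, -3.0842, 4.6772),
]


def margin_gaps(rows):
    """Return |margin - (chosen - rejected)| for each row.

    Assumption: margin is defined as chosen minus rejected.
    """
    return [
        abs(margin - (chosen - rejected))
        for _step, _loss, chosen, rejected, margin in rows
    ]


def best_step(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    # Gaps should stay within the rounding error of four-decimal logging.
    print("largest margin gap:", max(margin_gaps(ROWS)))
    print("step with lowest validation loss:", best_step(ROWS))
```

Under that assumption, every row agrees to within the rounding of the four-decimal values, and the lowest validation loss in the table occurs at the last logged step.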