zephyr-7b-sft-lora-accum4-lr5e_5-dpo

This model is a fine-tuned version of HuggingFaceH4/zephyr-7b-beta on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss
1.5276	0.55	13	1.4329
1.352	1.57	27	1.2406
1.1329	2.55	40	1.0909
1.0628	3.57	54	1.0299
1.0022	4.55	67	0.9812
0.957	5.57	81	0.9445
0.9148	6.55	94	0.8948
0.8443	7.57	108	0.8432
0.7645	8.55	121	0.7847
0.6952	9.57	135	0.7192
0.639	10.55	148	0.6671
0.5683	11.57	162	0.6112
0.5223	12.55	175	0.5777
0.4958	13.57	189	0.5592
0.4592	14.55	202	0.5381
0.4602	15.57	216	0.5100
0.4486	16.55	229	0.5117
0.4274	17.57	243	0.5084
0.4239	18.55	256	0.4909
0.4055	19.57	270	0.5006
0.3931	20.55	283	0.4959
0.3986	21.57	297	0.4853
0.3977	22.55	310	0.4859
0.3936	23.57	324	0.4974
0.3821	24.55	337	0.4952
0.3877	25.57	351	0.4949
0.3681	26.55	364	0.4866
0.3681	27.57	378	0.4926
0.371	28.55	391	0.4817
0.3604	29.57	405	0.4923
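A minimal sketch, using only the Python standard library, for reading the training log above and finding the evaluation step with the lowest validation loss. The helper names (`parse_log`, `best_checkpoint`, `final_gap`) are illustrative assumptions and not part of any training tool. The card does not say whether the published weights come from the best step or the last one.

```python
import csv
import io

# Training log copied verbatim from the results table (tab-separated).
LOG_TSV = """Training Loss\tEpoch\tStep\tValidation Loss
1.5276\t0.55\t13\t1.4329
1.352\t1.57\t27\t1.2406
1.1329\t2.55\t40\t1.0909
1.0628\t3.57\t54\t1.0299
1.0022\t4.55\t67\t0.9812
0.957\t5.57\t81\t0.9445
0.9148\t6.55\t94\t0.8948
0.8443\t7.57\t108\t0.8432
0.7645\t8.55\t121\t0.7847
0.6952\t9.57\t135\t0.7192
0.639\t10.55\t148\t0.6671
0.5683\t11.57\t162\t0.6112
0.5223\t12.55\t175\t0.5777
0.4958\t13.57\t189\t0.5592
0.4592\t14.55\t202\t0.5381
0.4602\t15.57\t216\t0.5100
0.4486\t16.55\t229\t0.5117
0.4274\t17.57\t243\t0.5084
0.4239\t18.55\t256\t0.4909
0.4055\t19.57\t270\t0.5006
0.3931\t20.55\t283\t0.4959
0.3986\t21.57\t297\t0.4853
0.3977\t22.55\t310\t0.4859
0.3936\t23.57\t324\t0.4974
0.3821\t24.55\t337\t0.4952
0.3877\t25.57\t351\t0.4949
0.3681\t26.55\t364\t0.4866
0.3681\t27.57\t378\t0.4926
0.371\t28.55\t391\t0.4817
0.3604\t29.57\t405\t0.4923
"""


def parse_log(text):
    """Hypothetical helper: turn the TSV log into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "train_loss": float(row["Training Loss"]),
            "epoch": float(row["Epoch"]),
            "step": int(row["Step"]),
            "val_loss": float(row["Validation Loss"]),
        }
        for row in reader
    ]


def best_checkpoint(rows):
    """Hypothetical helper: the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


def final_gap(rows):
    """Hypothetical helper: validation minus training loss at the last logged step."""
    last = rows[-1]
    return last["val_loss"] - last["train_loss"]


rows = parse_log(LOG_TSV)
best = best_checkpoint(rows)
print(f"lowest validation loss {best['val_loss']:.4f} at step {best['step']} (epoch {best['epoch']})")
print(f"final validation - training gap: {final_gap(rows):.4f}")
```

Run as a script, it reports the lowest validation loss of 0.4817 at step 391 (epoch 28.55) and a final validation-minus-training gap of about 0.13 at step 405. From step 216 onward the validation loss stays between roughly 0.48 and 0.51 while the training loss continues to trend downward, which is the kind of pattern this check helps surface.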