pairwise-reward-sft-zephyr-7b-sft-qlora-ultrafeedback-ultrafeedback-binarized-20241013-124646

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. At the last logged evaluation (step 1900, epoch 0.9992), it achieves a validation loss of 0.4739 and an accuracy of 0.7592 on the evaluation set.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.6209	0.0526	100	0.6427	0.6784
0.6346	0.1052	200	0.5829	0.7165
0.5945	0.1578	300	0.5333	0.7351
0.5258	0.2104	400	0.5169	0.7461
0.4914	0.2630	500	0.5209	0.7346
0.4995	0.3155	600	0.5056	0.7536
0.5272	0.3681	700	0.5041	0.7541
0.4993	0.4207	800	0.4943	0.7471
0.5317	0.4733	900	0.4970	0.7602
0.5193	0.5259	1000	0.4850	0.7597
0.4534	0.5785	1100	0.4931	0.7582
0.4828	0.6311	1200	0.4808	0.7582
0.5432	0.6837	1300	0.4836	0.7491
0.4343	0.7363	1400	0.4797	0.7582
0.4287	0.7889	1500	0.4794	0.7612
0.5117	0.8414	1600	0.4799	0.7587
0.4369	0.8940	1700	0.4770	0.7582
0.4537	0.9466	1800	0.4750	0.7566
0.4510	0.9992	1900	0.4739	0.7592
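The card does not say how the loss and accuracy columns are computed. The sketch below shows one common convention for pairwise reward models, assuming a Bradley-Terry style loss, -log(sigmoid(chosen - rejected)), and accuracy defined as the fraction of pairs where the chosen response scores higher. The function name `pairwise_metrics` and its inputs are hypothetical and are not taken from this model's training code.

```python
import math


def pairwise_metrics(chosen_scores, rejected_scores):
    """Return (mean loss, accuracy) for paired reward scores.

    Assumption: loss is -log(sigmoid(chosen - rejected)) per pair, and a
    pair counts as correct when the chosen score is strictly higher.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("expected two non-empty lists of equal length")

    total_loss = 0.0
    correct = 0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
        # for large negative margins.
        total_loss += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        correct += margin > 0

    n = len(chosen_scores)
    return total_loss / n, correct / n


if __name__ == "__main__":
    # Illustrative scores only, not outputs of this model.
    loss, accuracy = pairwise_metrics([1.2, 0.3, -0.5], [0.4, 0.9, -1.0])
    print(f"loss={loss:.4f} accuracy={accuracy:.4f}")
```

Under this assumption, a model that assigns equal scores to both responses has a loss of ln 2 (about 0.693), which gives a reference point for reading the validation loss column.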