reward_modeling

This model is a fine-tuned version of google/gemma-2b on an unknown dataset. At the last logged evaluation (step 185, epoch 2.9134), it reaches a validation loss of 0.4036 and an accuracy of 0.8058 on the evaluation set.
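
The card does not say how the loss and accuracy are computed. A common setup for reward modeling, assumed here rather than confirmed by the source, is a pairwise preference loss: the model scores a chosen and a rejected response, the loss is the mean of -log(sigmoid(r_chosen - r_rejected)), and accuracy is the fraction of pairs where the chosen response scores higher. The function name and the example scores below are hypothetical.

```python
import math


def pairwise_loss_and_accuracy(chosen, rejected):
    """Return (mean pairwise loss, accuracy) for paired reward scores.

    Assumption: loss = -log(sigmoid(r_chosen - r_rejected)) per pair,
    and a pair counts as correct when r_chosen > r_rejected.
    """
    if not chosen or len(chosen) != len(rejected):
        raise ValueError("need two non-empty score lists of equal length")

    losses = []
    correct = 0
    for r_chosen, r_rejected in zip(chosen, rejected):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)), written in a stable form
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
        correct += margin > 0

    n = len(losses)
    return sum(losses) / n, correct / n


# Hypothetical reward scores for three preference pairs.
loss, acc = pairwise_loss_and_accuracy([2.0, 0.5, -1.0], [0.0, 1.0, -3.0])

# A model that cannot separate the pairs (all margins zero) has loss ln 2.
chance_loss, chance_acc = pairwise_loss_and_accuracy([0.0, 0.0], [0.0, 0.0])
```

Under this assumption, a model with no preference signal sits at a loss of ln 2 (about 0.693). The first logged validation loss in the training log, 0.6996, is close to that value, which is consistent with a pairwise loss of this kind but does not prove it.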

Model description

More information needed

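The training hyperparameters are not recorded in this card. Training loss, validation loss, and accuracy were logged every 5 steps over roughly three epochs: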

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.9241	0.0787	5	0.6996	0.5678
0.7708	0.1575	10	0.6284	0.6660
0.7875	0.2362	15	0.5749	0.7244
0.6575	0.3150	20	0.5360	0.7390
0.6802	0.3937	25	0.5087	0.7432
0.3982	0.4724	30	0.4890	0.7578
0.4555	0.5512	35	0.4775	0.7599
0.8838	0.6299	40	0.4683	0.7662
0.4692	0.7087	45	0.4611	0.7662
0.5455	0.7874	50	0.4531	0.7620
0.5696	0.8661	55	0.4459	0.7662
0.7453	0.9449	60	0.4414	0.7766
0.5369	1.0236	65	0.4371	0.7829
0.3994	1.1024	70	0.4334	0.7850
0.4235	1.1811	75	0.4298	0.7912
0.4811	1.2598	80	0.4266	0.7912
0.5072	1.3386	85	0.4253	0.7912
0.4405	1.4173	90	0.4228	0.7850
0.5349	1.4961	95	0.4196	0.7871
0.3342	1.5748	100	0.4170	0.7829
0.5271	1.6535	105	0.4149	0.7933
0.3463	1.7323	110	0.4136	0.7975
0.4867	1.8110	115	0.4128	0.7996
0.3221	1.8898	120	0.4125	0.7996
0.3542	1.9685	125	0.4116	0.7996
0.5465	2.0472	130	0.4107	0.7996
0.3427	2.1260	135	0.4101	0.7996
0.4787	2.2047	140	0.4087	0.8038
0.4229	2.2835	145	0.4073	0.8017
0.4514	2.3622	150	0.4063	0.8038
0.5116	2.4409	155	0.4051	0.8038
0.3234	2.5197	160	0.4045	0.8058
0.3993	2.5984	165	0.4040	0.8058
0.3264	2.6772	170	0.4037	0.8058
0.3316	2.7559	175	0.4035	0.8038
0.4855	2.8346	180	0.4035	0.8038
0.5360	2.9134	185	0.4036	0.8058
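
As a quick sanity check on the log above, the following sketch derives the approximate number of optimizer steps per epoch from the Epoch and Step columns and picks the row with the lowest validation loss. Only a handful of rows are copied in for brevity, and the variable names are illustrative rather than part of any training script.

```python
# Selected rows from the training log:
# (training_loss, epoch, step, validation_loss, accuracy)
rows = [
    (0.9241, 0.0787, 5, 0.6996, 0.5678),
    (0.5369, 1.0236, 65, 0.4371, 0.7829),
    (0.3542, 1.9685, 125, 0.4116, 0.7996),
    (0.3316, 2.7559, 175, 0.4035, 0.8038),
    (0.5360, 2.9134, 185, 0.4036, 0.8058),
]

# Step / Epoch gives the number of optimizer steps in one epoch.
first = rows[0]
steps_per_epoch = first[2] / first[1]

# Row with the lowest validation loss among the selected rows.
best = min(rows, key=lambda r: r[3])

# Overall change in validation loss and accuracy from first to last row.
last = rows[-1]
loss_drop = first[3] - last[3]
accuracy_gain = last[4] - first[4]

print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"best validation loss: {best[3]} at step {best[2]}")
print(f"loss drop: {loss_drop:.4f}, accuracy gain: {accuracy_gain:.4f}")
```

The Epoch column implies about 63.5 steps per epoch. Validation loss flattens near 0.4035 from step 175 onward while accuracy stays between 0.8038 and 0.8058, so the final third of training adds little on these metrics.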