Llama-2-7b-spin-10k

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.1495	0.1984	62	0.1401	3.7274	-5.5913	1.0	9.3186	-338.5644	-142.2943	-0.5586	-0.5198
0.1087	0.3968	124	0.1060	7.1500	-4.2208	1.0	11.3708	-324.8601	-108.0682	-0.6577	-1.0069
0.1056	0.5952	186	0.1046	7.2683	-5.6243	1.0	12.8927	-338.8952	-106.8850	-0.5520	-0.9362
0.1037	0.7936	248	0.1041	7.3329	-5.6913	1.0	13.0242	-339.5646	-106.2389	-0.5560	-0.9504
0.1041	0.992	310	0.1037	7.3755	-6.3330	1.0	13.7085	-345.9819	-105.8133	-0.5095	-0.9077
0.0976	1.1904	372	0.1035	7.4053	-7.5036	1.0	14.9089	-357.6875	-105.5148	-0.5378	-0.9621
0.1018	1.3888	434	0.1034	7.4118	-7.9940	1.0	15.4059	-362.5919	-105.4498	-0.5389	-0.9673
0.0991	1.5872	496	0.1031	7.4489	-6.9160	1.0	14.3649	-351.8115	-105.0788	-0.5154	-0.9266
0.0954	1.7856	558	0.1029	7.4703	-7.2607	1.0	14.7310	-355.2591	-104.8652	-0.5039	-0.9100
0.0995	1.984	620	0.1028	7.4973	-7.3534	1.0	14.8507	-356.1862	-104.5955	-0.5304	-0.9424
0.095	2.1824	682	0.1028	7.4894	-7.6075	1.0	15.0969	-358.7269	-104.6745	-0.5408	-0.9674
0.0964	2.3808	744	0.1027	7.4957	-7.5378	1.0	15.0335	-358.0298	-104.6114	-0.5268	-0.9397
0.1003	2.5792	806	0.1026	7.5077	-7.7372	1.0	15.2449	-360.0238	-104.4909	-0.5189	-0.9338
0.099	2.7776	868	0.1026	7.5125	-7.9795	1.0	15.4919	-362.4467	-104.4437	-0.5214	-0.9428
0.1053	2.976	930	0.1026	7.5091	-8.0379	1.0	15.5469	-363.0305	-104.4774	-0.5236	-0.9459