MPT_1000_STEPS_1e7_rate_03_beta_DPO

This model is a version of mosaicml/mpt-7b-instruct fine-tuned with Direct Preference Optimization (DPO) on an unknown preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6919
  • Rewards/chosen: -0.0230
  • Rewards/rejected: -0.0291
  • Rewards/accuracies: 0.5275
  • Rewards/margins: 0.0061
  • Logps/rejected: -21.6156
  • Logps/chosen: -20.8382
  • Logits/rejected: 14.2213
  • Logits/chosen: 14.2239
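
A minimal inference sketch with transformers is shown below. The repository id is inferred from the card title, and trust_remote_code=True is required because MPT checkpoints ship custom modeling code; loading in FP16 matches the stored weights.

```python
# Minimal inference sketch, assuming transformers, accelerate, and a GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/MPT_1000_STEPS_1e7_rate_03_beta_DPO"  # inferred from the card title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are stored in FP16
    trust_remote_code=True,     # MPT uses custom modeling code
    device_map="auto",          # requires accelerate
)

prompt = "Explain what preference tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```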

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
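
The training script itself is not included in this card. The sketch below is a hypothetical reconstruction of how these hyperparameters could map onto TRL's DPOTrainer (using the TrainingArguments-plus-beta API of TRL releases contemporary with Transformers 4.39); the preference dataset is a placeholder, and beta=0.3 is only a guess read off the "03_beta" in the model name.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# listed above; the actual script and dataset are not part of this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: the actual preference dataset is unknown. DPOTrainer expects
# "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 is the default optimizer.
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # TRL clones the model to use as the frozen reference
    args=args,
    beta=0.3,            # assumption, read off "03_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```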

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6958 | 0.05 | 50 | 0.6969 | -0.0103 | -0.0064 | 0.4791 | -0.0040 | -21.5702 | -20.8128 | 14.2683 | 14.2709 |
| 0.6948 | 0.1 | 100 | 0.6966 | -0.0023 | 0.0014 | 0.5077 | -0.0037 | -21.5546 | -20.7968 | 14.2571 | 14.2597 |
| 0.6971 | 0.15 | 150 | 0.7007 | -0.0051 | 0.0067 | 0.4681 | -0.0117 | -21.5441 | -20.8024 | 14.2475 | 14.2501 |
| 0.6891 | 0.2 | 200 | 0.6943 | 0.0187 | 0.0174 | 0.4923 | 0.0013 | -21.5227 | -20.7548 | 14.2452 | 14.2478 |
| 0.6906 | 0.24 | 250 | 0.6922 | 0.0036 | -0.0018 | 0.4747 | 0.0054 | -21.5609 | -20.7850 | 14.2395 | 14.2421 |
| 0.6865 | 0.29 | 300 | 0.6942 | 0.0038 | 0.0023 | 0.4857 | 0.0015 | -21.5528 | -20.7845 | 14.2393 | 14.2419 |
| 0.7058 | 0.34 | 350 | 0.6939 | -0.0025 | -0.0045 | 0.5055 | 0.0020 | -21.5664 | -20.7971 | 14.2533 | 14.2559 |
| 0.6817 | 0.39 | 400 | 0.6918 | -0.0255 | -0.0318 | 0.5143 | 0.0063 | -21.6210 | -20.8431 | 14.2343 | 14.2369 |
| 0.6726 | 0.44 | 450 | 0.6902 | -0.0203 | -0.0301 | 0.5582 | 0.0099 | -21.6177 | -20.8327 | 14.2287 | 14.2313 |
| 0.6927 | 0.49 | 500 | 0.6903 | -0.0159 | -0.0254 | 0.5209 | 0.0096 | -21.6083 | -20.8239 | 14.2329 | 14.2355 |
| 0.6728 | 0.54 | 550 | 0.6905 | -0.0252 | -0.0342 | 0.5297 | 0.0089 | -21.6258 | -20.8426 | 14.2305 | 14.2331 |
| 0.6733 | 0.59 | 600 | 0.6877 | -0.0158 | -0.0305 | 0.5341 | 0.0147 | -21.6184 | -20.8237 | 14.2330 | 14.2356 |
| 0.6937 | 0.64 | 650 | 0.6916 | -0.0222 | -0.0293 | 0.5341 | 0.0071 | -21.6161 | -20.8365 | 14.2242 | 14.2268 |
| 0.6771 | 0.68 | 700 | 0.6921 | -0.0234 | -0.0294 | 0.5231 | 0.0060 | -21.6163 | -20.8391 | 14.2289 | 14.2315 |
| 0.6874 | 0.73 | 750 | 0.6916 | -0.0219 | -0.0286 | 0.5121 | 0.0067 | -21.6147 | -20.8361 | 14.2292 | 14.2317 |
| 0.6772 | 0.78 | 800 | 0.6888 | -0.0187 | -0.0313 | 0.5473 | 0.0127 | -21.6201 | -20.8295 | 14.2308 | 14.2334 |
| 0.7033 | 0.83 | 850 | 0.6886 | -0.0163 | -0.0294 | 0.5297 | 0.0131 | -21.6163 | -20.8248 | 14.2220 | 14.2245 |
| 0.6772 | 0.88 | 900 | 0.6894 | -0.0217 | -0.0330 | 0.5297 | 0.0113 | -21.6235 | -20.8357 | 14.2227 | 14.2253 |
| 0.696 | 0.93 | 950 | 0.6918 | -0.0229 | -0.0293 | 0.5275 | 0.0064 | -21.6160 | -20.8380 | 14.2213 | 14.2239 |
| 0.6881 | 0.98 | 1000 | 0.6919 | -0.0230 | -0.0291 | 0.5275 | 0.0061 | -21.6156 | -20.8382 | 14.2213 | 14.2239 |
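
For interpreting the reward columns: in TRL's DPOTrainer, the implicit reward of a completion y for prompt x is the beta-scaled log-probability ratio between the policy and the frozen reference model,

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right), $$

and the per-pair loss is

$$ \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big), $$

where y_w is the chosen and y_l the rejected completion. Rewards/chosen and Rewards/rejected report this implicit reward on the two completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose margin is positive. Final accuracies near 0.53 with margins around 0.006 suggest the policy moved only slightly away from the reference at this learning rate.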

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
