Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep

This model is a fine-tuned version of NousResearch/Meta-Llama-3.1-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6790

Model description

More information needed

Intended uses & limitations

More information needed
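
Since this is a fine-tune of Llama-3.1-8B-Instruct, it should load through the standard transformers chat interface. Below is a minimal inference sketch (untested against this checkpoint; the repo id `qfq/Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep` and the bfloat16 load dtype are assumptions, and the chat template is inherited from the base model):

```python
# Minimal inference sketch (assumptions: repo id below, bfloat16 load,
# standard Llama 3.1 chat template from the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qfq/Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in F32; bf16 halves memory
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain cosine LR schedules in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```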

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 32
  • total_train_batch_size: 64 (2 per device × 32 GPUs)
  • total_eval_batch_size: 256 (8 per device × 32 GPUs)
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 20.0
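
As a sketch, these settings map onto transformers.TrainingArguments roughly as follows. The output directory, eval strategy, and eval interval are assumptions (the card logs a validation loss every 100 steps); model, dataset, and Trainer wiring are omitted, and the 32-GPU launch would go through torchrun or accelerate:

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
# With 32 GPUs and per-device batch size 2, the effective train batch is 64.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep",  # assumed
    learning_rate=6e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=20.0,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    eval_strategy="steps",  # assumed from the eval-every-100-steps log below
    eval_steps=100,
)
```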

Training results

Validation loss reaches its minimum of 0.3943 at step 1000 (epoch ≈ 2.92) and rises steadily thereafter while training loss keeps falling, so the final loss of 1.6790 reported above reflects substantial overfitting relative to the epoch-3 checkpoints.

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.2924  | 100  | 0.5265          |
| No log        | 0.5848  | 200  | 0.4605          |
| No log        | 0.8772  | 300  | 0.4265          |
| No log        | 1.1696  | 400  | 0.4117          |
| 0.4742        | 1.4620  | 500  | 0.4032          |
| 0.4742        | 1.7544  | 600  | 0.3976          |
| 0.4742        | 2.0468  | 700  | 0.4008          |
| 0.4742        | 2.3392  | 800  | 0.4005          |
| 0.4742        | 2.6316  | 900  | 0.3961          |
| 0.3557        | 2.9240  | 1000 | 0.3943          |
| 0.3557        | 3.2164  | 1100 | 0.4090          |
| 0.3557        | 3.5088  | 1200 | 0.4074          |
| 0.3557        | 3.8012  | 1300 | 0.4064          |
| 0.3557        | 4.0936  | 1400 | 0.4312          |
| 0.303         | 4.3860  | 1500 | 0.4329          |
| 0.303         | 4.6784  | 1600 | 0.4324          |
| 0.303         | 4.9708  | 1700 | 0.4301          |
| 0.303         | 5.2632  | 1800 | 0.4761          |
| 0.303         | 5.5556  | 1900 | 0.4755          |
| 0.2542        | 5.8480  | 2000 | 0.4737          |
| 0.2542        | 6.1404  | 2100 | 0.5378          |
| 0.2542        | 6.4327  | 2200 | 0.5374          |
| 0.2542        | 6.7251  | 2300 | 0.5393          |
| 0.2542        | 7.0175  | 2400 | 0.6218          |
| 0.1892        | 7.3099  | 2500 | 0.6207          |
| 0.1892        | 7.6023  | 2600 | 0.6277          |
| 0.1892        | 7.8947  | 2700 | 0.6202          |
| 0.1892        | 8.1871  | 2800 | 0.7137          |
| 0.1892        | 8.4795  | 2900 | 0.7203          |
| 0.1318        | 8.7719  | 3000 | 0.7195          |
| 0.1318        | 9.0643  | 3100 | 0.8267          |
| 0.1318        | 9.3567  | 3200 | 0.8213          |
| 0.1318        | 9.6491  | 3300 | 0.8221          |
| 0.1318        | 9.9415  | 3400 | 0.8276          |
| 0.0824        | 10.2339 | 3500 | 0.9402          |
| 0.0824        | 10.5263 | 3600 | 0.9379          |
| 0.0824        | 10.8187 | 3700 | 0.9340          |
| 0.0824        | 11.1111 | 3800 | 1.0448          |
| 0.0824        | 11.4035 | 3900 | 1.0511          |
| 0.0483        | 11.6959 | 4000 | 1.0520          |
| 0.0483        | 11.9883 | 4100 | 1.0641          |
| 0.0483        | 12.2807 | 4200 | 1.1640          |
| 0.0483        | 12.5731 | 4300 | 1.1574          |
| 0.0483        | 12.8655 | 4400 | 1.1667          |
| 0.0294        | 13.1579 | 4500 | 1.2525          |
| 0.0294        | 13.4503 | 4600 | 1.2659          |
| 0.0294        | 13.7427 | 4700 | 1.2635          |
| 0.0294        | 14.0351 | 4800 | 1.3617          |
| 0.0294        | 14.3275 | 4900 | 1.3559          |
| 0.0195        | 14.6199 | 5000 | 1.3651          |
| 0.0195        | 14.9123 | 5100 | 1.3715          |
| 0.0195        | 15.2047 | 5200 | 1.4419          |
| 0.0195        | 15.4971 | 5300 | 1.4471          |
| 0.0195        | 15.7895 | 5400 | 1.4583          |
| 0.0152        | 16.0819 | 5500 | 1.5293          |
| 0.0152        | 16.3743 | 5600 | 1.5350          |
| 0.0152        | 16.6667 | 5700 | 1.5373          |
| 0.0152        | 16.9591 | 5800 | 1.5497          |
| 0.0152        | 17.2515 | 5900 | 1.6156          |
| 0.0124        | 17.5439 | 6000 | 1.6219          |
| 0.0124        | 17.8363 | 6100 | 1.6184          |
| 0.0124        | 18.1287 | 6200 | 1.6552          |
| 0.0124        | 18.4211 | 6300 | 1.6616          |
| 0.0124        | 18.7135 | 6400 | 1.6637          |
| 0.0108        | 19.0058 | 6500 | 1.6645          |
| 0.0108        | 19.2982 | 6600 | 1.6776          |
| 0.0108        | 19.5906 | 6700 | 1.6790          |
| 0.0108        | 19.8830 | 6800 | 1.6790          |

Framework versions

  • Transformers 4.43.4
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.1
  • Tokenizers 0.19.1
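
For reproducibility, a quick sanity check against these pins (a sketch; later compatible versions may also work):

```python
# Sketch: compare the installed environment against the versions
# this model card pins.
import datasets, tokenizers, torch, transformers

expected = {
    transformers: "4.43.4",
    torch: "2.4.0",  # card pins 2.4.0+cu121; the CUDA suffix varies by install
    datasets: "3.0.1",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    print(f"{module.__name__}: found {module.__version__}, card pins {version}")
```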