
Llama-31-8B_task-1_60-samples_config-4_full

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9355
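
Since the Framework versions section lists PEFT, this repository most likely holds a LoRA-style adapter rather than full model weights. Below is a minimal, unverified sketch of loading the adapter on top of the base model and running one chat turn; the repository and base-model ids come from this card, while the dtype and generation settings are assumptions.

```python
# Hedged sketch: attach the fine-tuned adapter to the base model and generate one reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "GaetanMichelet/Llama-31-8B_task-1_60-samples_config-4_full"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # load the adapter weights from this repo

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```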

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
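
The training script itself is not part of this card. As a rough, hedged sketch, the hyperparameters above could map onto transformers.TrainingArguments as follows; the output directory, precision flag, and anything else not in the list are assumptions.

```python
# Hedged sketch only: the bullet list above expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-31-8b_task-1_60-samples_config-4",  # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # effective train batch size: 1 * 16 = 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=150,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption; precision is not stated in the card
)
# Multi-GPU distribution ("distributed_type: multi-GPU") would come from the
# accelerate/torchrun launcher rather than from these arguments.
```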

Training results

Training Loss Epoch Step Validation Loss
2.5391 0.6957 2 2.4168
2.5182 1.7391 5 2.4065
2.4879 2.7826 8 2.3913
2.4947 3.8261 11 2.3720
2.4335 4.8696 14 2.3479
2.4424 5.9130 17 2.3109
2.3698 6.9565 20 2.2672
2.3512 8.0 23 2.2129
2.32 8.6957 25 2.1830
2.2555 9.7391 28 2.1266
2.1681 10.7826 31 2.0537
2.0737 11.8261 34 1.9880
2.0403 12.8696 37 1.9277
1.9476 13.9130 40 1.8711
1.9204 14.9565 43 1.8155
1.8461 16.0 46 1.7615
1.8095 16.6957 48 1.7236
1.7597 17.7391 51 1.6580
1.6484 18.7826 54 1.5919
1.6443 19.8261 57 1.5262
1.5204 20.8696 60 1.4561
1.463 21.9130 63 1.3960
1.3833 22.9565 66 1.3404
1.3385 24.0 69 1.2875
1.3094 24.6957 71 1.2504
1.2303 25.7391 74 1.2007
1.1677 26.7826 77 1.1600
1.1674 27.8261 80 1.1332
1.1068 28.8696 83 1.1100
1.104 29.9130 86 1.0884
1.0617 30.9565 89 1.0717
1.0354 32.0 92 1.0577
1.0195 32.6957 94 1.0499
1.0659 33.7391 97 1.0396
1.0118 34.7826 100 1.0310
1.0009 35.8261 103 1.0247
0.9938 36.8696 106 1.0181
0.9736 37.9130 109 1.0124
0.9888 38.9565 112 1.0076
0.9637 40.0 115 1.0019
0.9769 40.6957 117 0.9987
0.936 41.7391 120 0.9939
0.9863 42.7826 123 0.9906
0.9626 43.8261 126 0.9863
0.9438 44.8696 129 0.9825
0.9034 45.9130 132 0.9804
0.9398 46.9565 135 0.9763
0.9206 48.0 138 0.9740
0.9251 48.6957 140 0.9728
0.9245 49.7391 143 0.9704
0.9332 50.7826 146 0.9671
0.9012 51.8261 149 0.9651
0.9075 52.8696 152 0.9627
0.9031 53.9130 155 0.9614
0.8969 54.9565 158 0.9592
0.9102 56.0 161 0.9583
0.8955 56.6957 163 0.9563
0.8775 57.7391 166 0.9547
0.8879 58.7826 169 0.9540
0.8805 59.8261 172 0.9510
0.8982 60.8696 175 0.9505
0.8897 61.9130 178 0.9494
0.8515 62.9565 181 0.9479
0.8637 64.0 184 0.9469
0.8719 64.6957 186 0.9471
0.8635 65.7391 189 0.9452
0.8579 66.7826 192 0.9445
0.8465 67.8261 195 0.9434
0.8588 68.8696 198 0.9436
0.868 69.9130 201 0.9421
0.8523 70.9565 204 0.9418
0.8654 72.0 207 0.9404
0.8525 72.6957 209 0.9405
0.8565 73.7391 212 0.9400
0.8424 74.7826 215 0.9407
0.8342 75.8261 218 0.9395
0.8539 76.8696 221 0.9393
0.8413 77.9130 224 0.9383
0.8488 78.9565 227 0.9382
0.8319 80.0 230 0.9395
0.8402 80.6957 232 0.9382
0.8604 81.7391 235 0.9376
0.8516 82.7826 238 0.9374
0.8195 83.8261 241 0.9378
0.8456 84.8696 244 0.9381
0.8313 85.9130 247 0.9374
0.8415 86.9565 250 0.9369
0.8318 88.0 253 0.9365
0.8271 88.6957 255 0.9370
0.8361 89.7391 258 0.9364
0.8216 90.7826 261 0.9365
0.8387 91.8261 264 0.9366
0.8457 92.8696 267 0.9366
0.8491 93.9130 270 0.9367
0.8171 94.9565 273 0.9357
0.8168 96.0 276 0.9367
0.8161 96.6957 278 0.9364
0.8442 97.7391 281 0.9356
0.8388 98.7826 284 0.9363
0.8365 99.8261 287 0.9355
0.8493 100.8696 290 0.9360
0.8267 101.9130 293 0.9355
0.8304 102.9565 296 0.9361
0.8216 104.0 299 0.9361
0.8436 104.3478 300 0.9358

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
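
For inference without a peft dependency, the adapter can also be merged into the base weights. A short, hedged sketch assuming the ids from this card and the standard peft API:

```python
# Hedged sketch: fold the LoRA adapter into the base model and save standalone weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(
    base, "GaetanMichelet/Llama-31-8B_task-1_60-samples_config-4_full"
).merge_and_unload()  # applies the adapter deltas to the base weights

merged.save_pretrained("llama-31-8b_task-1_merged")  # assumed local path
AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct").save_pretrained(
    "llama-31-8b_task-1_merged"
)
```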