Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep

This model is a fine-tuned version of NousResearch/Meta-Llama-3.1-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6790

Model description

More information needed

Intended uses & limitations

More information needed
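
Since this is a fine-tune of Llama-3.1-8B-Instruct, it should load through the standard transformers chat interface. Below is a minimal inference sketch (untested against this checkpoint; the repo id `qfq/Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep` and the bfloat16 load dtype are assumptions, and the chat template is inherited from the base model):

```python
# Minimal inference sketch (assumptions: repo id below, bfloat16 load,
# standard Llama 3.1 chat template from the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qfq/Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in F32; bf16 halves memory
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain cosine LR schedules in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```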

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 32
  • total_train_batch_size: 64 (2 per device × 32 GPUs)
  • total_eval_batch_size: 256 (8 per device × 32 GPUs)
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 20.0
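
As a sketch, these settings map onto transformers.TrainingArguments roughly as follows. The output directory, eval strategy, and eval interval are assumptions (the card logs a validation loss every 100 steps); model, dataset, and Trainer wiring are omitted, and the 32-GPU launch would go through torchrun or accelerate:

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
# With 32 GPUs and per-device batch size 2, the effective train batch is 64.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-EI1-120K-fix-32gpus-20ep",  # assumed
    learning_rate=6e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=20.0,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    eval_strategy="steps",  # assumed from the eval-every-100-steps log below
    eval_steps=100,
)
```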

Training results

Validation loss reaches its minimum of 0.3943 at step 1000 (epoch ≈ 2.92) and rises steadily thereafter while training loss keeps falling, so the final loss of 1.6790 reported above reflects substantial overfitting relative to the epoch-3 checkpoints.

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.2924  | 100  | 0.5265          |
| No log        | 0.5848  | 200  | 0.4605          |
| No log        | 0.8772  | 300  | 0.4265          |
| No log        | 1.1696  | 400  | 0.4117          |
| 0.4742        | 1.4620  | 500  | 0.4032          |
| 0.4742        | 1.7544  | 600  | 0.3976          |
| 0.4742        | 2.0468  | 700  | 0.4008          |
| 0.4742        | 2.3392  | 800  | 0.4005          |
| 0.4742        | 2.6316  | 900  | 0.3961          |
| 0.3557        | 2.9240  | 1000 | 0.3943          |
| 0.3557        | 3.2164  | 1100 | 0.4090          |
| 0.3557        | 3.5088  | 1200 | 0.4074          |
| 0.3557        | 3.8012  | 1300 | 0.4064          |
| 0.3557        | 4.0936  | 1400 | 0.4312          |
| 0.303         | 4.3860  | 1500 | 0.4329          |
| 0.303         | 4.6784  | 1600 | 0.4324          |
| 0.303         | 4.9708  | 1700 | 0.4301          |
| 0.303         | 5.2632  | 1800 | 0.4761          |
| 0.303         | 5.5556  | 1900 | 0.4755          |
| 0.2542        | 5.8480  | 2000 | 0.4737          |
| 0.2542        | 6.1404  | 2100 | 0.5378          |
| 0.2542        | 6.4327  | 2200 | 0.5374          |
| 0.2542        | 6.7251  | 2300 | 0.5393          |
| 0.2542        | 7.0175  | 2400 | 0.6218          |
| 0.1892        | 7.3099  | 2500 | 0.6207          |
| 0.1892        | 7.6023  | 2600 | 0.6277          |
| 0.1892        | 7.8947  | 2700 | 0.6202          |
| 0.1892        | 8.1871  | 2800 | 0.7137          |
| 0.1892        | 8.4795  | 2900 | 0.7203          |
| 0.1318        | 8.7719  | 3000 | 0.7195          |
| 0.1318        | 9.0643  | 3100 | 0.8267          |
| 0.1318        | 9.3567  | 3200 | 0.8213          |
| 0.1318        | 9.6491  | 3300 | 0.8221          |
| 0.1318        | 9.9415  | 3400 | 0.8276          |
| 0.0824        | 10.2339 | 3500 | 0.9402          |
| 0.0824        | 10.5263 | 3600 | 0.9379          |
| 0.0824        | 10.8187 | 3700 | 0.9340          |
| 0.0824        | 11.1111 | 3800 | 1.0448          |
| 0.0824        | 11.4035 | 3900 | 1.0511          |
| 0.0483        | 11.6959 | 4000 | 1.0520          |
| 0.0483        | 11.9883 | 4100 | 1.0641          |
| 0.0483        | 12.2807 | 4200 | 1.1640          |
| 0.0483        | 12.5731 | 4300 | 1.1574          |
| 0.0483        | 12.8655 | 4400 | 1.1667          |
| 0.0294        | 13.1579 | 4500 | 1.2525          |
| 0.0294        | 13.4503 | 4600 | 1.2659          |
| 0.0294        | 13.7427 | 4700 | 1.2635          |
| 0.0294        | 14.0351 | 4800 | 1.3617          |
| 0.0294        | 14.3275 | 4900 | 1.3559          |
| 0.0195        | 14.6199 | 5000 | 1.3651          |
| 0.0195        | 14.9123 | 5100 | 1.3715          |
| 0.0195        | 15.2047 | 5200 | 1.4419          |
| 0.0195        | 15.4971 | 5300 | 1.4471          |
| 0.0195        | 15.7895 | 5400 | 1.4583          |
| 0.0152        | 16.0819 | 5500 | 1.5293          |
| 0.0152        | 16.3743 | 5600 | 1.5350          |
| 0.0152        | 16.6667 | 5700 | 1.5373          |
| 0.0152        | 16.9591 | 5800 | 1.5497          |
| 0.0152        | 17.2515 | 5900 | 1.6156          |
| 0.0124        | 17.5439 | 6000 | 1.6219          |
| 0.0124        | 17.8363 | 6100 | 1.6184          |
| 0.0124        | 18.1287 | 6200 | 1.6552          |
| 0.0124        | 18.4211 | 6300 | 1.6616          |
| 0.0124        | 18.7135 | 6400 | 1.6637          |
| 0.0108        | 19.0058 | 6500 | 1.6645          |
| 0.0108        | 19.2982 | 6600 | 1.6776          |
| 0.0108        | 19.5906 | 6700 | 1.6790          |
| 0.0108        | 19.8830 | 6800 | 1.6790          |

Framework versions

  • Transformers 4.43.4
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.1
  • Tokenizers 0.19.1
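
For reproducibility, a quick sanity check against these pins (a sketch; later compatible versions may also work):

```python
# Sketch: compare the installed environment against the versions
# this model card pins.
import datasets, tokenizers, torch, transformers

expected = {
    transformers: "4.43.4",
    torch: "2.4.0",  # card pins 2.4.0+cu121; the CUDA suffix varies by install
    datasets: "3.0.1",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    print(f"{module.__name__}: found {module.__version__}, card pins {version}")
```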