llama_3b_step2_batch_v2

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3132

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch follows the list):

  • learning_rate: 3e-05
  • train_batch_size: 2
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 2
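
For reference, here is a minimal sketch of the configuration above expressed as Hugging Face `TrainingArguments`. The `output_dir` and the BF16 flag are assumptions for illustration, not values reported in this card:

```python
# Hedged reproduction sketch of the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_3b_step2_batch_v2",  # assumption: placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=4,  # effective train batch size: 2 * 4 = 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption: BF16 mixed precision (the saved weights are BF16)
)
```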

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.993         | 0.0341 | 50   | 1.1011          |
| 1.0449        | 0.0682 | 100  | 0.9752          |
| 0.9894        | 0.1023 | 150  | 0.8698          |
| 0.6199        | 0.1364 | 200  | 0.7913          |
| 0.5326        | 0.1704 | 250  | 0.7341          |
| 0.8109        | 0.2045 | 300  | 0.6799          |
| 0.7554        | 0.2386 | 350  | 0.6332          |
| 0.9877        | 0.2727 | 400  | 0.5993          |
| 0.3571        | 0.3068 | 450  | 0.5726          |
| 0.4539        | 0.3409 | 500  | 0.5439          |
| 0.464         | 0.3750 | 550  | 0.5147          |
| 0.4051        | 0.4091 | 600  | 0.4904          |
| 0.5371        | 0.4432 | 650  | 0.4732          |
| 0.4954        | 0.4772 | 700  | 0.4549          |
| 0.4594        | 0.5113 | 750  | 0.4399          |
| 0.4755        | 0.5454 | 800  | 0.4281          |
| 0.2948        | 0.5795 | 850  | 0.4118          |
| 0.3699        | 0.6136 | 900  | 0.4021          |
| 0.319         | 0.6477 | 950  | 0.3927          |
| 0.3359        | 0.6818 | 1000 | 0.3802          |
| 0.4056        | 0.7159 | 1050 | 0.3746          |
| 0.2975        | 0.7500 | 1100 | 0.3643          |
| 0.3868        | 0.7840 | 1150 | 0.3596          |
| 0.3485        | 0.8181 | 1200 | 0.3512          |
| 0.3546        | 0.8522 | 1250 | 0.3476          |
| 0.3697        | 0.8863 | 1300 | 0.3416          |
| 0.4056        | 0.9204 | 1350 | 0.3388          |
| 0.3189        | 0.9545 | 1400 | 0.3332          |
| 0.4173        | 0.9886 | 1450 | 0.3286          |
| 0.1779        | 1.0228 | 1500 | 0.3338          |
| 0.2877        | 1.0569 | 1550 | 0.3300          |
| 0.1506        | 1.0910 | 1600 | 0.3301          |
| 0.2075        | 1.1251 | 1650 | 0.3289          |
| 0.1956        | 1.1592 | 1700 | 0.3271          |
| 0.162         | 1.1933 | 1750 | 0.3276          |
| 0.2416        | 1.2274 | 1800 | 0.3228          |
| 0.2364        | 1.2615 | 1850 | 0.3243          |
| 0.1602        | 1.2956 | 1900 | 0.3219          |
| 0.1566        | 1.3296 | 1950 | 0.3211          |
| 0.1784        | 1.3637 | 2000 | 0.3215          |
| 0.1627        | 1.3978 | 2050 | 0.3190          |
| 0.1907        | 1.4319 | 2100 | 0.3183          |
| 0.1182        | 1.4660 | 2150 | 0.3183          |
| 0.1585        | 1.5001 | 2200 | 0.3179          |
| 0.2261        | 1.5342 | 2250 | 0.3158          |
| 0.1457        | 1.5683 | 2300 | 0.3150          |
| 0.2589        | 1.6024 | 2350 | 0.3146          |
| 0.2253        | 1.6364 | 2400 | 0.3144          |
| 0.1741        | 1.6705 | 2450 | 0.3143          |
| 0.1477        | 1.7046 | 2500 | 0.3141          |
| 0.1886        | 1.7387 | 2550 | 0.3141          |
| 0.2211        | 1.7728 | 2600 | 0.3139          |
| 0.238         | 1.8069 | 2650 | 0.3138          |
| 0.2863        | 1.8410 | 2700 | 0.3137          |
| 0.2603        | 1.8751 | 2750 | 0.3135          |
| 0.2484        | 1.9092 | 2800 | 0.3133          |
| 0.2343        | 1.9432 | 2850 | 0.3132          |
| 0.254         | 1.9773 | 2900 | 0.3132          |

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.1.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1

Model size

  • 3.21B parameters (safetensors, BF16)
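
Since usage notes are not yet documented, here is a minimal, hedged loading sketch pinned to the framework versions above. The repository id is a placeholder, not the model's actual path:

```python
# Hedged usage sketch; assumes transformers==4.46.1 and torch==2.1.0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama_3b_step2_batch_v2"  # assumption: placeholder repo id or local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short completion to sanity-check the checkpoint.
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```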