metadata
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: Llama-31-8B_task-3_180-samples_config-3
    results: []

Llama-31-8B_task-3_180-samples_config-3

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct; the training dataset is not named in this card. It achieves the following results on the evaluation set:

  • Loss: 0.5206
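
Because the metadata lists `library_name: peft`, this repository most likely holds adapter weights that are applied on top of the base model rather than a full copy of it. A minimal loading sketch under that assumption follows. `ADAPTER_ID` is a placeholder (the namespace is not stated in this card) and `load_adapter_model` is a hypothetical helper name, not something the author provides.

```python
BASE_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # base_model from the metadata
# Placeholder: replace <namespace> with the account that hosts this adapter.
ADAPTER_ID = "<namespace>/Llama-31-8B_task-3_180-samples_config-3"


def load_adapter_model(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the PEFT adapter on top of it.

    Heavy dependencies are imported lazily so that this file can be read
    and imported without torch/transformers/peft installed.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    # Assumption: the repository contains adapter weights saved by PEFT.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads the base weights, so it needs access to the base model repository and enough memory for an 8B-parameter model in bfloat16.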

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150 (configured maximum; the logged results below end at epoch 27)
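
To make the batch-size and schedule settings concrete, here is a small pure-Python sketch of the arithmetic they imply. It is an illustration, not the trainer's own code. It assumes 17 optimizer steps per epoch (read off the results table, where step 17 is logged at epoch 1.0), that the schedule was sized for all 150 epochs, and a standard linear-warmup plus half-cosine decay shape. The function names are hypothetical.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-5
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1
NUM_EPOCHS = 150

# Assumption: 17 optimizer steps per epoch, inferred from the results table.
STEPS_PER_EPOCH = 17


def effective_batch_size(per_device: int, accum: int, num_devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accum * num_devices


def lr_at(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Planned schedule length if all 150 epochs had run (an assumption).
total_steps = NUM_EPOCHS * STEPS_PER_EPOCH
# round() is used here to sidestep floating-point edge cases in the product.
warmup_steps = round(WARMUP_RATIO * total_steps)

if __name__ == "__main__":
    print("effective batch:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, 1))
    for step in (0, warmup_steps, 459, total_steps):
        print(step, lr_at(step, total_steps, warmup_steps, LEARNING_RATE))
```

The listed total_train_batch_size of 8 equals 1 × 8 × 1, which matches the product for a single device; the card does not state how many devices were used. Under these assumptions the last logged step (459) falls shortly after warmup, so the learning rate would still have been close to its peak.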

Training results

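| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 2.3708        | 1.0   | 17   | 2.4982          |
| 2.4065        | 2.0   | 34   | 2.4397          |
| 2.3549        | 3.0   | 51   | 2.3147          |
| 2.0578        | 4.0   | 68   | 2.0850          |
| 1.8089        | 5.0   | 85   | 1.7080          |
| 1.3018        | 6.0   | 102  | 1.2347          |
| 1.0212        | 7.0   | 119  | 0.8016          |
| 0.4899        | 8.0   | 136  | 0.6475          |
| 0.6106        | 9.0   | 153  | 0.5890          |
| 0.5388        | 10.0  | 170  | 0.5729          |
| 0.7245        | 11.0  | 187  | 0.5585          |
| 0.3568        | 12.0  | 204  | 0.5533          |
| 0.4165        | 13.0  | 221  | 0.5353          |
| 0.6226        | 14.0  | 238  | 0.5420          |
| 0.3284        | 15.0  | 255  | 0.5026          |
| 0.4813        | 16.0  | 272  | 0.5214          |
| 0.3015        | 17.0  | 289  | 0.5116          |
| 0.3513        | 18.0  | 306  | 0.5071          |
| 0.3638        | 19.0  | 323  | 0.5486          |
| 0.5246        | 20.0  | 340  | 0.4813          |
| 0.4751        | 21.0  | 357  | 0.5369          |
| 0.2074        | 22.0  | 374  | 0.5177          |
| 0.2513        | 23.0  | 391  | 0.5109          |
| 0.3019        | 24.0  | 408  | 0.5100          |
| 0.2039        | 25.0  | 425  | 0.5429          |
| 0.2280        | 26.0  | 442  | 0.5161          |
| 0.2127        | 27.0  | 459  | 0.5206          |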
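
The reported evaluation loss (0.5206) is the last logged value, not the lowest one in the log. The short sketch below, with the rows copied from the results above, locates the epoch with the lowest validation loss and counts how many epochs were logged after it. The variable names are illustrative only.

```python
# (epoch, step, training_loss, validation_loss), copied from the results above.
ROWS = [
    (1, 17, 2.3708, 2.4982), (2, 34, 2.4065, 2.4397), (3, 51, 2.3549, 2.3147),
    (4, 68, 2.0578, 2.0850), (5, 85, 1.8089, 1.7080), (6, 102, 1.3018, 1.2347),
    (7, 119, 1.0212, 0.8016), (8, 136, 0.4899, 0.6475), (9, 153, 0.6106, 0.5890),
    (10, 170, 0.5388, 0.5729), (11, 187, 0.7245, 0.5585), (12, 204, 0.3568, 0.5533),
    (13, 221, 0.4165, 0.5353), (14, 238, 0.6226, 0.5420), (15, 255, 0.3284, 0.5026),
    (16, 272, 0.4813, 0.5214), (17, 289, 0.3015, 0.5116), (18, 306, 0.3513, 0.5071),
    (19, 323, 0.3638, 0.5486), (20, 340, 0.5246, 0.4813), (21, 357, 0.4751, 0.5369),
    (22, 374, 0.2074, 0.5177), (23, 391, 0.2513, 0.5109), (24, 408, 0.3019, 0.5100),
    (25, 425, 0.2039, 0.5429), (26, 442, 0.2280, 0.5161), (27, 459, 0.2127, 0.5206),
]

# Row with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])
last = ROWS[-1]

# Number of epochs logged after the best validation loss was reached.
epochs_since_best = last[0] - best[0]

if __name__ == "__main__":
    print(f"best: epoch {best[0]}, step {best[1]}, validation loss {best[3]}")
    print(f"last: epoch {last[0]}, step {last[1]}, validation loss {last[3]}")
    print(f"epochs logged after best: {epochs_since_best}")
```

The lowest validation loss is 0.4813 at epoch 20, and the log ends 7 epochs later. That pattern would be consistent with an early-stopping rule, though the card does not say whether one was used or which checkpoint was saved.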

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1