---
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: NorLLM-AI/NorMistral-7B
datasets:
  - hugodk-sch/aftonposten_title_prefs
model-index:
  - name: norllm-ai-normistral-7b-align-scan
    results: []
---

# norllm-ai-normistral-7b-align-scan

This model is a fine-tuned version of data/norllm-ai-normistral-7b-sft-qlora on the hugodk-sch/aftonposten_title_prefs dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

- Loss: 0.8067
- Rewards/chosen: -1.1692
- Rewards/rejected: -1.5184
- Rewards/accuracies: 0.5918
- Rewards/margins: 0.3492
- Logps/rejected: -37.2289
- Logps/chosen: -33.2312
- Logits/rejected: -2.8266
- Logits/chosen: -2.8291
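Below is a minimal sketch of loading the adapter for inference with PEFT. The adapter repo id is taken from this card's metadata; the prompt is illustrative, and the tokenizer files are assumed to have been pushed alongside the adapter weights.

```python
# A minimal loading sketch, assuming the adapter lives at the repo id from
# this card's metadata and that tokenizer files were pushed with it.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "hugodk-sch/norllm-ai-normistral-7b-align-scan"

# AutoPeftModelForCausalLM reads adapter_config.json, downloads the base
# model it points to, and attaches the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

inputs = tokenizer("Skriv en avistittel:", return_tensors="pt")  # illustrative prompt
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```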

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments is shown after the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
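The same settings expressed as transformers TrainingArguments, as a hedged sketch: output_dir and the optim name are assumptions, and a per-device batch size of 4 with 2 accumulation steps reproduces the listed total train batch size of 8.

```python
# A sketch mirroring the hyperparameters listed above; output_dir and
# optim="adamw_torch" are assumptions (the card only says "Adam", with
# betas/epsilon matching transformers' defaults).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="norllm-ai-normistral-7b-align-scan",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=2,   # 4 x 2 = total_train_batch_size of 8
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 by default
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio
    num_train_epochs=4,
)
```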

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6746        | 0.26  | 100  | 0.6828          | 0.0185         | -0.0185          | 0.5694             | 0.0370          | -34.7290       | -31.2516     | -2.8058         | -2.8084       |
| 0.6195        | 0.52  | 200  | 0.6735          | -0.0458        | -0.1322          | 0.5511             | 0.0864          | -34.9185       | -31.3587     | -2.8176         | -2.8201       |
| 0.5567        | 0.78  | 300  | 0.6810          | -0.1233        | -0.2426          | 0.5723             | 0.1192          | -35.1024       | -31.4880     | -2.8203         | -2.8231       |
| 0.2251        | 1.04  | 400  | 0.6779          | -0.3249        | -0.4970          | 0.6013             | 0.1720          | -35.5264       | -31.8240     | -2.8175         | -2.8204       |
| 0.2082        | 1.3   | 500  | 0.6859          | -0.4136        | -0.6723          | 0.6092             | 0.2587          | -35.8186       | -31.9717     | -2.8475         | -2.8487       |
| 0.2119        | 1.56  | 600  | 0.6993          | -0.5421        | -0.7899          | 0.5926             | 0.2478          | -36.0147       | -32.1860     | -2.8301         | -2.8322       |
| 0.1579        | 1.82  | 700  | 0.7178          | -0.6062        | -0.8251          | 0.5806             | 0.2189          | -36.0734       | -32.2928     | -2.8261         | -2.8284       |
| 0.0649        | 2.08  | 800  | 0.7260          | -0.7190        | -1.0000          | 0.6071             | 0.2810          | -36.3648       | -32.4808     | -2.8243         | -2.8271       |
| 0.1014        | 2.34  | 900  | 0.7758          | -1.0050        | -1.3365          | 0.5831             | 0.3315          | -36.9256       | -32.9574     | -2.8278         | -2.8304       |
| 0.0425        | 2.6   | 1000 | 0.7952          | -1.0994        | -1.4459          | 0.5826             | 0.3465          | -37.1080       | -33.1148     | -2.8238         | -2.8267       |
| 0.0878        | 2.86  | 1100 | 0.7929          | -1.0931        | -1.4389          | 0.5889             | 0.3458          | -37.0962       | -33.1042     | -2.8257         | -2.8283       |
| 0.0534        | 3.12  | 1200 | 0.7997          | -1.1321        | -1.4857          | 0.5889             | 0.3535          | -37.1742       | -33.1693     | -2.8258         | -2.8285       |
| 0.035         | 3.38  | 1300 | 0.8024          | -1.1445        | -1.5019          | 0.5889             | 0.3575          | -37.2014       | -33.1899     | -2.8266         | -2.8291       |
| 0.0126        | 3.64  | 1400 | 0.8126          | -1.1630        | -1.5088          | 0.5860             | 0.3457          | -37.2128       | -33.2208     | -2.8267         | -2.8294       |
| 0.0525        | 3.9   | 1500 | 0.8088          | -1.1685        | -1.5136          | 0.5918             | 0.3451          | -37.2208       | -33.2299     | -2.8265         | -2.8292       |
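To read the reward columns: TRL's DPO trainer defines the implicit reward of a completion as beta times the gap between the policy and reference log-probabilities, and derives the margin and accuracy columns from the chosen/rejected pair. A minimal sketch of those definitions follows; beta itself (the DPO temperature) is not recorded on this card.

```python
# A sketch of how TRL's DPO trainer derives the reward columns above from
# per-example summed log-probabilities; beta is a hyperparameter whose
# value is not stated on this card.
import torch

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float) -> dict:
    # Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # Logps/chosen and Logps/rejected report the mean policy log-probabilities.
        "logps/chosen": policy_chosen_logps.mean().item(),
        "logps/rejected": policy_rejected_logps.mean().item(),
    }
```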

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1