---
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
  - hugodk-sch/aftonposten_title_prefs
model-index:
  - name: aftonposten-6b-align-scan
    results: []
---

# aftonposten-6b-align-scan

This model is a fine-tuned version of data/ap-gpt-j-6b-sft-qlora-04-08 (itself derived from NbAiLab/nb-gpt-j-6B-v2, per the metadata) on the hugodk-sch/aftonposten_title_prefs dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

- Loss: 0.5772
- Rewards/chosen: 0.0684
- Rewards/rejected: 0.0623
- Rewards/accuracies: 0.5307
- Rewards/margins: 0.0061
- Logps/rejected: -37.4276
- Logps/chosen: -33.9368
- Logits/rejected: -2.2420
- Logits/chosen: -2.2469
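
The adapter can be attached to the base model with PEFT. A minimal loading sketch, assuming the adapter weights are published under `hugodk-sch/aftonposten-6b-align-scan` (the immediate parent, data/ap-gpt-j-6b-sft-qlora-04-08, is a local path, so loading onto the public base is an approximation); the prompt format and dtype settings are illustrative:

```python
# Loading sketch: the adapter repo id, prompt, and fp16 settings below are
# assumptions for illustration, not taken from this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "NbAiLab/nb-gpt-j-6B-v2"
adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the DPO-trained LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "Skriv en tittel: "  # hypothetical prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```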

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the configuration sketch after the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
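
These values map onto `transformers.TrainingArguments` roughly as follows. This is a reconstruction from the list above, not the authors' launch script; the model tags indicate TRL's DPOTrainer was used, whose DPO-specific settings (e.g. beta) are not recorded on this card:

```python
# Reconstruction of the hyperparameters above as TrainingArguments.
# (Assumptions: output_dir is illustrative; DPO-specific options such as
# beta are not recorded on this card and are omitted.)
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # total train batch size: 4 * 2 = 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```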

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4711 | 0.26 | 100  | 0.5755 | 0.0163  | 0.0131  | 0.5195 | 0.0032  | -37.4979 | -34.0113 | -2.2352 | -2.2401 |
| 0.5061 | 0.52 | 200  | 0.5877 | -0.0108 | -0.0202 | 0.4992 | 0.0094  | -37.5455 | -34.0500 | -2.2337 | -2.2385 |
| 0.3371 | 0.78 | 300  | 0.5843 | 0.0001  | -0.0131 | 0.5278 | 0.0132  | -37.5353 | -34.0344 | -2.2322 | -2.2371 |
| 0.4001 | 1.04 | 400  | 0.6350 | -0.0073 | 0.0033  | 0.4838 | -0.0106 | -37.5120 | -34.0450 | -2.2353 | -2.2402 |
| 0.3401 | 1.3  | 500  | 0.6238 | -0.0135 | -0.0193 | 0.5141 | 0.0058  | -37.5443 | -34.0539 | -2.2353 | -2.2402 |
| 0.433  | 1.56 | 600  | 0.6143 | 0.0129  | 0.0108  | 0.5245 | 0.0021  | -37.5011 | -34.0161 | -2.2421 | -2.2469 |
| 0.3298 | 1.82 | 700  | 0.5790 | 0.0633  | 0.0499  | 0.5195 | 0.0134  | -37.4453 | -33.9442 | -2.2401 | -2.2450 |
| 0.14   | 2.08 | 800  | 0.5904 | 0.0586  | 0.0544  | 0.5162 | 0.0041  | -37.4389 | -33.9509 | -2.2423 | -2.2472 |
| 0.2302 | 2.34 | 900  | 0.5758 | 0.0851  | 0.0740  | 0.5544 | 0.0111  | -37.4109 | -33.9130 | -2.2448 | -2.2497 |
| 0.2296 | 2.6  | 1000 | 0.5750 | 0.0631  | 0.0552  | 0.5075 | 0.0080  | -37.4378 | -33.9444 | -2.2440 | -2.2489 |
| 0.2798 | 2.86 | 1100 | 0.5483 | 0.0729  | 0.0545  | 0.5428 | 0.0184  | -37.4387 | -33.9303 | -2.2419 | -2.2468 |
| 0.1195 | 3.12 | 1200 | 0.5759 | 0.0672  | 0.0613  | 0.5137 | 0.0059  | -37.4291 | -33.9386 | -2.2424 | -2.2473 |
| 0.1371 | 3.38 | 1300 | 0.5592 | 0.0733  | 0.0574  | 0.5494 | 0.0159  | -37.4346 | -33.9299 | -2.2434 | -2.2483 |
| 0.0993 | 3.64 | 1400 | 0.6130 | 0.0546  | 0.0598  | 0.4871 | -0.0053 | -37.4311 | -33.9566 | -2.2422 | -2.2471 |
| 0.18   | 3.9  | 1500 | 0.5566 | 0.0778  | 0.0602  | 0.5050 | 0.0176  | -37.4306 | -33.9234 | -2.2423 | -2.2472 |
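
To interpret the reward columns: in DPO (per the trl/dpo tags), the reported rewards are the policy's implicit rewards relative to the reference model, scaled by the DPO temperature β (not recorded on this card). In standard notation, with y_w the chosen and y_l the rejected completion:

```latex
% Implicit DPO reward of a completion y for prompt x
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% Column meanings (averaged over the evaluation set):
%   rewards/chosen     = r_\theta(x, y_w)
%   rewards/rejected   = r_\theta(x, y_l)
%   rewards/margins    = r_\theta(x, y_w) - r_\theta(x, y_l)
%   rewards/accuracies = fraction of pairs with a positive margin

% Loss minimized during DPO training (\sigma is the logistic sigmoid)
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)
```

As a sanity check against the table: in the final row, 0.0778 - 0.0602 = 0.0176, the reported margin.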

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1