llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_random_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged on an unspecified dataset; the model name and the reward-style metrics below suggest DPO (Direct Preference Optimization) training on preference pairs (see the note after the metrics). It achieves the following results on the evaluation set:

  • Loss: 0.6788
  • Rewards/chosen: -0.0760
  • Rewards/rejected: -0.1428
  • Rewards/accuracies: 0.5781
  • Rewards/margins: 0.0669
  • Logps/rejected: -202.0682
  • Logps/chosen: -199.2469
  • Logits/rejected: 1.0323
  • Logits/chosen: 1.0541
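
The metric names above match the logging convention of DPO-style preference trainers (e.g. trl's DPOTrainer), where the implicit reward of a completion is β times the log-probability ratio between the policy and the reference (SFT) model. Under that reading, Rewards/margins is the mean gap between chosen and rejected rewards (0.0669 ≈ -0.0760 − (-0.1428)) and Rewards/accuracies is the fraction of pairs where the chosen completion scores higher. For context, the standard DPO objective (Rafailov et al., 2023) is sketched below; the exact objective and β used here are not stated in the card.

```latex
% Standard DPO loss; contextual sketch only -- the card does not state
% the training objective or the value of beta.
% Implicit reward: r(x, y) = beta * log( pi_theta(y|x) / pi_ref(y|x) )
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```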

Model description

More information needed

Intended uses & limitations

More information needed
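
In the absence of author-provided usage notes, the sketch below shows one way to load the model with transformers (version pinned under Framework versions). Whether this repository holds merged weights or only a QLoRA adapter on top of the SFT base model is an assumption inferred from the "qlora" in the model name, so both paths are attempted.

```python
# Minimal loading sketch. Assumption: the repo holds either full merged
# weights or a PEFT/QLoRA adapter for the SFT base model named in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged"
repo_id = "dhmeltzer/llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_random_1024_r_64_alpha_16"

tokenizer = AutoTokenizer.from_pretrained(base_id)

try:
    # Case 1: the repo contains full (merged) model weights.
    model = AutoModelForCausalLM.from_pretrained(repo_id)
except Exception:
    # Case 2: the repo contains only a LoRA adapter; apply it to the base.
    from peft import PeftModel

    model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(model, repo_id)
```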

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
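
Expressed as transformers.TrainingArguments, those values look roughly like the sketch below. The output_dir is a placeholder, and the choice of trainer (trl's DPOTrainer, suggested by the reward metrics but not stated in the card) is an assumption.

```python
# Hypothetical reconstruction of the listed hyperparameters; output_dir is a
# placeholder and the surrounding trainer is assumed, not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-7b-dpo",           # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,       # 32 * 4 = total train batch of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```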

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6913 | 0.1 | 19 | 0.6845 | -0.4006 | -0.4672 | 0.5558 | 0.0665 | -205.3114 | -202.4936 | 1.0265 | 1.0467 |
| 0.6768 | 0.21 | 38 | 0.6796 | -0.3409 | -0.4196 | 0.5603 | 0.0787 | -204.8360 | -201.8965 | 1.0326 | 1.0538 |
| 0.6771 | 0.31 | 57 | 0.6788 | -0.0760 | -0.1428 | 0.5781 | 0.0669 | -202.0682 | -199.2469 | 1.0323 | 1.0541 |
| 0.6665 | 0.41 | 76 | 0.6826 | -0.1511 | -0.2355 | 0.5703 | 0.0843 | -202.9944 | -199.9986 | 1.0413 | 1.0635 |
| 0.6669 | 0.52 | 95 | 0.6830 | -0.1285 | -0.2165 | 0.5781 | 0.0880 | -202.8050 | -199.7720 | 1.0299 | 1.0522 |
| 0.669 | 0.62 | 114 | 0.6800 | -0.0932 | -0.1803 | 0.5725 | 0.0871 | -202.4429 | -199.4187 | 1.0126 | 1.0352 |
| 0.6559 | 0.72 | 133 | 0.6829 | -0.0011 | -0.1074 | 0.5759 | 0.1063 | -201.7135 | -198.4980 | 1.0015 | 1.0232 |
| 0.6698 | 0.83 | 152 | 0.6810 | -0.0519 | -0.1530 | 0.5781 | 0.1011 | -202.1696 | -199.0062 | 0.9974 | 1.0192 |
| 0.6643 | 0.93 | 171 | 0.6799 | -0.0579 | -0.1589 | 0.5658 | 0.1010 | -202.2284 | -199.0658 | 1.0002 | 1.0220 |

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3