---
license: apache-2.0
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
  - name: openhermes-mistral-dpo-gptq
    results: []
---

openhermes-mistral-dpo-gptq

This model is a DPO fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ; the preference dataset it was trained on is not documented in this card. It achieves the following results on the evaluation set (a sketch of how these reward metrics are derived follows the list):

  • Loss: 0.4618
  • Rewards/chosen: -0.7871
  • Rewards/rejected: -6.6095
  • Rewards/accuracies: 0.9375
  • Rewards/margins: 5.8223
  • Logps/rejected: -220.7533
  • Logps/chosen: -104.2417
  • Logits/rejected: -1.9929
  • Logits/chosen: -2.4654
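
The rewards above are not raw model outputs but implicit DPO rewards: beta-scaled log-probability ratios between the trained policy and the frozen reference model, averaged over the evaluation set. The sketch below follows the convention used by TRL's DPOTrainer when logging these metrics; the beta value is an assumption, since this card does not record it.

```python
# Hedged sketch of how the "Rewards/*" metrics are computed, following TRL's
# DPOTrainer convention. beta=0.1 is an assumption (not recorded in this card).
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward: beta * (policy log-prob - reference log-prob) per completion.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    # Accuracy: fraction of pairs where the chosen completion gets the higher reward.
    accuracy = (rewards_chosen > rewards_rejected).float().mean()
    return (rewards_chosen.mean(), rewards_rejected.mean(),
            margins.mean(), accuracy)
```

Under this convention, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of preference pairs for which the chosen completion receives the higher implicit reward.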

Model description

More information needed

Intended uses & limitations

More information needed
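
Pending a fuller description from the author, the sketch below shows one way this adapter could be loaded for inference. It assumes the weights in this repository are a PEFT adapter applied on top of the GPTQ base model named in the metadata, that a GPTQ backend (for example auto-gptq with optimum) is installed, and that the adapter repo id is SleepyGorilla/Mistral_7B, inferred from this card's location.

```python
# Hedged loading sketch: PEFT adapter on top of the GPTQ base model.
# Requires a GPTQ backend (e.g. auto-gptq + optimum) and accelerate for device_map.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "SleepyGorilla/Mistral_7B"  # assumed: the repo hosting this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain in one sentence what DPO training does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```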

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 50
  • mixed_precision_training: Native AMP
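
A hedged reconstruction of this setup with TRL's DPOTrainer is sketched below. The dataset, the PEFT/LoRA configuration, and the DPO beta are not documented in this card, so `model`, `tokenizer`, `train_dataset`, and `eval_dataset` appear only as placeholders, and argument names may differ slightly across TRL releases.

```python
# Hedged sketch: a TRL DPOTrainer configuration mirroring the hyperparameters
# listed above. `model` and `tokenizer` are assumed to come from the loading
# sketch earlier in this card; `train_dataset` / `eval_dataset` are assumed to
# be preference datasets with "prompt", "chosen", and "rejected" columns.
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="openhermes-mistral-dpo-gptq",
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=50,                    # training_steps: 50
    fp16=True,                       # mixed_precision_training: Native AMP
)

trainer = DPOTrainer(
    model=model,                     # PEFT-wrapped GPTQ base model
    ref_model=None,                  # with a PEFT adapter, TRL reuses the frozen base as the reference
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```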

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.5633 | 0.0 | 10 | 0.5184 | 0.0815 | -0.3179 | 1.0 | 0.3995 | -157.8381 | -95.5550 | -2.1002 | -2.5591
0.3936 | 0.0 | 20 | 0.2875 | 0.2689 | -1.1716 | 1.0 | 1.4404 | -166.3746 | -93.6817 | -2.0675 | -2.5604
0.2132 | 0.0 | 30 | 0.2000 | 0.0966 | -2.8012 | 0.9375 | 2.8977 | -182.6702 | -95.4047 | -2.0286 | -2.5520
0.034 | 0.0 | 40 | 0.4078 | -0.5324 | -5.2327 | 0.9375 | 4.7003 | -206.9856 | -101.6947 | -2.0143 | -2.4945
0.0665 | 0.0 | 50 | 0.4618 | -0.7871 | -6.6095 | 0.9375 | 5.8223 | -220.7533 | -104.2417 | -1.9929 | -2.4654

Framework versions

  • PEFT 0.9.0
  • Transformers 4.38.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2