license: apache-2.0
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
  - name: openhermes-mistral-dpo-gptq
    results: []

openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.7067
  • Rewards/chosen: 0.0088
  • Rewards/rejected: -0.0947
  • Rewards/accuracies: 0.625
  • Rewards/margins: 0.1035
  • Logps/rejected: -172.7847
  • Logps/chosen: -98.3108
  • Logits/rejected: -2.0623
  • Logits/chosen: -1.9279
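The reward figures above can be cross-checked against each other. The sketch below assumes that Rewards/margins is the chosen reward minus the rejected reward, which is the usual convention for DPO-style logging; the card itself does not state this. The variable names are illustrative only.

```python
# Final evaluation values copied from the list above.
rewards_chosen = 0.0088
rewards_rejected = -0.0947
reported_margin = 0.1035

# Assumption: the margin is the chosen reward minus the rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimals, so compare with a tolerance.
margin_matches = abs(computed_margin - reported_margin) < 1.5e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

A positive margin means the model scores chosen responses above rejected ones on average, which is consistent with the reported Rewards/accuracies of 0.625.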

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 50
  • mixed_precision_training: Native AMP
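To make the schedule concrete, here is a minimal sketch of how the learning rate would evolve over the 50 training steps. It assumes the linear scheduler has the usual shape of a linear warmup from zero followed by a linear decay to zero (as in Transformers' `get_linear_schedule_with_warmup`); the function `linear_lr` is a hypothetical helper written for this illustration, not code from the training run.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=50):
    """Learning rate after `step` optimizer updates.

    Assumed shape: linear warmup from 0 to base_lr over warmup_steps,
    then linear decay from base_lr to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Print the rate at a few points, including the logged evaluation steps.
for step in (0, 1, 2, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: lr = {linear_lr(step):.3e}")
```

With only 2 warmup steps, the rate reaches its peak of 0.0002 almost immediately and spends the remaining 48 steps decaying.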

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6744 | 0.005 | 10 | 0.6970 | -0.0186 | -0.0266 | 0.6875 | 0.0080 | -172.1035 | -98.5849 | -2.0765 | -1.9425 |
| 0.7073 | 0.01 | 20 | 0.7152 | -0.0388 | -0.0448 | 0.4375 | 0.0060 | -172.2850 | -98.7869 | -2.0706 | -1.9392 |
| 0.7287 | 0.015 | 30 | 0.7197 | 0.0026 | -0.0203 | 0.625 | 0.0230 | -172.0406 | -98.3726 | -2.0688 | -1.9317 |
| 0.701 | 0.02 | 40 | 0.7120 | 0.0131 | -0.0600 | 0.625 | 0.0731 | -172.4374 | -98.2679 | -2.0641 | -1.9302 |
| 0.6726 | 0.025 | 50 | 0.7067 | 0.0088 | -0.0947 | 0.625 | 0.1035 | -172.7847 | -98.3108 | -2.0623 | -1.9279 |
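The Epoch and Step columns together hint at how much data the run covered. The sketch below estimates the number of optimizer steps in one full epoch from the logged pairs. It assumes the displayed epoch values are not heavily rounded; turning steps into a number of training examples would additionally require the gradient accumulation setting and device count, neither of which is listed on this card.

```python
# (step, epoch) pairs copied from the training results table.
logged = [(10, 0.005), (20, 0.01), (30, 0.015), (40, 0.02), (50, 0.025)]

# Each pair gives an independent estimate of optimizer steps per full epoch.
steps_per_epoch_estimates = [round(step / epoch) for step, epoch in logged]
steps_per_epoch = steps_per_epoch_estimates[0]

# Fraction of one epoch covered by the 50 training steps.
total_training_steps = 50
fraction_of_epoch = total_training_steps / steps_per_epoch

print(f"estimated steps per epoch: {steps_per_epoch}")
print(f"fraction of one epoch trained: {fraction_of_epoch:.3f}")
```

Under these assumptions the run covers only a small fraction of a single pass over the training data, so the reported metrics are best read as an early snapshot rather than a converged result.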

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • PyTorch 2.0.1+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1