
pairwise-reward-sft-zephyr-7b-sft-qlora-ultrafeedback-ultrafeedback-binarized-20241013-124646

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1; per the model name, it is a pairwise reward model trained on the UltraFeedback binarized preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4739
  • Accuracy: 0.7592
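
No usage example is provided on the card. As a hedged sketch, assuming the adapter turns Mistral-7B-v0.1 into a single-logit sequence classifier (the usual setup for pairwise reward models) and includes the trained classification head, loading and scoring might look like:

```python
# Hedged sketch, not an official usage example: assumes the adapter wraps
# Mistral-7B-v0.1 as a single-logit sequence classifier and that the PEFT
# checkpoint includes the trained score head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
# Repo id as listed on this card:
adapter_id = "sahandrez/pairwise-reward-sft-zephyr-7b-sft-qlora-ultrafeedback"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id, num_labels=1, torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Score a single prompt/response pair; higher scores should mean "preferred".
inputs = tokenizer("Question: ...\nAnswer: ...", return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(reward)
```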

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.5e-05
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
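
As a rough sketch (not necessarily the author's actual training script), the listed settings map onto transformers.TrainingArguments as follows; output_dir is a placeholder:

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pairwise-reward-sft",  # placeholder name
    learning_rate=1.5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=2,     # 16 x 2 = total train batch size 32
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the library defaults:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```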

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|---------------|--------|------|-----------------|----------|
| 0.6209        | 0.0526 | 100  | 0.6427          | 0.6784   |
| 0.6346        | 0.1052 | 200  | 0.5829          | 0.7165   |
| 0.5945        | 0.1578 | 300  | 0.5333          | 0.7351   |
| 0.5258        | 0.2104 | 400  | 0.5169          | 0.7461   |
| 0.4914        | 0.2630 | 500  | 0.5209          | 0.7346   |
| 0.4995        | 0.3155 | 600  | 0.5056          | 0.7536   |
| 0.5272        | 0.3681 | 700  | 0.5041          | 0.7541   |
| 0.4993        | 0.4207 | 800  | 0.4943          | 0.7471   |
| 0.5317        | 0.4733 | 900  | 0.4970          | 0.7602   |
| 0.5193        | 0.5259 | 1000 | 0.4850          | 0.7597   |
| 0.4534        | 0.5785 | 1100 | 0.4931          | 0.7582   |
| 0.4828        | 0.6311 | 1200 | 0.4808          | 0.7582   |
| 0.5432        | 0.6837 | 1300 | 0.4836          | 0.7491   |
| 0.4343        | 0.7363 | 1400 | 0.4797          | 0.7582   |
| 0.4287        | 0.7889 | 1500 | 0.4794          | 0.7612   |
| 0.5117        | 0.8414 | 1600 | 0.4799          | 0.7587   |
| 0.4369        | 0.8940 | 1700 | 0.4770          | 0.7582   |
| 0.4537        | 0.9466 | 1800 | 0.4750          | 0.7566   |
| 0.4510        | 0.9992 | 1900 | 0.4739          | 0.7592   |
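
Accuracy here is presumably pairwise ranking accuracy: the fraction of preference pairs where the chosen response receives a higher reward than the rejected one. A minimal sketch of that metric, under that assumption:

```python
# Hedged sketch: pairwise reward accuracy as typically computed for
# preference data -- the chosen response should outscore the rejected one.
import torch

def pairwise_accuracy(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one."""
    return (chosen_rewards > rejected_rewards).float().mean().item()

# Example: 3 of 4 pairs ranked correctly -> 0.75
chosen = torch.tensor([1.2, 0.3, -0.1, 2.0])
rejected = torch.tensor([0.5, 0.9, -0.8, 1.1])
print(pairwise_accuracy(chosen, rejected))  # 0.75
```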

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.20.0