
license: mit
tags:
  - generated_from_trainer
base_model: HuggingFaceH4/mistral-7b-sft-beta
model-index:
  - name: zephyr-7b-dpo-full-beta-0.2
    results: []

zephyr-7b-dpo-full-beta-0.2

This model is a fine-tuned version of HuggingFaceH4/mistral-7b-sft-beta. The training dataset is not recorded (the auto-generated card lists it as "None"). It achieves the following results on the evaluation set:

Loss: 0.7903
Rewards/chosen: -3.2220
Rewards/rejected: -7.3367
Rewards/accuracies: 0.7659
Rewards/margins: 4.1147
Logps/rejected: -282.6258
Logps/chosen: -314.5996
Logits/rejected: -2.6943
Logits/chosen: -2.6970
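The reward metrics are the ones logged during Direct Preference Optimization (DPO). The model name indicates DPO with beta = 0.2, although the card does not name the trainer or the loss variant. As a quick consistency check, Rewards/margins equals Rewards/chosen minus Rewards/rejected: -3.2220 - (-7.3367) = 4.1147. The minimal sketch below shows how each metric can be derived from summed log-probabilities under the trained policy and the frozen reference model. It assumes the standard sigmoid DPO loss and beta = 0.2. The dataclass, the function names, and the log-prob values are hypothetical illustrations, not taken from this model or from any library.

```python
import math
from dataclasses import dataclass

# Assumption: beta is read off the model name "zephyr-7b-dpo-full-beta-0.2".
BETA = 0.2


@dataclass
class PairLogps:
    """Summed log-probabilities of each response given the prompt (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PairLogps], beta: float = BETA) -> dict[str, float]:
    """Batch means of the DPO quantities reported in the evaluation results."""
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        # Sigmoid DPO loss, averaged per pair: -log sigmoid(margin).
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen response earns the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        # Policy log-probabilities only; the reference model does not enter these.
        "logps/chosen": sum(p.policy_chosen for p in pairs) / n,
        "logps/rejected": sum(p.policy_rejected for p in pairs) / n,
    }


# Toy log-probabilities for two pairs (illustrative, not from this model).
pairs = [
    PairLogps(policy_chosen=-100.0, policy_rejected=-120.0, ref_chosen=-105.0, ref_rejected=-110.0),
    PairLogps(policy_chosen=-90.0, policy_rejected=-95.0, ref_chosen=-88.0, ref_rejected=-96.0),
]
for name, value in dpo_metrics(pairs).items():
    print(f"{name}: {value:.4f}")
```

The Logits/* metrics (average raw model output logits) need the full model outputs and are omitted here. Because the loss is averaged per pair, it is not determined by the mean margin alone: -log sigmoid(4.1147) is roughly 0.016, far below the reported loss of 0.7903. That gap points to a wide spread of per-pair margins, consistent with about 23% of pairs being ranked the wrong way at an accuracy of 0.7659.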

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
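The batch-size arithmetic and the linear schedule above can be made concrete with a short sketch. It assumes no gradient accumulation: the card does not list `gradient_accumulation_steps`, but 4 per device x 8 GPUs already equals the reported total of 32. The decay follows the linear-warmup-then-linear-decay rule that, to our understanding, the Hugging Face `Trainer` applies for `lr_scheduler_type: linear`, with the warmup length taken as the ceiling of `warmup_ratio` x total steps. The total number of optimizer steps is not recorded. `TOTAL_STEPS` is a hypothetical, rough estimate read off the training-results table (step 5500 is logged at epoch 2.84, over 3 epochs).

```python
import math

LEARNING_RATE = 5e-7
PER_DEVICE_TRAIN_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 1  # assumption: not listed; 4 x 8 already gives the reported 32
WARMUP_RATIO = 0.1


def total_train_batch_size(per_device: int, devices: int, grad_accum: int) -> int:
    """Training examples consumed per optimizer step across all GPUs."""
    return per_device * devices * grad_accum


def linear_warmup_lr(step: int, num_training_steps: int,
                     base_lr: float = LEARNING_RATE,
                     warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    num_warmup = math.ceil(num_training_steps * warmup_ratio)
    if step < num_warmup:
        return base_lr * step / max(1, num_warmup)
    remaining = (num_training_steps - step) / max(1, num_training_steps - num_warmup)
    return base_lr * max(0.0, remaining)


# Hypothetical total: rough estimate from the results table (~5500 / 2.84 * 3).
TOTAL_STEPS = 5810

print(total_train_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 32
for step in (0, 290, 581, 3000, TOTAL_STEPS):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```

With these assumptions the learning rate reaches its 5e-07 peak after roughly the first 10% of steps and decays linearly to zero by the end of the third epoch.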

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5631	0.26	500	0.5260	0.0288	-1.2082	0.75	1.2371	-251.9833	-298.3453	-2.9467	-2.9577
0.5432	0.52	1000	0.5888	-0.0335	-1.8482	0.7540	1.8147	-255.1831	-298.6568	-2.8465	-2.8476
0.5368	0.77	1500	0.5860	-0.4836	-2.3300	0.7619	1.8464	-257.5920	-300.9073	-2.8455	-2.8445
0.0615	1.03	2000	0.6024	-0.5971	-2.6919	0.7778	2.0948	-259.4018	-301.4749	-2.8687	-2.8639
0.0817	1.29	2500	0.6655	-1.3554	-3.8426	0.7738	2.4872	-265.1552	-305.2667	-2.8257	-2.8254
0.0617	1.55	3000	0.6421	-1.2552	-3.7613	0.75	2.5062	-264.7488	-304.7651	-2.7744	-2.7683
0.0765	1.81	3500	0.6582	-1.1492	-4.0394	0.7659	2.8902	-266.1391	-304.2354	-2.7403	-2.7389
0.0178	2.07	4000	0.6797	-1.8485	-5.2549	0.7619	3.4064	-272.2166	-307.7317	-2.7310	-2.7273
0.0165	2.32	4500	0.7359	-2.2096	-6.0498	0.7817	3.8401	-276.1910	-309.5376	-2.7006	-2.7001
0.0094	2.58	5000	0.7864	-2.8828	-6.8542	0.7738	3.9713	-280.2130	-312.9036	-2.7185	-2.7196
0.0094	2.84	5500	0.7953	-3.1897	-7.3009	0.7579	4.1112	-282.4464	-314.4378	-2.6987	-2.7012
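Read across rows, the table shows training loss dropping from about 0.54 to below 0.1 once the second epoch begins. Over the same span, validation loss rises from its minimum of 0.5260 at step 500 to 0.7953 at step 5500, even as the reward margin grows at every logged step and reward accuracy stays between 0.75 and 0.78. The sketch below copies four of the logged columns, plus step and epoch, into Python and checks these observations. The `Row` type and the choice of columns are ours, for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    accuracy: float  # Rewards/accuracies
    margin: float    # Rewards/margins


# Values copied from the training-results table.
ROWS = [
    Row(500, 0.26, 0.5631, 0.5260, 0.75, 1.2371),
    Row(1000, 0.52, 0.5432, 0.5888, 0.7540, 1.8147),
    Row(1500, 0.77, 0.5368, 0.5860, 0.7619, 1.8464),
    Row(2000, 1.03, 0.0615, 0.6024, 0.7778, 2.0948),
    Row(2500, 1.29, 0.0817, 0.6655, 0.7738, 2.4872),
    Row(3000, 1.55, 0.0617, 0.6421, 0.75, 2.5062),
    Row(3500, 1.81, 0.0765, 0.6582, 0.7659, 2.8902),
    Row(4000, 2.07, 0.0178, 0.6797, 0.7619, 3.4064),
    Row(4500, 2.32, 0.0165, 0.7359, 0.7817, 3.8401),
    Row(5000, 2.58, 0.0094, 0.7864, 0.7738, 3.9713),
    Row(5500, 2.84, 0.0094, 0.7953, 0.7579, 4.1112),
]

best = min(ROWS, key=lambda r: r.val_loss)
margins_rise = all(a.margin < b.margin for a, b in zip(ROWS, ROWS[1:]))
first_epoch = [r.train_loss for r in ROWS if r.epoch < 1]
later_epochs = [r.train_loss for r in ROWS if r.epoch >= 1]
accuracies = [r.accuracy for r in ROWS]

print(f"lowest validation loss: {best.val_loss} at step {best.step}")
print(f"margins strictly increasing: {margins_rise}")
print(f"training loss, epoch < 1: {min(first_epoch)}-{max(first_epoch)}")
print(f"training loss, epoch >= 1: {min(later_epochs)}-{max(later_epochs)}")
print(f"accuracy range: {min(accuracies)}-{max(accuracies)}")
```

The evaluation loss reported at the top of the card (0.7903) is close to the last rows rather than the step-500 minimum. This suggests the published results come from the end of training, not from the lowest-validation-loss checkpoint.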

Framework versions

Transformers 4.35.0
PyTorch 2.1.0+cu118
Datasets 2.14.6
Tokenizers 0.14.1
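A minimal way to compare a local environment with the versions above uses only the standard library's `importlib.metadata`. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are our mapping of the listed frameworks. The suffix in `2.1.0+cu118` marks a CUDA 11.8 build of PyTorch, so a CPU-only or differently built wheel reports a different local version even when the release number matches.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed under "Framework versions", keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.35.0",
    "torch": "2.1.0+cu118",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def compare_versions(expected: dict[str, str] = EXPECTED) -> dict[str, tuple[str, Optional[str], bool]]:
    """Map each distribution to (expected, installed or None, exact match)."""
    report = {}
    for dist, want in expected.items():
        try:
            have = version(dist)
        except PackageNotFoundError:
            have = None  # not installed in this environment
        report[dist] = (want, have, have == want)
    return report


for dist, (want, have, ok) in compare_versions().items():
    status = "OK" if ok else "DIFFERS"
    print(f"{dist:<13} expected {want:<12} installed {have or 'missing':<12} {status}")
```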

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	61.55
AI2 Reasoning Challenge (25-shot)	61.77
HellaSwag (10-shot)	84.04
MMLU (5-shot)	61.79
TruthfulQA (0-shot)	54.72
Winogrande (5-shot)	76.95
GSM8K (5-shot)	30.02
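The "Avg." row is consistent with an unweighted arithmetic mean of the six benchmark scores: (61.77 + 84.04 + 61.79 + 54.72 + 76.95 + 30.02) / 6 = 369.29 / 6, which is about 61.548 and rounds to 61.55. A quick check in Python:

```python
# Scores copied from the table above; the average is a plain mean over benchmarks.
SCORES = {
    "AI2 Reasoning Challenge": 61.77,
    "HellaSwag": 84.04,
    "MMLU": 61.79,
    "TruthfulQA": 54.72,
    "Winogrande": 76.95,
    "GSM8K": 30.02,
}

average = sum(SCORES.values()) / len(SCORES)
print(f"{average:.2f}")  # 61.55
```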