metadata

license: apache-2.0
base_model: Deci/DeciLM-7B
tags:
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrachat_200k
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: bbdeci7b-sft-lora-dpo-lora
    results: []

bbdeci7b-sft-lora-dpo-lora

This model is a fine-tuned version of Deci/DeciLM-7B, trained first with SFT on the HuggingFaceH4/ultrachat_200k dataset and then with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset.

Evaluations and further details are coming soon.

SFT was conducted on 2x Nvidia A100 GPUs for 21 hours, and DPO was conducted on 8x Nvidia A100 GPUs for 4 hours.

After SFT, it achieves the following results on the evaluation set:

Loss: 1.0110

After DPO, it achieves the following results on the evaluation set:

Loss: 0.5908
Rewards/chosen: 0.0960
Rewards/rejected: -0.2480
Rewards/accuracies: 0.7222
Rewards/margins: 0.3440
Logps/rejected: -241.9212
Logps/chosen: -295.2642
Logits/rejected: -2.6769
Logits/chosen: -2.6941
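
The reward margin reported above is the chosen reward minus the rejected reward, so the figures can be cross-checked with simple arithmetic. The following is a minimal sketch that does this for the per-epoch DPO evaluation values listed in this card; the function and variable names are illustrative and do not come from any training library.

```python
# Cross-check: the margin should equal chosen reward minus rejected reward.
# Values are copied from the DPO evaluation results in this card.
dpo_eval = [
    # (epoch, rewards_chosen, rewards_rejected, reported_margin)
    (1, 0.0634, -0.0940, 0.1573),
    (2, 0.0861, -0.2096, 0.2956),
    (3, 0.0960, -0.2480, 0.3440),
]


def margin(chosen, rejected):
    """Reward margin: how much higher the chosen reward is than the rejected one."""
    return chosen - rejected


def is_consistent(chosen, rejected, reported, tol=5e-4):
    """True if the reported margin matches chosen - rejected within rounding error."""
    return abs(margin(chosen, rejected) - reported) < tol


for epoch, chosen, rejected, reported in dpo_eval:
    computed = margin(chosen, rejected)
    status = "ok" if is_consistent(chosen, rejected, reported) else "mismatch"
    print(f"epoch {epoch}: computed={computed:.4f} reported={reported:.4f} ({status})")
```

The tolerance allows for the reported numbers being rounded to four decimal places, which is why epochs 1 and 2 differ from the computed value in the last digit.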

Training hyperparameters

The following hyperparameters were used during SFT training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 128
total_train_batch_size: 1024
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1
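
The total training batch size follows from the per-device batch size, the number of devices, and the gradient accumulation steps. The sketch below reproduces that arithmetic for the values listed in this card; the helper name is hypothetical and is not a library function.

```python
def effective_batch_size(per_device_batch, num_devices, grad_accum_steps):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch * num_devices * grad_accum_steps


# SFT: train_batch_size=4, num_devices=2, gradient_accumulation_steps=128
sft_total = effective_batch_size(4, 2, 128)

# DPO: train_batch_size=2, num_devices=8, gradient_accumulation_steps=32
dpo_total = effective_batch_size(2, 8, 32)

print("SFT total_train_batch_size:", sft_total)  # 1024
print("DPO total_train_batch_size:", dpo_total)  # 512
```

Both results match the total_train_batch_size values reported for the two stages.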

The following hyperparameters were used during DPO training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 32
total_train_batch_size: 512
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
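
To illustrate what a linear scheduler with a warmup ratio of 0.1 implies for the DPO run, here is a minimal pure-Python sketch of a linear warmup followed by a linear decay to zero. It assumes 363 total optimizer steps (the final step in the DPO results) and that the warmup length is rounded up. The exact rounding used by the trainer is an assumption, and the function below is illustrative rather than the trainer's own implementation.

```python
import math

PEAK_LR = 5e-07        # learning_rate from the DPO hyperparameters
TOTAL_STEPS = 363      # assumption: final step reported in the DPO results
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
# Assumption: warmup steps are rounded up to a whole number of steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        # Ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay from the peak learning rate down to 0 at the last step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, WARMUP_STEPS, 200, TOTAL_STEPS):
    print(f"step {step:3d}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate peaks after the warmup phase, partway through the first epoch, and reaches zero at the final step.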

Training results

SFT:

Training Loss	Epoch	Step	Validation Loss
1.0062	1.00	136	1.0110

DPO:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6401	1.0	121	0.6354	0.0634	-0.0940	0.7302	0.1573	-240.3806	-295.5903	-2.6840	-2.7020
0.6014	2.0	242	0.5988	0.0861	-0.2096	0.7460	0.2956	-241.5365	-295.3633	-2.6795	-2.6965
0.5911	3.0	363	0.5908	0.0960	-0.2480	0.7222	0.3440	-241.9212	-295.2642	-2.6769	-2.6941

Framework versions

Transformers 4.35.2
PyTorch 2.1.0+cu118
Datasets 2.14.6
Tokenizers 0.14.1