results

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.7838
Rewards/chosen: -0.0726
Rewards/rejected: -0.1414
Rewards/accuracies: 1.0
Rewards/margins: 0.0688
Logps/rejected: -1.4145
Logps/chosen: -0.7263
Logits/rejected: -1.3572
Logits/chosen: -1.0579
Nll Loss: 0.7279
Log Odds Ratio: -0.3123
Log Odds Chosen: 1.0916
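
The metric names above match those logged by TRL's ORPOTrainer, although this card does not name the training method, so treat that reading as an assumption. Under it, the reward margin is the chosen reward minus the rejected reward, and each reward is the corresponding log-probability scaled by a coefficient beta. The sketch below checks both relationships against the reported numbers. The value of beta is inferred from the ratios and is not stated anywhere in the card.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -0.0726
rewards_rejected = -0.1414
logps_chosen = -0.7263
logps_rejected = -1.4145

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: rewards = beta * logps, so each ratio is an estimate of beta.
beta_from_chosen = rewards_chosen / logps_chosen
beta_from_rejected = rewards_rejected / logps_rejected

print(f"margin = {margin:.4f}")
print(f"implied beta (chosen) = {beta_from_chosen:.3f}")
print(f"implied beta (rejected) = {beta_from_rejected:.3f}")
```

The computed margin agrees with the reported Rewards/margins of 0.0688, and both ratios come out close to 0.1.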

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 5
num_epochs: 10
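
As a rough check on how these values fit together, the sketch below recomputes the total batch sizes from the per-device settings and evaluates a linear schedule with warmup in plain Python. The total of 60 optimizer steps is an assumption read off the last logged step in the training results, and `linear_lr` is a hypothetical helper written for illustration, not the trainer's own code.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per device
eval_batch_size = 2              # per device
num_devices = 2
gradient_accumulation_steps = 4
learning_rate = 8e-06
warmup_steps = 5

# Assumption: 60 optimizer steps in total (the last step logged in the results).
total_steps = 60

# Examples consumed per optimizer step, and per evaluation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 5, 30, 60):
    print(step, linear_lr(step))
```

The recomputed totals match the listed total_train_batch_size of 16 and total_eval_batch_size of 4.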

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Nll Loss	Log Odds Ratio	Log Odds Chosen
4.0477	1.6	10	2.6148	-0.1668	-0.2037	0.8333	0.0369	-2.0366	-1.6676	-0.5541	-0.3265	2.5641	-0.5002	0.4474
1.7128	3.2	20	1.3152	-0.1092	-0.1512	0.8333	0.0421	-1.5124	-1.0917	-1.2255	-0.9402	1.2267	-0.4566	0.5915
0.9601	4.8	30	0.9698	-0.0833	-0.1380	1.0	0.0547	-1.3800	-0.8326	-1.2364	-0.9499	0.8983	-0.3832	0.8390
0.7231	6.4	40	0.8362	-0.0752	-0.1390	1.0	0.0638	-1.3898	-0.7521	-1.3672	-1.0683	0.7749	-0.3345	1.0067
0.6324	8.0	50	0.7904	-0.0729	-0.1410	1.0	0.0681	-1.4101	-0.7290	-1.3658	-1.0673	0.7331	-0.3152	1.0809
0.6228	9.6	60	0.7838	-0.0726	-0.1414	1.0	0.0688	-1.4145	-0.7263	-1.3572	-1.0579	0.7279	-0.3123	1.0916
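
To read the trend in the table programmatically, a small sketch like the following may help. It copies four columns from the rows above and checks whether validation loss falls and the reward margin rises at every logged step. The tuple layout and variable names are illustrative choices, not part of the training output.

```python
# (step, validation_loss, rewards_margin, rewards_accuracy), copied from the table above.
rows = [
    (10, 2.6148, 0.0369, 0.8333),
    (20, 1.3152, 0.0421, 0.8333),
    (30, 0.9698, 0.0547, 1.0),
    (40, 0.8362, 0.0638, 1.0),
    (50, 0.7904, 0.0681, 1.0),
    (60, 0.7838, 0.0688, 1.0),
]

losses = [row[1] for row in rows]
margins = [row[2] for row in rows]

# Checkpoint with the lowest validation loss.
best_step, best_loss, _, _ = min(rows, key=lambda row: row[1])

# Compare each logged value with the next one.
loss_decreasing = all(a > b for a, b in zip(losses, losses[1:]))
margin_increasing = all(a < b for a, b in zip(margins, margins[1:]))

print(f"best step: {best_step} (validation loss {best_loss})")
print(f"loss strictly decreasing: {loss_decreasing}")
print(f"margin strictly increasing: {margin_increasing}")
```

On these rows the lowest validation loss is at step 60, and both trends hold at every logged step, though the change between steps 50 and 60 is small.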

Framework versions

Transformers 4.44.2
Pytorch 2.2.0+cu121
Datasets 3.0.0
Tokenizers 0.19.1
