qwen_unl_entropy_0_0

This model, yakazimir/qwen_unl_entropy_0_0, is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 1.6479
Rewards/chosen: -1.3032
Rewards/rejected: -1.4993
Rewards/accuracies: 0.5712
Rewards/margins: 0.1961
Logps/rejected: -1.4993
Logps/chosen: -1.3032
Logits/rejected: 0.1464
Logits/chosen: 0.0748
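The reward metrics above are linked by simple arithmetic. Below is a minimal sketch that assumes TRL-style definitions: the per-pair margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs whose chosen reward is strictly higher. The per-pair values and the helper name `reward_metrics` are hypothetical. Only the two averages in the last lines come from this card.

```python
from statistics import mean


def reward_metrics(chosen, rejected):
    """Aggregate per-pair rewards into card-style summary metrics.

    Assumes TRL-style definitions: margin = chosen - rejected per pair,
    accuracy = share of pairs where the chosen reward is strictly higher.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equally sized, non-empty reward lists")
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
    }


# Hypothetical per-pair rewards for four evaluation pairs.
example = reward_metrics([-1.2, -1.5, -0.9, -1.6], [-1.4, -1.3, -1.5, -1.8])

# The mean margin equals the difference of the means, so the reported
# averages already determine it: -1.3032 - (-1.4993) = 0.1961.
reported_margin = round(-1.3032 - (-1.4993), 4)
```

Rewards/margins can therefore be checked directly against Rewards/chosen and Rewards/rejected. Rewards/accuracies depends on the individual pairs and cannot be recovered from the averages alone.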

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
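The batch-size and learning-rate settings fit together as sketched below in plain Python. `total_train_batch_size` is the per-device batch size times the gradient accumulation steps times the number of processes. The card does not list the process count, and 2 × 16 alone already gives 32. `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` means a linear warmup over the first 10% of optimizer steps, followed by a half-cosine decay to zero. `TOTAL_STEPS` is an assumption estimated from the Step and Epoch columns of the results table, not a reported value. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-6
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 16

# Assumption: about 1,868 optimizer steps per epoch, estimated from the
# Step/Epoch columns of the results table (not reported on the card).
TOTAL_STEPS = 3 * 1868


def effective_batch_size(per_device, accum_steps, num_processes):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_processes


def cosine_with_warmup(step, total_steps, warmup_ratio=WARMUP_RATIO,
                       peak_lr=LEARNING_RATE):
    """Linear warmup to peak_lr, then half a cosine cycle down to 0.

    The warmup length is rounded up, matching how transformers turns
    warmup_ratio into a step count.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, `cosine_with_warmup(561, TOTAL_STEPS)` returns the peak rate of 1e-6 because ceil(0.1 × 5604) = 561. The rate then decays smoothly to 0 at the final step.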

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
1.6555	0.2141	400	1.6941	-1.3383	-1.4640	0.5556	0.1257	-1.4640	-1.3383	0.4030	0.3137
1.6693	0.4282	800	1.6719	-1.3149	-1.4532	0.5579	0.1383	-1.4532	-1.3149	0.3441	0.2642
1.6204	0.6422	1200	1.6640	-1.3085	-1.4525	0.5556	0.1440	-1.4525	-1.3085	0.3559	0.2746
1.6569	0.8563	1600	1.6598	-1.3094	-1.4585	0.5593	0.1491	-1.4585	-1.3094	0.2618	0.1878
1.7111	1.0704	2000	1.6548	-1.3002	-1.4570	0.5653	0.1568	-1.4570	-1.3002	0.2290	0.1561
1.6123	1.2845	2400	1.6522	-1.3029	-1.4741	0.5675	0.1711	-1.4741	-1.3029	0.2729	0.1950
1.6687	1.4986	2800	1.6488	-1.3000	-1.4737	0.5697	0.1738	-1.4737	-1.3000	0.1754	0.1051
1.6012	1.7127	3200	1.6494	-1.3010	-1.4718	0.5675	0.1708	-1.4718	-1.3010	0.1848	0.1133
1.5646	1.9267	3600	1.6479	-1.2987	-1.4776	0.5682	0.1789	-1.4776	-1.2987	0.1466	0.0770
1.5351	2.1408	4000	1.6470	-1.3020	-1.4960	0.5697	0.1940	-1.4960	-1.3020	0.1418	0.0714
1.5309	2.3549	4400	1.6467	-1.3051	-1.5042	0.5727	0.1991	-1.5042	-1.3051	0.1132	0.0439
1.5444	2.5690	4800	1.6473	-1.3034	-1.5014	0.5720	0.1979	-1.5014	-1.3034	0.1403	0.0690
1.5671	2.7831	5200	1.6474	-1.3030	-1.4996	0.5705	0.1966	-1.4996	-1.3030	0.2002	0.1244
1.5485	2.9972	5600	1.6479	-1.3031	-1.4993	0.5712	0.1961	-1.4993	-1.3031	0.1464	0.0748
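The table can be sanity-checked with a short script. The sketch below uses a column subset and a rounding tolerance chosen for this illustration. It confirms that every Rewards/margins entry equals Rewards/chosen minus Rewards/rejected up to 4-decimal rounding. It also finds the lowest validation loss, 1.6467 at step 4400, which is slightly below the final 1.6479.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
# copied from the table above.
ROWS = [
    (400, 1.6941, -1.3383, -1.4640, 0.1257),
    (800, 1.6719, -1.3149, -1.4532, 0.1383),
    (1200, 1.6640, -1.3085, -1.4525, 0.1440),
    (1600, 1.6598, -1.3094, -1.4585, 0.1491),
    (2000, 1.6548, -1.3002, -1.4570, 0.1568),
    (2400, 1.6522, -1.3029, -1.4741, 0.1711),
    (2800, 1.6488, -1.3000, -1.4737, 0.1738),
    (3200, 1.6494, -1.3010, -1.4718, 0.1708),
    (3600, 1.6479, -1.2987, -1.4776, 0.1789),
    (4000, 1.6470, -1.3020, -1.4960, 0.1940),
    (4400, 1.6467, -1.3051, -1.5042, 0.1991),
    (4800, 1.6473, -1.3034, -1.5014, 0.1979),
    (5200, 1.6474, -1.3030, -1.4996, 0.1966),
    (5600, 1.6479, -1.3031, -1.4993, 0.1961),
]

# Each of the three values is rounded to 4 decimals (error up to 5e-5 each).
TOL = 2e-4


def margin_consistent(row, tol=TOL):
    """True if margin == chosen - rejected within rounding tolerance."""
    _, _, chosen, rejected, margin = row
    return abs((chosen - rejected) - margin) <= tol


best_step, best_loss = min(((r[0], r[1]) for r in ROWS), key=lambda p: p[1])
inconsistent = [r[0] for r in ROWS if not margin_consistent(r)]
```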

Framework versions

Transformers 4.44.2
Pytorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
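To reproduce the run, you can check a local environment against these versions. The sketch below uses only the standard library (`importlib.metadata`). Three choices are assumptions of this example: the distribution names (`torch` for PyTorch), the decision to ignore local build labels such as `+cu121`, and the helper names.

```python
from importlib import metadata

# Versions listed under "Framework versions" above, keyed by distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def public(version):
    """Drop a local label such as '+cu121'; PyPI wheels do not carry one."""
    return version.split("+", 1)[0]


def version_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        installed = lookup(package)
        if installed is None or public(installed) != public(wanted):
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for pkg, (wanted, got) in version_mismatches(EXPECTED).items():
        print(f"{pkg}: card lists {wanted}, found {got or 'not installed'}")
```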
