---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sigmoid_iter0_2024-09-14-21.15
  results: []
---

[Visualize in Weights & Biases](https://ml.byteintl.net/experiment/tracking/detail?Id=project_20240915_20321b8f&selectedTrial=run_20240915_d060d7a7)

# ds_chat_sigmoid_iter0_2024-09-14-21.15

This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized, and self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets.
It achieves the following results on the evaluation set:
- Loss: 0.7009
- Rewards/chosen: 0.3500
- Rewards/rejected: 0.0298
- Rewards/accuracies: 0.3289
- Rewards/margins: 0.3202
- Logps/rejected: -63.8274
- Logps/chosen: -122.4480
- Logits/rejected: 1.6952
- Logits/chosen: 1.6350
- Debug/policy Chosen Logits: 1.6350
- Debug/policy Rejected Logits: 1.6952
- Debug/policy Chosen Logps: -122.4480
- Debug/policy Rejected Logps: -63.8274
- Debug/reference Chosen Logps: -123.1481
- Debug/reference Rejected Logps: -63.8871

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a reproduction sketch using these values follows the list:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
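The reward columns reported in this card match TRL's DPO reward definition, reward = beta x (policy log-prob minus reference log-prob): 0.5 x (-122.4480 - (-123.1481)) is approximately 0.3500, which reproduces Rewards/chosen, so the run appears to use beta = 0.5 together with the sigmoid loss named in the run id. The sketch below shows how such a run could be set up with TRL's `DPOTrainer`. Note that `beta=0.5`, `loss_type="sigmoid"`, and the simple dataset concatenation are inferences, not values stated in this card, and exact keyword names vary slightly across TRL releases.

```python
# Minimal reproduction sketch of this DPO run, assuming TRL's
# DPOConfig/DPOTrainer API. Hyperparameters are copied from the list
# above; beta and loss_type are inferred (see note above), so treat
# them as assumptions.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Preference data: the three binarized iter0 splits, concatenated.
train_dataset = concatenate_datasets([
    load_dataset(name, split="train")
    for name in [
        "self-generate/ds_chat_original_cn_mining_oj_iter0-binarized",
        "self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized",
        "self-generate/ds_chat_original_cn_rl_oj_iter0-binarized",
    ]
])

args = DPOConfig(
    output_dir="ds_chat_sigmoid_iter0",
    beta=0.5,                      # assumption, inferred from the reward columns
    loss_type="sigmoid",           # assumption, inferred from the run name
    learning_rate=1e-7,
    per_device_train_batch_size=8,  # 8 per device x 8 GPUs = 64 total
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=100,              # overrides warmup_ratio when both are set
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference model
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```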
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------------:|:----------------------------:|:-------------------------:|:---------------------------:|:----------------------------:|:------------------------------:|
| 0.6965 | 0.3623 | 100 | 0.6848 | 0.1614 | 0.0731 | 0.2895 | 0.0882 | -63.7408 | -122.8253 | 1.7215 | 1.6604 | 1.6604 | 1.7215 | -122.8253 | -63.7408 | -123.1481 | -63.8871 |
| 0.7398 | 0.7246 | 200 | 0.7128 | 0.4980 | 0.1123 | 0.3289 | 0.3857 | -63.6625 | -122.1521 | 1.7105 | 1.6513 | 1.6513 | 1.7105 | -122.1521 | -63.6625 | -123.1481 | -63.8871 |
| 0.7007 | 1.0870 | 300 | 0.6869 | 0.4063 | -0.0006 | 0.3158 | 0.4070 | -63.8883 | -122.3354 | 1.7138 | 1.6542 | 1.6542 | 1.7138 | -122.3354 | -63.8883 | -123.1481 | -63.8871 |
| 0.7084 | 1.4493 | 400 | 0.7388 | 0.4329 | 0.1275 | 0.3026 | 0.3054 | -63.6320 | -122.2823 | 1.7009 | 1.6406 | 1.6406 | 1.7009 | -122.2823 | -63.6320 | -123.1481 | -63.8871 |
| 0.693 | 1.8116 | 500 | 0.6927 | 0.1909 | -0.0563 | 0.3158 | 0.2472 | -63.9997 | -122.7663 | 1.7035 | 1.6431 | 1.6431 | 1.7035 | -122.7663 | -63.9997 | -123.1481 | -63.8871 |
| 0.6683 | 2.1739 | 600 | 0.6755 | 0.2946 | 0.0203 | 0.3421 | 0.2744 | -63.8465 | -122.5588 | 1.7045 | 1.6442 | 1.6442 | 1.7045 | -122.5588 | -63.8465 | -123.1481 | -63.8871 |
| 0.7035 | 2.5362 | 700 | 0.6899 | 0.1404 | -0.0287 | 0.3158 | 0.1691 | -63.9445 | -122.8673 | 1.7058 | 1.6448 | 1.6448 | 1.7058 | -122.8673 | -63.9445 | -123.1481 | -63.8871 |
| 0.685 | 2.8986 | 800 | 0.6978 | 0.4321 | 0.0759 | 0.3947 | 0.3562 | -63.7352 | -122.2839 | 1.7109 | 1.6500 | 1.6500 | 1.7109 | -122.2839 | -63.7352 | -123.1481 | -63.8871 |
| 0.6585 | 3.2609 | 900 | 0.7158 | 0.4197 | 0.1341 | 0.2763 | 0.2856 | -63.6189 | -122.3087 | 1.7148 | 1.6527 | 1.6527 | 1.7148 | -122.3087 | -63.6189 | -123.1481 | -63.8871 |
| 0.6654 | 3.6232 | 1000 | 0.6837 | 0.4128 | 0.0010 | 0.3947 | 0.4118 | -63.8851 | -122.3225 | 1.7064 | 1.6460 | 1.6460 | 1.7064 | -122.3225 | -63.8851 | -123.1481 | -63.8871 |
| 0.669 | 3.9855 | 1100 | 0.6801 | 0.2662 | -0.0151 | 0.3816 | 0.2813 | -63.9173 | -122.6156 | 1.7008 | 1.6413 | 1.6413 | 1.7008 | -122.6156 | -63.9173 | -123.1481 | -63.8871 |
| 0.6658 | 4.3478 | 1200 | 0.6950 | 0.2165 | -0.0405 | 0.3553 | 0.2570 | -63.9680 | -122.7150 | 1.6985 | 1.6382 | 1.6382 | 1.6985 | -122.7150 | -63.9680 | -123.1481 | -63.8871 |
| 0.6774 | 4.7101 | 1300 | 0.6833 | 0.3216 | 0.0373 | 0.3289 | 0.2843 | -63.8124 | -122.5048 | 1.6956 | 1.6371 | 1.6371 | 1.6956 | -122.5048 | -63.8124 | -123.1481 | -63.8871 |
| 0.6553 | 5.0725 | 1400 | 0.6871 | 0.4489 | 0.0096 | 0.3421 | 0.4393 | -63.8679 | -122.2503 | 1.6926 | 1.6324 | 1.6324 | 1.6926 | -122.2503 | -63.8679 | -123.1481 | -63.8871 |
| 0.655 | 5.4348 | 1500 | 0.6900 | 0.3867 | 0.0004 | 0.3553 | 0.3863 | -63.8863 | -122.3746 | 1.7037 | 1.6446 | 1.6446 | 1.7037 | -122.3746 | -63.8863 | -123.1481 | -63.8871 |
| 0.6552 | 5.7971 | 1600 | 0.6981 | 0.2816 | -0.0683 | 0.3158 | 0.3498 | -64.0236 | -122.5849 | 1.6935 | 1.6342 | 1.6342 | 1.6935 | -122.5849 | -64.0236 | -123.1481 | -63.8871 |
| 0.6471 | 6.1594 | 1700 | 0.7017 | 0.3683 | 0.0204 | 0.3553 | 0.3479 | -63.8463 | -122.4115 | 1.6992 | 1.6385 | 1.6385 | 1.6992 | -122.4115 | -63.8463 | -123.1481 | -63.8871 |
| 0.6557 | 6.5217 | 1800 | 0.6957 | 0.2688 | -0.0975 | 0.3026 | 0.3663 | -64.0820 | -122.6105 | 1.6947 | 1.6337 | 1.6337 | 1.6947 | -122.6105 | -64.0820 | -123.1481 | -63.8871 |
| 0.6516 | 6.8841 | 1900 | 0.6872 | 0.3905 | 0.0084 | 0.3553 | 0.3821 | -63.8704 | -122.3671 | 1.7002 | 1.6400 | 1.6400 | 1.7002 | -122.3671 | -63.8704 | -123.1481 | -63.8871 |
| 0.6542 | 7.2464 | 2000 | 0.6910 | 0.3410 | 0.0003 | 0.3289 | 0.3406 | -63.8864 | -122.4661 | 1.6915 | 1.6320 | 1.6320 | 1.6915 | -122.4661 | -63.8864 | -123.1481 | -63.8871 |
| 0.6629 | 7.6087 | 2100 | 0.6930 | 0.4245 | 0.0306 | 0.3026 | 0.3939 | -63.8259 | -122.2991 | 1.6968 | 1.6376 | 1.6376 | 1.6968 | -122.2991 | -63.8259 | -123.1481 | -63.8871 |
| 0.6427 | 7.9710 | 2200 | 0.7009 | 0.3500 | 0.0298 | 0.3289 | 0.3202 | -63.8274 | -122.4480 | 1.6952 | 1.6350 | 1.6350 | 1.6952 | -122.4480 | -63.8274 | -123.1481 | -63.8871 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
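## Inference example

A minimal inference sketch with `transformers`, using the chat template shipped with the base model. The repository id below is a placeholder; substitute the path where this checkpoint is actually saved or pushed.

```python
# Minimal chat inference sketch. "your-org/ds_chat_sigmoid_iter0" is a
# placeholder id; point it at wherever this checkpoint actually lives.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/ds_chat_sigmoid_iter0"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```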