---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e8rate_05beta_cSFTDPO
  results: []
---

# IE_L3_1000steps_1e8rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6901
- Rewards/chosen: -0.0305
- Rewards/rejected: -0.0517
- Rewards/accuracies: 0.4200
- Rewards/margins: 0.0213
- Logps/rejected: -75.7307
- Logps/chosen: -82.8587
- Logits/rejected: -0.7970
- Logits/chosen: -0.7401

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
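For reference, the sketch below shows how these hyperparameters could be passed to `trl`'s `DPOTrainer`. It is a hypothetical reconstruction rather than the actual training script: the preference dataset name and column layout are placeholders (the card does not record the dataset), `beta=0.5` is inferred from the "05beta" suffix in the model name, and the `tokenizer` keyword reflects the `trl` API contemporary with Transformers 4.44.2.

```python
# Hypothetical reconstruction of the training setup described above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPO expects a preference dataset with "prompt", "chosen", and
# "rejected" columns; this dataset name is a placeholder.
train_dataset = load_dataset("my_org/my_preference_data", split="train")

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e8rate_05beta_cSFTDPO",
    beta=0.5,                       # inferred from "05beta" in the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # trl creates a frozen reference copy when None
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```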
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6994        | 0.4   | 50   | 0.7013          | -0.0193        | -0.0168          | 0.375              | -0.0025         | -75.6609       | -82.8363     | -0.7968         | -0.7397       |
| 0.7002        | 0.8   | 100  | 0.7038          | -0.0158        | -0.0084          | 0.3450             | -0.0074         | -75.6441       | -82.8293     | -0.7971         | -0.7401       |
| 0.6907        | 1.2   | 150  | 0.7016          | -0.0214        | -0.0182          | 0.3800             | -0.0033         | -75.6636       | -82.8406     | -0.7968         | -0.7396       |
| 0.7125        | 1.6   | 200  | 0.6880          | -0.0323        | -0.0559          | 0.4100             | 0.0236          | -75.7390       | -82.8623     | -0.7969         | -0.7398       |
| 0.6784        | 2.0   | 250  | 0.7048          | -0.0506        | -0.0419          | 0.3800             | -0.0087         | -75.7110       | -82.8989     | -0.7967         | -0.7399       |
| 0.7093        | 2.4   | 300  | 0.6873          | -0.0310        | -0.0578          | 0.4400             | 0.0268          | -75.7429       | -82.8598     | -0.7973         | -0.7402       |
| 0.6769        | 2.8   | 350  | 0.6770          | -0.0179        | -0.0654          | 0.4200             | 0.0475          | -75.7580       | -82.8335     | -0.7972         | -0.7402       |
| 0.6876        | 3.2   | 400  | 0.6995          | -0.0297        | -0.0340          | 0.3500             | 0.0044          | -75.6953       | -82.8571     | -0.7966         | -0.7395       |
| 0.6809        | 3.6   | 450  | 0.6703          | -0.0395        | -0.1022          | 0.4600             | 0.0627          | -75.8316       | -82.8767     | -0.7972         | -0.7402       |
| 0.6812        | 4.0   | 500  | 0.6853          | -0.0127        | -0.0416          | 0.3900             | 0.0289          | -75.7105       | -82.8232     | -0.7972         | -0.7404       |
| 0.7342        | 4.4   | 550  | 0.6907          | -0.0234        | -0.0410          | 0.4150             | 0.0176          | -75.7092       | -82.8446     | -0.7966         | -0.7396       |
| 0.6772        | 4.8   | 600  | 0.6824          | -0.0324        | -0.0676          | 0.4450             | 0.0352          | -75.7624       | -82.8625     | -0.7968         | -0.7399       |
| 0.6918        | 5.2   | 650  | 0.6813          | -0.0468        | -0.0861          | 0.3950             | 0.0393          | -75.7994       | -82.8913     | -0.7973         | -0.7402       |
| 0.6778        | 5.6   | 700  | 0.6899          | -0.0390        | -0.0590          | 0.4250             | 0.0200          | -75.7452       | -82.8757     | -0.7970         | -0.7398       |
| 0.6814        | 6.0   | 750  | 0.6861          | -0.0310        | -0.0623          | 0.4000             | 0.0313          | -75.7518       | -82.8598     | -0.7969         | -0.7399       |
| 0.7158        | 6.4   | 800  | 0.6828          | -0.0206        | -0.0575          | 0.4250             | 0.0370          | -75.7423       | -82.8389     | -0.7970         | -0.7400       |
| 0.6827        | 6.8   | 850  | 0.6909          | -0.0294        | -0.0489          | 0.4200             | 0.0195          | -75.7250       | -82.8565     | -0.7970         | -0.7401       |
| 0.7306        | 7.2   | 900  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.6964        | 7.6   | 950  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.687         | 8.0   | 1000 | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
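The model can be loaded like any causal language model in `transformers`. Below is a minimal loading sketch; the prompt is a placeholder, since the card does not document the expected input format.

```python
# Minimal loading sketch; the prompt below is a placeholder, as the card
# does not document the expected input format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e8rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```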