---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e8rate_05beta_cSFTDPO
  results: []
---

# IE_L3_1000steps_1e8rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6901
- Rewards/chosen: -0.0305
- Rewards/rejected: -0.0517
- Rewards/accuracies: 0.4200
- Rewards/margins: 0.0213
- Logps/rejected: -75.7307
- Logps/chosen: -82.8587
- Logits/rejected: -0.7970
- Logits/chosen: -0.7401

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
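For reference, the sketch below shows how these hyperparameters could be passed to `trl`'s `DPOTrainer`. It is a hypothetical reconstruction rather than the actual training script: the preference dataset name and column layout are placeholders (the card does not record the dataset), `beta=0.5` is inferred from the "05beta" suffix in the model name, and the `tokenizer` keyword reflects the `trl` API contemporary with Transformers 4.44.2.

```python
# Hypothetical reconstruction of the training setup described above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPO expects a preference dataset with "prompt", "chosen", and
# "rejected" columns; this dataset name is a placeholder.
train_dataset = load_dataset("my_org/my_preference_data", split="train")

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e8rate_05beta_cSFTDPO",
    beta=0.5,                       # inferred from "05beta" in the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # trl creates a frozen reference copy when None
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```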
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6994        | 0.4   | 50   | 0.7013          | -0.0193        | -0.0168          | 0.375              | -0.0025         | -75.6609       | -82.8363     | -0.7968         | -0.7397       |
| 0.7002        | 0.8   | 100  | 0.7038          | -0.0158        | -0.0084          | 0.3450             | -0.0074         | -75.6441       | -82.8293     | -0.7971         | -0.7401       |
| 0.6907        | 1.2   | 150  | 0.7016          | -0.0214        | -0.0182          | 0.3800             | -0.0033         | -75.6636       | -82.8406     | -0.7968         | -0.7396       |
| 0.7125        | 1.6   | 200  | 0.6880          | -0.0323        | -0.0559          | 0.4100             | 0.0236          | -75.7390       | -82.8623     | -0.7969         | -0.7398       |
| 0.6784        | 2.0   | 250  | 0.7048          | -0.0506        | -0.0419          | 0.3800             | -0.0087         | -75.7110       | -82.8989     | -0.7967         | -0.7399       |
| 0.7093        | 2.4   | 300  | 0.6873          | -0.0310        | -0.0578          | 0.4400             | 0.0268          | -75.7429       | -82.8598     | -0.7973         | -0.7402       |
| 0.6769        | 2.8   | 350  | 0.6770          | -0.0179        | -0.0654          | 0.4200             | 0.0475          | -75.7580       | -82.8335     | -0.7972         | -0.7402       |
| 0.6876        | 3.2   | 400  | 0.6995          | -0.0297        | -0.0340          | 0.3500             | 0.0044          | -75.6953       | -82.8571     | -0.7966         | -0.7395       |
| 0.6809        | 3.6   | 450  | 0.6703          | -0.0395        | -0.1022          | 0.4600             | 0.0627          | -75.8316       | -82.8767     | -0.7972         | -0.7402       |
| 0.6812        | 4.0   | 500  | 0.6853          | -0.0127        | -0.0416          | 0.3900             | 0.0289          | -75.7105       | -82.8232     | -0.7972         | -0.7404       |
| 0.7342        | 4.4   | 550  | 0.6907          | -0.0234        | -0.0410          | 0.4150             | 0.0176          | -75.7092       | -82.8446     | -0.7966         | -0.7396       |
| 0.6772        | 4.8   | 600  | 0.6824          | -0.0324        | -0.0676          | 0.4450             | 0.0352          | -75.7624       | -82.8625     | -0.7968         | -0.7399       |
| 0.6918        | 5.2   | 650  | 0.6813          | -0.0468        | -0.0861          | 0.3950             | 0.0393          | -75.7994       | -82.8913     | -0.7973         | -0.7402       |
| 0.6778        | 5.6   | 700  | 0.6899          | -0.0390        | -0.0590          | 0.4250             | 0.0200          | -75.7452       | -82.8757     | -0.7970         | -0.7398       |
| 0.6814        | 6.0   | 750  | 0.6861          | -0.0310        | -0.0623          | 0.4000             | 0.0313          | -75.7518       | -82.8598     | -0.7969         | -0.7399       |
| 0.7158        | 6.4   | 800  | 0.6828          | -0.0206        | -0.0575          | 0.4250             | 0.0370          | -75.7423       | -82.8389     | -0.7970         | -0.7400       |
| 0.6827        | 6.8   | 850  | 0.6909          | -0.0294        | -0.0489          | 0.4200             | 0.0195          | -75.7250       | -82.8565     | -0.7970         | -0.7401       |
| 0.7306        | 7.2   | 900  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.6964        | 7.6   | 950  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.687         | 8.0   | 1000 | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
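The model can be loaded like any causal language model in `transformers`. Below is a minimal loading sketch; the prompt is a placeholder, since the card does not document the expected input format.

```python
# Minimal loading sketch; the prompt below is a placeholder, as the card
# does not document the expected input format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e8rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```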