metadata
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-ipo_annealing
  results: []
gpt-imdb-ipo_annealing
This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set (these match the final logged evaluation, at step 7000):
- Loss: 125.6974
- Rewards/chosen: -0.0343
- Rewards/rejected: -0.1277
- Rewards/accuracies: 0.875
- Rewards/margins: 0.0934
- Logps/rejected: -267.1282
- Logps/chosen: -236.1897
- Logits/rejected: -31.3501
- Logits/chosen: -31.5916
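As a quick consistency check on the metrics above, the sketch below assumes the usual convention in preference-tuning trainers that the reward margin is the chosen reward minus the rejected reward. The card does not state this convention, so treat it as an assumption; the variable names are illustrative only.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -0.0343
rewards_rejected = -0.1277
reported_margin = 0.0934

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Tolerate rounding, since the reported values have four decimal places.
margin_matches = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}, matches reported: {margin_matches}")
```

Under that assumption the reported margin agrees with the two reported rewards to within rounding.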
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197
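To make the scheduler settings above concrete, here is a minimal sketch of the learning rate implied by a cosine schedule with 150 warmup steps over 7197 training steps. It assumes linear warmup from zero followed by a single half-cosine decay to zero, which is the default shape of `get_cosine_schedule_with_warmup` in Transformers; the card does not confirm which variant was used, and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 1e-05       # learning_rate
WARMUP_STEPS = 150    # lr_scheduler_warmup_steps
TOTAL_STEPS = 7197    # training_steps


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 75, 150, 3600, 7197):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the rate peaks at 1e-05 at step 150 and decays to zero by step 7197, so the last logged evaluation (step 7000) falls where the learning rate is already very small.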
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
16.3187 | 0.21 | 500 | 34.0876 | 0.1161 | -0.1126 | 0.5292 | 0.2287 | -263.8062 | -235.1407 | -33.1877 | -33.4371 |
5.5155 | 0.42 | 1000 | 13.0423 | -0.1485 | -0.3812 | 0.5042 | 0.2327 | -264.1273 | -235.4375 | -35.2608 | -35.4541 |
10.2532 | 0.63 | 1500 | 18.5157 | -0.4407 | -0.5471 | 0.5458 | 0.1064 | -264.3746 | -235.8205 | -34.2230 | -34.4246 |
6.7550 | 0.83 | 2000 | 28.1593 | -0.7791 | -0.8052 | 0.5917 | 0.0261 | -264.7961 | -236.3400 | -33.6119 | -33.8069 |
9.4126 | 1.04 | 2500 | 9.2406 | -0.8733 | -1.2564 | 0.6229 | 0.3831 | -265.6003 | -236.5962 | -31.9471 | -32.0700 |
8.5908 | 1.25 | 3000 | 12.4967 | -0.6700 | -1.0163 | 0.6167 | 0.3462 | -265.4156 | -236.4061 | -31.6914 | -31.8443 |
19.5217 | 1.46 | 3500 | 6.8889 | -0.0720 | -0.4689 | 0.6854 | 0.3969 | -264.5895 | -235.4041 | -32.1300 | -32.2692 |
6.9195 | 1.67 | 4000 | 4.2435 | -0.5324 | -0.9335 | 0.7021 | 0.4012 | -265.7609 | -236.4489 | -31.8342 | -31.9606 |
4.6993 | 1.88 | 4500 | 5.0987 | -0.2002 | -0.6179 | 0.7521 | 0.4177 | -265.3070 | -235.7907 | -31.6301 | -31.7617 |
2.7896 | 2.08 | 5000 | 2.7344 | -0.2390 | -0.5589 | 0.7500 | 0.3199 | -265.4754 | -236.0307 | -31.9650 | -32.1009 |
3.2262 | 2.29 | 5500 | 3.0584 | -0.1936 | -0.5168 | 0.8083 | 0.3231 | -265.8080 | -236.0606 | -31.6585 | -31.8243 |
4.1965 | 2.50 | 6000 | 4.2350 | -0.1555 | -0.4440 | 0.8417 | 0.2884 | -266.2272 | -236.1557 | -31.6484 | -31.8344 |
15.1482 | 2.71 | 6500 | 10.8174 | -0.0932 | -0.3244 | 0.8667 | 0.2312 | -266.7491 | -236.1454 | -31.4600 | -31.6800 |
145.9251 | 2.92 | 7000 | 125.6974 | -0.0343 | -0.1277 | 0.8750 | 0.0934 | -267.1282 | -236.1897 | -31.3501 | -31.5916 |
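In the table above, validation loss and reward accuracy do not peak at the same step. The sketch below copies three columns from the table and locates the row with the lowest validation loss and the row with the highest reward accuracy. It only reads the logged numbers and makes no claim about which checkpoint is preferable; the tuple layout and variable names are illustrative.

```python
# (step, validation_loss, rewards_accuracy), copied from the table above.
eval_rows = [
    (500, 34.0876, 0.5292),
    (1000, 13.0423, 0.5042),
    (1500, 18.5157, 0.5458),
    (2000, 28.1593, 0.5917),
    (2500, 9.2406, 0.6229),
    (3000, 12.4967, 0.6167),
    (3500, 6.8889, 0.6854),
    (4000, 4.2435, 0.7021),
    (4500, 5.0987, 0.7521),
    (5000, 2.7344, 0.7500),
    (5500, 3.0584, 0.8083),
    (6000, 4.2350, 0.8417),
    (6500, 10.8174, 0.8667),
    (7000, 125.6974, 0.8750),
]

# Row with the smallest validation loss.
lowest_loss_row = min(eval_rows, key=lambda row: row[1])
# Row with the largest reward accuracy.
highest_acc_row = max(eval_rows, key=lambda row: row[2])

print("lowest validation loss:", lowest_loss_row)
print("highest reward accuracy:", highest_acc_row)
```

On the logged data, the lowest validation loss (2.7344) occurs at step 5000, while the highest reward accuracy (0.875) occurs at step 7000, where the validation loss is 125.6974.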
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0