Plainly Optimized Network
Dataset: BIGBENCH
Trainer Hyperparameters:
- lr = 5e-05
- per_device_batch_size = 1
- gradient_accumulation_steps = 4
- weight_decay = 1e-09
- seed = 42
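
To make the interaction between the batch size and gradient accumulation concrete, here is a minimal sketch that stores the values listed above and computes the number of examples contributing to each optimizer step. The class name `TrainerHyperparameters`, the method `effective_batch_size`, and the `num_devices` parameter are hypothetical and introduced only for illustration. The card does not state how many devices were used, so the default of one device is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerHyperparameters:
    """Hypothetical container holding the values listed on this card."""

    lr: float = 5e-05
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    weight_decay: float = 1e-09
    seed: int = 42

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Examples seen per optimizer step: gradients from several small
        # batches are accumulated before a single weight update.
        # num_devices is an assumption; the card does not report it.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * num_devices
        )


hp = TrainerHyperparameters()
print(hp.effective_batch_size())  # per-step examples on one device
```

Under the single-device assumption, each optimizer step would average gradients over 1 × 4 = 4 examples.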
eval_loss | eval_mse | epoch |
---|---|---|
58.741 | 0.055 | 1.0 |
60.624 | 0.058 | 2.0 |
60.765 | 0.057 | 3.0 |
55.858 | 0.051 | 4.0 |
57.271 | 0.053 | 5.0 |
56.004 | 0.051 | 6.0 |
60.246 | 0.056 | 7.0 |
55.218 | 0.049 | 8.0 |
55.261 | 0.049 | 9.0 |
54.730 | 0.049 | 10.0 |
58.137 | 0.052 | 11.0 |
53.927 | 0.048 | 12.0 |
56.143 | 0.051 | 13.0 |
54.604 | 0.049 | 14.0 |
53.596 | 0.048 | 15.0 |
54.241 | 0.049 | 16.0 |
55.500 | 0.050 | 17.0 |
53.256 | 0.047 | 18.0 |
53.139 | 0.047 | 19.0 |
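
Because the evaluation loss fluctuates from epoch to epoch, it can help to pick the best checkpoint programmatically. The following sketch copies the rows of the table above into a string, parses them, and reports the epoch with the lowest `eval_loss`. The names `TABLE`, `parse_rows`, and `best` are hypothetical helpers for illustration and are not part of the original training code.

```python
# Rows copied verbatim from the evaluation table (eval_loss | eval_mse | epoch).
TABLE = """\
58.741 | 0.055 | 1.0 |
60.624 | 0.058 | 2.0 |
60.765 | 0.057 | 3.0 |
55.858 | 0.051 | 4.0 |
57.271 | 0.053 | 5.0 |
56.004 | 0.051 | 6.0 |
60.246 | 0.056 | 7.0 |
55.218 | 0.049 | 8.0 |
55.261 | 0.049 | 9.0 |
54.730 | 0.049 | 10.0 |
58.137 | 0.052 | 11.0 |
53.927 | 0.048 | 12.0 |
56.143 | 0.051 | 13.0 |
54.604 | 0.049 | 14.0 |
53.596 | 0.048 | 15.0 |
54.241 | 0.049 | 16.0 |
55.500 | 0.050 | 17.0 |
53.256 | 0.047 | 18.0 |
53.139 | 0.047 | 19.0 |
"""


def parse_rows(text):
    """Turn pipe-separated lines into a list of metric dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the empty cell produced by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        eval_loss, eval_mse, epoch = (float(cell) for cell in cells)
        rows.append({"eval_loss": eval_loss, "eval_mse": eval_mse, "epoch": epoch})
    return rows


rows = parse_rows(TABLE)
# Select the row with the smallest evaluation loss.
best = min(rows, key=lambda row: row["eval_loss"])
print(f"best epoch: {best['epoch']:.0f}, eval_loss: {best['eval_loss']}")
```

For the values reported here, the lowest `eval_loss` is 53.139 at epoch 19, the final logged epoch.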