# collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9228
- Num Input Tokens Seen: 9275784
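
This card does not yet include usage details, so here is a minimal loading sketch. It assumes the checkpoint is hosted under the repository id `RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2` and keeps the standard Gemma 2 causal-LM layout; adjust dtype and device placement to your hardware.

```python
# Minimal inference sketch; the prompt and generation settings are
# illustrative assumptions, not part of the original card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # 27B parameters: bf16 halves memory vs fp32
    device_map="auto",           # needs `accelerate`; shards across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```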
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
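
The total train batch size follows from the per-device batch size times the accumulation steps: 4 × 32 = 128. With roughly 187 optimizer steps in the single epoch (step 185 corresponds to epoch 0.9886 in the table below), `lr_scheduler_warmup_ratio: 0.05` amounts to about 9 warmup steps. As a hedged reconstruction only, these settings map onto `transformers.TrainingArguments` as sketched here; the output directory is a placeholder, and the original run may have used options not recorded on this card.

```python
# Hypothetical reconstruction of the listed hyperparameters; output_dir is a
# placeholder, and anything not on the card (logging, saving, precision) is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = total_train_batch_size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```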
### Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.1282 | 0 |
1.6524 | 0.0267 | 5 | 1.0225 | 246112 |
1.8688 | 0.0534 | 10 | 0.9766 | 491112 |
1.4621 | 0.0802 | 15 | 0.9656 | 736432 |
1.3566 | 0.1069 | 20 | 0.9668 | 984696 |
1.3829 | 0.1336 | 25 | 0.9667 | 1227844 |
1.1059 | 0.1603 | 30 | 0.9655 | 1474208 |
1.0213 | 0.1870 | 35 | 0.9604 | 1730332 |
0.8427 | 0.2138 | 40 | 0.9554 | 1973604 |
0.9259 | 0.2405 | 45 | 0.9537 | 2222012 |
1.0387 | 0.2672 | 50 | 0.9480 | 2472664 |
0.9934 | 0.2939 | 55 | 0.9463 | 2723764 |
0.8751 | 0.3206 | 60 | 0.9401 | 2973420 |
0.8539 | 0.3474 | 65 | 0.9390 | 3224868 |
0.7838 | 0.3741 | 70 | 0.9367 | 3468436 |
0.7819 | 0.4008 | 75 | 0.9336 | 3722388 |
0.7431 | 0.4275 | 80 | 0.9317 | 3975588 |
0.7116 | 0.4542 | 85 | 0.9302 | 4223548 |
0.7088 | 0.4810 | 90 | 0.9305 | 4476068 |
0.6615 | 0.5077 | 95 | 0.9289 | 4721956 |
0.7609 | 0.5344 | 100 | 0.9296 | 4969064 |
0.7459 | 0.5611 | 105 | 0.9293 | 5219084 |
0.7784 | 0.5878 | 110 | 0.9288 | 5462520 |
0.7836 | 0.6146 | 115 | 0.9275 | 5710096 |
0.7615 | 0.6413 | 120 | 0.9291 | 5960688 |
0.7463 | 0.6680 | 125 | 0.9255 | 6210428 |
0.7071 | 0.6947 | 130 | 0.9282 | 6458116 |
0.7189 | 0.7214 | 135 | 0.9242 | 6700308 |
0.6639 | 0.7482 | 140 | 0.9256 | 6951476 |
0.6825 | 0.7749 | 145 | 0.9237 | 7202416 |
0.7322 | 0.8016 | 150 | 0.9253 | 7452600 |
0.7126 | 0.8283 | 155 | 0.9246 | 7695916 |
0.6821 | 0.8550 | 160 | 0.9228 | 7939444 |
0.6741 | 0.8818 | 165 | 0.9242 | 8188360 |
0.7033 | 0.9085 | 170 | 0.9237 | 8432324 |
0.6131 | 0.9352 | 175 | 0.9221 | 8679992 |
0.7369 | 0.9619 | 180 | 0.9210 | 8924020 |
0.732 | 0.9886 | 185 | 0.9247 | 9175312 |
### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1