collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9228
  • Num Input Tokens Seen: 9275784
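
As a quick usage sketch (not part of the original card), the checkpoint can be loaded with the standard transformers auto classes. Loading in bfloat16 matches the checkpoint's stored tensor type; `device_map="auto"` and the example prompt are illustrative assumptions.

```python
# Minimal loading/generation sketch; assumes `torch`, `transformers`, and
# `accelerate` are installed and enough GPU memory exists for a 27B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # shard across available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```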

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
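
As an illustrative sketch only (the training script is not published), these settings map onto transformers `TrainingArguments` roughly as follows; `output_dir` and `bf16` are assumptions, and the Adam settings listed above are the library defaults.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments;
# this is a sketch, not the authors' actual training code.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd2",  # assumption
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
    # The default optimizer is AdamW with betas=(0.9, 0.999) and
    # epsilon=1e-08, matching the values listed above.
)
```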

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.6524        | 0.0267 | 5    | 1.0225          | 246112            |
| 1.8688        | 0.0534 | 10   | 0.9766          | 491112            |
| 1.4621        | 0.0802 | 15   | 0.9656          | 736432            |
| 1.3566        | 0.1069 | 20   | 0.9668          | 984696            |
| 1.3829        | 0.1336 | 25   | 0.9667          | 1227844           |
| 1.1059        | 0.1603 | 30   | 0.9655          | 1474208           |
| 1.0213        | 0.1870 | 35   | 0.9604          | 1730332           |
| 0.8427        | 0.2138 | 40   | 0.9554          | 1973604           |
| 0.9259        | 0.2405 | 45   | 0.9537          | 2222012           |
| 1.0387        | 0.2672 | 50   | 0.9480          | 2472664           |
| 0.9934        | 0.2939 | 55   | 0.9463          | 2723764           |
| 0.8751        | 0.3206 | 60   | 0.9401          | 2973420           |
| 0.8539        | 0.3474 | 65   | 0.9390          | 3224868           |
| 0.7838        | 0.3741 | 70   | 0.9367          | 3468436           |
| 0.7819        | 0.4008 | 75   | 0.9336          | 3722388           |
| 0.7431        | 0.4275 | 80   | 0.9317          | 3975588           |
| 0.7116        | 0.4542 | 85   | 0.9302          | 4223548           |
| 0.7088        | 0.4810 | 90   | 0.9305          | 4476068           |
| 0.6615        | 0.5077 | 95   | 0.9289          | 4721956           |
| 0.7609        | 0.5344 | 100  | 0.9296          | 4969064           |
| 0.7459        | 0.5611 | 105  | 0.9293          | 5219084           |
| 0.7784        | 0.5878 | 110  | 0.9288          | 5462520           |
| 0.7836        | 0.6146 | 115  | 0.9275          | 5710096           |
| 0.7615        | 0.6413 | 120  | 0.9291          | 5960688           |
| 0.7463        | 0.6680 | 125  | 0.9255          | 6210428           |
| 0.7071        | 0.6947 | 130  | 0.9282          | 6458116           |
| 0.7189        | 0.7214 | 135  | 0.9242          | 6700308           |
| 0.6639        | 0.7482 | 140  | 0.9256          | 6951476           |
| 0.6825        | 0.7749 | 145  | 0.9237          | 7202416           |
| 0.7322        | 0.8016 | 150  | 0.9253          | 7452600           |
| 0.7126        | 0.8283 | 155  | 0.9246          | 7695916           |
| 0.6821        | 0.8550 | 160  | 0.9228          | 7939444           |
| 0.6741        | 0.8818 | 165  | 0.9242          | 8188360           |
| 0.7033        | 0.9085 | 170  | 0.9237          | 8432324           |
| 0.6131        | 0.9352 | 175  | 0.9221          | 8679992           |
| 0.7369        | 0.9619 | 180  | 0.9210          | 8924020           |
| 0.732         | 0.9886 | 185  | 0.9247          | 9175312           |
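
As an interpretive note not in the original card: assuming the evaluation loss is mean token-level cross-entropy (the transformers default for causal language model fine-tuning), it converts to perplexity via exp(loss), so the reported loss of 0.9228 corresponds to a perplexity of about 2.52.

```python
# Loss-to-perplexity conversion, assuming mean token-level cross-entropy.
import math
print(math.exp(0.9228))  # ~2.516
```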

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1