
gemma2_on_korean_conv

This model is a PEFT adapter fine-tuned from beomi/gemma-ko-2b on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2691
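
Because the framework list and model tree below indicate a PEFT adapter rather than a full checkpoint, the weights need to be loaded on top of beomi/gemma-ko-2b. A minimal loading sketch using the standard transformers and peft APIs (the dtype, device settings, and prompt are assumptions; the expected conversation format is not documented):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "beomi/gemma-ko-2b"
adapter_id = "ghost613/gemma2_on_korean_conv"

# Load the base model, then apply this repository's adapter weights on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; use float32 on hardware without bf16
    device_map="auto",           # requires the accelerate package
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# The prompt format is not documented; a plain Korean prompt is a guess.
prompt = "안녕하세요, 요즘 어떻게 지내세요?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```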

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 10
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 3600
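
For reference, a sketch of how these settings map onto transformers.TrainingArguments (the output directory is an assumed name, and the Trainer/dataset wiring is not documented here):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above. The effective batch size of 10
# comes from per-device batch size 2 x 5 gradient-accumulation steps.
training_args = TrainingArguments(
    output_dir="gemma2_on_korean_conv",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=5,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=3600,
    eval_strategy="steps",  # named "evaluation_strategy" on transformers < 4.41
    eval_steps=100,         # matches the 100-step cadence in the results table
    logging_steps=100,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the Adam settings listed above.
)
# These arguments would then be passed to a transformers.Trainer together with
# the (unspecified) training and evaluation datasets and the PEFT-wrapped model.
```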

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.236         | 0.1281 | 100  | 1.3164          |
| 1.1365        | 0.2563 | 200  | 1.2335          |
| 1.125         | 0.3844 | 300  | 1.1832          |
| 1.1023        | 0.5126 | 400  | 1.1517          |
| 1.1244        | 0.6407 | 500  | 1.1254          |
| 1.0095        | 0.7688 | 600  | 1.1015          |
| 1.1354        | 0.8970 | 700  | 1.0979          |
| 0.898         | 1.0251 | 800  | 1.0986          |
| 0.9075        | 1.1533 | 900  | 1.0897          |
| 0.864         | 1.2814 | 1000 | 1.0924          |
| 0.9093        | 1.4095 | 1100 | 1.0810          |
| 0.8207        | 1.5377 | 1200 | 1.0859          |
| 0.8376        | 1.6658 | 1300 | 1.0712          |
| 0.8546        | 1.7940 | 1400 | 1.0705          |
| 0.8231        | 1.9221 | 1500 | 1.0659          |
| 0.6411        | 2.0502 | 1600 | 1.1030          |
| 0.6646        | 2.1784 | 1700 | 1.1065          |
| 0.6662        | 2.3065 | 1800 | 1.1038          |
| 0.6596        | 2.4346 | 1900 | 1.1033          |
| 0.6761        | 2.5628 | 2000 | 1.1137          |
| 0.7028        | 2.6909 | 2100 | 1.1071          |
| 0.6339        | 2.8191 | 2200 | 1.1076          |
| 0.6714        | 2.9472 | 2300 | 1.1157          |
| 0.5146        | 3.0753 | 2400 | 1.1607          |
| 0.4817        | 3.2035 | 2500 | 1.1779          |
| 0.5094        | 3.3316 | 2600 | 1.1794          |
| 0.4954        | 3.4598 | 2700 | 1.1887          |
| 0.4886        | 3.5879 | 2800 | 1.1888          |
| 0.5176        | 3.7160 | 2900 | 1.1819          |
| 0.5076        | 3.8442 | 3000 | 1.1900          |
| 0.5286        | 3.9723 | 3100 | 1.1838          |
| 0.3827        | 4.1005 | 3200 | 1.2526          |
| 0.3933        | 4.2286 | 3300 | 1.2663          |
| 0.3703        | 4.3567 | 3400 | 1.2671          |
| 0.3967        | 4.4849 | 3500 | 1.2661          |
| 0.3978        | 4.6130 | 3600 | 1.2691          |

Note that validation loss reaches its minimum (1.0659) at step 1500 and rises thereafter while training loss keeps falling, which may indicate overfitting in the later epochs; the headline loss of 1.2691 is simply the value at the final step.

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Model tree for ghost613/gemma2_on_korean_conv

  • Base model: beomi/gemma-ko-2b
  • Adapter: this model