
gemma-2b-lora-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of google/gemma-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4752
  • Rewards/chosen: -0.2074
  • Rewards/rejected: -2.7558
  • Rewards/accuracies: 0.8491
  • Rewards/margins: 2.5483
  • Logps/rejected: -309.6141
  • Logps/chosen: -258.4032
  • Logits/rejected: -29.9596
  • Logits/chosen: -27.7808
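
The metric names above match those typically logged by a DPO trainer such as TRL's `DPOTrainer`, although the card does not name the trainer. As a minimal sketch under that assumption, the snippet below shows how such metrics relate to per-example rewards and checks that the reported margin is consistent with the reported chosen and rejected rewards. The function name `dpo_metrics` and the toy reward values are hypothetical, and the loss formula assumes the default sigmoid DPO loss without label smoothing.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(chosen_rewards, rejected_rewards):
    """Aggregate per-example implicit rewards into DPO-style metrics.

    Assumption: each reward is already beta * (policy log-prob - reference
    log-prob) for one response, so the sigmoid DPO loss for a pair is
    -log(sigmoid(reward_chosen - reward_rejected)).
    """
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)
    return {
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
    }


# Hypothetical toy rewards, only to exercise the function.
toy = dpo_metrics([0.5, -0.1, 0.2], [-1.0, 0.3, -0.4])

# Consistency check on the values reported in this card:
# the mean margin should equal mean chosen minus mean rejected, up to rounding.
reported_chosen, reported_rejected, reported_margin = -0.2074, -2.7558, 2.5483
margin_gap = abs((reported_chosen - reported_rejected) - reported_margin)
print(f"margin discrepancy: {margin_gap:.4f}")
```

The discrepancy is on the order of the four-decimal rounding used in the card, so the three reported reward figures agree with each other.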

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 250
  • num_epochs: 5
  • mixed_precision_training: Native AMP
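
Two of the hyperparameters above are derived quantities, and a short sketch may make them concrete: the total train batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler with warmup ramps the learning rate up and then decays it to zero. The function names are hypothetical, a single device is assumed (which is consistent with 2 × 4 = 8), and the total step count is a rough estimate from the results table (step 6250 at epoch 4.86, scaled to 5 epochs), not a value stated in the card.

```python
def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-05
WARMUP_STEPS = 250
# Assumption: estimated as 6250 / 4.86 * 5; the card does not state it.
ASSUMED_TOTAL_STEPS = 6430

print(effective_batch_size(2, 4))  # matches total_train_batch_size: 8
for step in (0, 125, 250, 3000, ASSUMED_TOTAL_STEPS):
    lr = linear_schedule_lr(step, BASE_LR, WARMUP_STEPS, ASSUMED_TOTAL_STEPS)
    print(step, lr)
```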

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6326 | 0.19 | 250 | 0.5095 | -0.4198 | -1.0814 | 0.8348 | 0.6616 | -292.8703 | -260.5270 | -29.7690 | -27.6954 |
| 0.4753 | 0.39 | 500 | 0.4478 | -0.4809 | -1.9324 | 0.8507 | 1.4515 | -301.3807 | -261.1383 | -29.6874 | -27.5844 |
| 0.4466 | 0.58 | 750 | 0.4318 | -0.1824 | -1.8487 | 0.8503 | 1.6663 | -300.5433 | -258.1532 | -29.6629 | -27.5793 |
| 0.4287 | 0.78 | 1000 | 0.4400 | -0.1281 | -2.0702 | 0.8507 | 1.9420 | -302.7580 | -257.6101 | -30.0317 | -27.8922 |
| 0.4417 | 0.97 | 1250 | 0.4321 | 0.1125 | -1.7668 | 0.8495 | 1.8792 | -299.7242 | -255.2044 | -30.0155 | -27.875 |
| 0.4085 | 1.17 | 1500 | 0.4355 | -0.1108 | -2.1492 | 0.8511 | 2.0384 | -303.5482 | -257.4367 | -29.9166 | -27.7871 |
| 0.3946 | 1.36 | 1750 | 0.4488 | -0.1271 | -2.3911 | 0.8519 | 2.2640 | -305.9676 | -257.6003 | -29.8426 | -27.7085 |
| 0.3982 | 1.56 | 2000 | 0.4362 | -0.0692 | -2.2448 | 0.8515 | 2.1756 | -304.5043 | -257.0213 | -30.1425 | -27.9918 |
| 0.3943 | 1.75 | 2250 | 0.4453 | 0.0607 | -2.1390 | 0.8491 | 2.1997 | -303.4470 | -255.7220 | -30.2768 | -28.1039 |
| 0.3741 | 1.94 | 2500 | 0.4273 | 0.0867 | -2.0180 | 0.8507 | 2.1047 | -302.2360 | -255.4620 | -30.1318 | -27.9690 |
| 0.3321 | 2.14 | 2750 | 0.4565 | -0.0808 | -2.5300 | 0.8499 | 2.4492 | -307.3560 | -257.1368 | -30.0401 | -27.8877 |
| 0.3323 | 2.33 | 3000 | 0.4463 | 0.0323 | -2.2984 | 0.8503 | 2.3307 | -305.0405 | -256.0064 | -30.1648 | -27.9869 |
| 0.3495 | 2.53 | 3250 | 0.4299 | 0.1988 | -1.8994 | 0.8511 | 2.0982 | -301.0504 | -254.3410 | -30.2768 | -28.0945 |
| 0.3423 | 2.72 | 3500 | 0.4385 | 0.0237 | -2.1481 | 0.8499 | 2.1718 | -303.5371 | -256.0920 | -30.1685 | -27.9889 |
| 0.334 | 2.92 | 3750 | 0.4356 | 0.0467 | -2.1581 | 0.8499 | 2.2047 | -303.6373 | -255.8624 | -30.1857 | -27.9928 |
| 0.2933 | 3.11 | 4000 | 0.4540 | 0.0275 | -2.4119 | 0.8503 | 2.4394 | -306.1758 | -256.0542 | -30.1524 | -27.9559 |
| 0.3138 | 3.3 | 4250 | 0.4487 | -0.0797 | -2.4315 | 0.8499 | 2.3517 | -306.3710 | -257.1263 | -30.0450 | -27.8772 |
| 0.28 | 3.5 | 4500 | 0.4696 | -0.2282 | -2.8278 | 0.8519 | 2.5996 | -310.3340 | -258.6105 | -30.0594 | -27.8809 |
| 0.2796 | 3.69 | 4750 | 0.4545 | -0.0877 | -2.5133 | 0.8499 | 2.4256 | -307.1899 | -257.2065 | -30.0334 | -27.8598 |
| 0.2859 | 3.89 | 5000 | 0.4540 | -0.1038 | -2.5361 | 0.8507 | 2.4323 | -307.4171 | -257.3667 | -29.9932 | -27.8206 |
| 0.2785 | 4.08 | 5250 | 0.4619 | -0.1923 | -2.7125 | 0.8488 | 2.5202 | -309.1819 | -258.2524 | -29.9455 | -27.7723 |
| 0.2751 | 4.28 | 5500 | 0.4614 | -0.1893 | -2.7226 | 0.8488 | 2.5333 | -309.2824 | -258.2219 | -29.9548 | -27.7857 |
| 0.2522 | 4.47 | 5750 | 0.4606 | -0.1197 | -2.5970 | 0.8507 | 2.4773 | -308.0268 | -257.5265 | -30.0076 | -27.8263 |
| 0.2497 | 4.67 | 6000 | 0.4674 | -0.1855 | -2.7709 | 0.8503 | 2.5854 | -309.7651 | -258.1835 | -29.9580 | -27.7820 |
| 0.2634 | 4.86 | 6250 | 0.4752 | -0.2074 | -2.7558 | 0.8491 | 2.5483 | -309.6141 | -258.4032 | -29.9596 | -27.7808 |
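
In the results above, training loss keeps falling while validation loss reaches its minimum well before the final step. As a small sketch of how one might locate that point, the snippet below copies the step, training loss, and validation loss columns from the table and picks the evaluation step with the lowest validation loss. The variable and function names are hypothetical, and the card does not say which checkpoint was actually published.

```python
# (step, training_loss, validation_loss), copied from the results table.
RESULTS = [
    (250, 0.6326, 0.5095), (500, 0.4753, 0.4478), (750, 0.4466, 0.4318),
    (1000, 0.4287, 0.4400), (1250, 0.4417, 0.4321), (1500, 0.4085, 0.4355),
    (1750, 0.3946, 0.4488), (2000, 0.3982, 0.4362), (2250, 0.3943, 0.4453),
    (2500, 0.3741, 0.4273), (2750, 0.3321, 0.4565), (3000, 0.3323, 0.4463),
    (3250, 0.3495, 0.4299), (3500, 0.3423, 0.4385), (3750, 0.334, 0.4356),
    (4000, 0.2933, 0.4540), (4250, 0.3138, 0.4487), (4500, 0.28, 0.4696),
    (4750, 0.2796, 0.4545), (5000, 0.2859, 0.4540), (5250, 0.2785, 0.4619),
    (5500, 0.2751, 0.4614), (5750, 0.2522, 0.4606), (6000, 0.2497, 0.4674),
    (6250, 0.2634, 0.4752),
]


def best_by_validation_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best = best_by_validation_loss(RESULTS)
final = RESULTS[-1]

print("lowest validation loss:", best[2], "at step", best[0])
# Gap between validation and training loss at the last evaluation step.
print("final generalization gap:", round(final[2] - final[1], 4))
```

A widening gap between the two losses is a common sign of overfitting, so an earlier checkpoint may be worth comparing against the final one.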

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2
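
Since PEFT is among the frameworks listed, one plausible way to use this model is to load the base model and attach the adapter on top of it. The sketch below assumes the repository `glenn2/gemma-2b-lora-distilabel-intel-orca-dpo-pairs` hosts a PEFT adapter for `google/gemma-2b`, as the name suggests; the card itself does not document a loading procedure, and the helper name `load_adapter_model` is hypothetical. Access to the base model may also require accepting its license on the Hub.

```python
BASE_MODEL_ID = "google/gemma-2b"
# Assumption: this repository contains PEFT adapter weights for the base model.
ADAPTER_ID = "glenn2/gemma-2b-lora-distilabel-intel-orca-dpo-pairs"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and wrap it with the fine-tuned adapter.

    Imports are local so that defining this helper does not require
    transformers or peft to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Attach the adapter weights to the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 2B-parameter model.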

Safetensors

  • Model size: 2.51B params
  • Tensor type: FP16

Model tree for glenn2/gemma-2b-lora-distilabel-intel-orca-dpo-pairs

  • Base model: google/gemma-2b
  • This model: adapter