
zephyr-7b-sft-lora-accum8-lr5e_5

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5205
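
The sketch below is not part of the original card; it shows one plausible way to load this LoRA adapter on top of mistralai/Mistral-7B-v0.1 with PEFT. The repo id is taken from this card, while the dtype, device, and prompt are assumptions.

```python
# A minimal loading sketch, assuming the adapter weights are published under the
# repo id below and that a recent PEFT release is installed; not the author's script.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "shkang/zephyr-7b-sft-lora-accum8-lr5e_5"

# Loads the base model recorded in the adapter config (mistralai/Mistral-7B-v0.1)
# and applies the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to your hardware
    device_map="auto",
)
# Assumes the adapter repo also hosts tokenizer files; otherwise load them from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

inputs = tokenizer("Explain LoRA fine-tuning in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```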

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 50.0
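
These values map onto a standard transformers Trainer configuration. The sketch below is an assumption about how such a run could be set up, not the author's training script; note that the effective train batch size is 4 (per device) × 8 (gradient accumulation) × 2 (GPUs) = 64, matching total_train_batch_size above.

```python
# A minimal sketch, assuming a standard transformers Trainer (or TRL SFT) setup;
# the actual training script for this model is not included in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-lora-accum8-lr5e_5",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 x 8 x 2 GPUs = effective batch size 64
    num_train_epochs=50.0,
    lr_scheduler_type="cosine",
    seed=42,
    evaluation_strategy="epoch",     # assumption: the results table logs roughly one eval per epoch
    save_strategy="epoch",           # assumption
    bf16=True,                       # assumption: precision is not stated in the card
)
```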

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.0045 | 0.51 | 6 | 1.8321 |
| 1.8139 | 1.53 | 13 | 1.6624 |
| 1.6058 | 2.55 | 20 | 1.5301 |
| 1.503 | 3.57 | 27 | 1.3898 |
| 1.411 | 4.51 | 33 | 1.2622 |
| 1.2198 | 5.53 | 40 | 1.1715 |
| 1.1645 | 6.55 | 47 | 1.1198 |
| 1.1379 | 7.57 | 54 | 1.0838 |
| 1.0809 | 8.51 | 60 | 1.0619 |
| 1.0642 | 9.53 | 67 | 1.0391 |
| 1.0376 | 10.55 | 74 | 1.0167 |
| 1.0104 | 11.57 | 81 | 0.9956 |
| 0.9859 | 12.51 | 87 | 0.9778 |
| 0.9738 | 13.53 | 94 | 0.9560 |
| 0.925 | 14.55 | 101 | 0.9325 |
| 0.904 | 15.57 | 108 | 0.9075 |
| 0.89 | 16.51 | 114 | 0.8832 |
| 0.8333 | 17.53 | 121 | 0.8544 |
| 0.8085 | 18.55 | 128 | 0.8446 |
| 0.7693 | 19.57 | 135 | 0.8093 |
| 0.7434 | 20.51 | 141 | 0.7851 |
| 0.7154 | 21.53 | 148 | 0.7531 |
| 0.6664 | 22.55 | 155 | 0.7246 |
| 0.6418 | 23.57 | 162 | 0.7051 |
| 0.6229 | 24.51 | 168 | 0.6842 |
| 0.5842 | 25.53 | 175 | 0.6671 |
| 0.5657 | 26.55 | 182 | 0.6372 |
| 0.5575 | 27.57 | 189 | 0.6208 |
| 0.5299 | 28.51 | 195 | 0.6040 |
| 0.5174 | 29.53 | 202 | 0.5944 |
| 0.5009 | 30.55 | 209 | 0.5853 |
| 0.4838 | 31.57 | 216 | 0.5760 |
| 0.482 | 32.51 | 222 | 0.5700 |
| 0.4768 | 33.53 | 229 | 0.5616 |
| 0.4613 | 34.55 | 236 | 0.5534 |
| 0.4549 | 35.57 | 243 | 0.5499 |
| 0.4506 | 36.51 | 249 | 0.5455 |
| 0.4414 | 37.53 | 256 | 0.5414 |
| 0.4329 | 38.55 | 263 | 0.5389 |
| 0.4357 | 39.57 | 270 | 0.5361 |
| 0.4314 | 40.51 | 276 | 0.5340 |
| 0.4208 | 41.53 | 283 | 0.5333 |
| 0.4166 | 42.55 | 290 | 0.5273 |
| 0.4193 | 43.57 | 297 | 0.5271 |
| 0.4076 | 44.51 | 303 | 0.5257 |
| 0.4111 | 45.53 | 310 | 0.5244 |
| 0.4042 | 46.55 | 317 | 0.5245 |
| 0.4081 | 47.57 | 324 | 0.5226 |
| 0.3983 | 48.51 | 330 | 0.5207 |
| 0.3979 | 49.53 | 337 | 0.5202 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.0
  • Datasets 2.14.6
  • Tokenizers 0.14.1
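
For reproducing the environment, a quick check (an illustrative snippet, not part of the original card) compares installed versions against the ones listed above:

```python
# Prints installed versions next to the versions reported in this card.
# A mismatch does not necessarily break anything, but it is the first thing
# to check when trying to reproduce the reported results.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.35.0",
    "torch": "2.1.0",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card {want}")
```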