Model save


Files changed (9)

README.md +43 -30
all_results.json +5 -5
config.json +1 -1
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
train_results.json +5 -5
trainer_state.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -2,16 +2,10 @@
 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
-- alignment-handbook
-- trl
-- orpo
-- generated_from_trainer
 - trl
 - orpo
 - alignment-handbook
 - generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-sft-full-orpo
   results: []
@@ -20,23 +14,23 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/ehjj41t1)
 # zephyr-7b-sft-full-orpo
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5072
-- Rewards/chosen: -0.0401
-- Rewards/rejected: -0.0510
-- Rewards/accuracies: 0.6230
-- Rewards/margins: 0.0109
-- Logps/rejected: -1.0200
-- Logps/chosen: -0.8015
-- Logits/rejected: -2.4533
-- Logits/chosen: -2.4851
-- Nll Loss: 0.4727
-- Log Odds Ratio: -0.6343
-- Log Odds Chosen: 0.3605
 ## Model description
@@ -67,21 +61,40 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: inverse_sqrt
 - lr_scheduler_warmup_steps: 100
-- num_epochs: 1
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 0.5707        | 0.1049 | 100  | 1.1268          | -0.0452        | -0.0539          | 0.6369             | 0.0086          | -1.0774        | -0.9045      | -2.5432         | -2.5811       | 1.0893   | -0.6413        | 0.2601          |
-| 0.5663        | 0.2098 | 200  | 0.5741          | -0.0440        | -0.0534          | 0.6270             | 0.0094          | -1.0676        | -0.8799      | -2.5377         | -2.5597       | 0.5352   | -0.6447        | 0.2863          |
-| 0.5817        | 0.3146 | 300  | 0.5572          | -0.0440        | -0.0531          | 0.6190             | 0.0091          | -1.0628        | -0.8808      | -2.4499         | -2.4818       | 0.5207   | -0.6503        | 0.2780          |
-| 0.5724        | 0.4195 | 400  | 0.5416          | -0.0426        | -0.0515          | 0.625              | 0.0089          | -1.0293        | -0.8510      | -2.4026         | -2.4376       | 0.5060   | -0.6551        | 0.2819          |
-| 0.5486        | 0.5244 | 500  | 0.5344          | -0.0425        | -0.0526          | 0.6151             | 0.0101          | -1.0514        | -0.8492      | -2.4373         | -2.4718       | 0.4990   | -0.6439        | 0.3193          |
-| 0.5156        | 0.6293 | 600  | 0.5242          | -0.0417        | -0.0514          | 0.6151             | 0.0098          | -1.0285        | -0.8333      | -2.5551         | -2.5811       | 0.4882   | -0.6470        | 0.3056          |
-| 0.5297        | 0.7341 | 700  | 0.5191          | -0.0411        | -0.0521          | 0.6310             | 0.0110          | -1.0422        | -0.8215      | -2.4477         | -2.4801       | 0.4838   | -0.6351        | 0.3407          |
-| 0.5184        | 0.8390 | 800  | 0.5138          | -0.0409        | -0.0532          | 0.6310             | 0.0123          | -1.0647        | -0.8179      | -2.4575         | -2.4922       | 0.4796   | -0.6304        | 0.3783          |
-| 0.5235        | 0.9439 | 900  | 0.5088          | -0.0404        | -0.0510          | 0.6290             | 0.0106          | -1.0202        | -0.8085      | -2.5337         | -2.5634       | 0.4741   | -0.6379        | 0.3305          |
 ### Framework versions

 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
 - trl
 - orpo
 - alignment-handbook
 - generated_from_trainer
 model-index:
 - name: zephyr-7b-sft-full-orpo
   results: []
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/b45ab3qe)
 # zephyr-7b-sft-full-orpo
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.0214
+- Rewards/chosen: -0.0976
+- Rewards/rejected: -0.1394
+- Rewards/accuracies: 0.6270
+- Rewards/margins: 0.0418
+- Logps/rejected: -2.7882
+- Logps/chosen: -1.9523
+- Logits/rejected: -2.8221
+- Logits/chosen: -2.8059
+- Nll Loss: 0.9673
+- Log Odds Ratio: -0.8558
+- Log Odds Chosen: 0.9869
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: inverse_sqrt
 - lr_scheduler_warmup_steps: 100
+- num_epochs: 3
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 0.5668        | 0.1049 | 100  | 0.5843          | -0.0456        | -0.0529          | 0.6151             | 0.0073          | -1.0580        | -0.9113      | -3.3148         | -3.3082       | 0.5516   | -0.6530        | 0.2184          |
+| 0.5676        | 0.2098 | 200  | 0.5726          | -0.0441        | -0.0532          | 0.625              | 0.0092          | -1.0644        | -0.8811      | -3.0026         | -2.9992       | 0.5359   | -0.6474        | 0.2850          |
+| 0.5819        | 0.3146 | 300  | 0.5552          | -0.0439        | -0.0531          | 0.6290             | 0.0092          | -1.0620        | -0.8770      | -3.1424         | -3.1391       | 0.5202   | -0.6464        | 0.2830          |
+| 0.5738        | 0.4195 | 400  | 0.5411          | -0.0422        | -0.0517          | 0.6290             | 0.0096          | -1.0346        | -0.8434      | -3.1026         | -3.1020       | 0.5047   | -0.6522        | 0.2961          |
+| 0.5478        | 0.5244 | 500  | 0.5319          | -0.0421        | -0.0525          | 0.6290             | 0.0105          | -1.0509        | -0.8415      | -3.0260         | -3.0286       | 0.4970   | -0.6382        | 0.3327          |
+| 0.5146        | 0.6293 | 600  | 0.5240          | -0.0408        | -0.0508          | 0.6230             | 0.0100          | -1.0165        | -0.8165      | -3.1325         | -3.1275       | 0.4883   | -0.6418        | 0.3121          |
+| 0.5298        | 0.7341 | 700  | 0.5188          | -0.0413        | -0.0541          | 0.6429             | 0.0128          | -1.0827        | -0.8267      | -3.0761         | -3.0755       | 0.4842   | -0.6219        | 0.3869          |
+| 0.5181        | 0.8390 | 800  | 0.5141          | -0.0410        | -0.0524          | 0.6329             | 0.0114          | -1.0475        | -0.8198      | -3.1382         | -3.1394       | 0.4803   | -0.6322        | 0.3506          |
+| 0.5239        | 0.9439 | 900  | 0.5086          | -0.0402        | -0.0506          | 0.6310             | 0.0104          | -1.0129        | -0.8045      | -3.1191         | -3.1171       | 0.4748   | -0.6328        | 0.3268          |
+| 0.2888        | 1.0488 | 1000 | 0.5400          | -0.0436        | -0.0556          | 0.6429             | 0.0120          | -1.1128        | -0.8724      | -3.0171         | -3.0190       | 0.5058   | -0.6318        | 0.3794          |
+| 0.29          | 1.1536 | 1100 | 0.5385          | -0.0437        | -0.0574          | 0.6468             | 0.0138          | -1.1487        | -0.8736      | -3.0027         | -3.0029       | 0.5042   | -0.6256        | 0.4247          |
+| 0.2826        | 1.2585 | 1200 | 0.5428          | -0.0443        | -0.0581          | 0.6429             | 0.0139          | -1.1626        | -0.8854      | -2.9620         | -2.9583       | 0.5084   | -0.6254        | 0.4215          |
+| 0.2796        | 1.3634 | 1300 | 0.5393          | -0.0441        | -0.0589          | 0.6468             | 0.0147          | -1.1771        | -0.8825      | -2.9256         | -2.9285       | 0.5060   | -0.6208        | 0.4508          |
+| 0.2784        | 1.4683 | 1400 | 0.5365          | -0.0444        | -0.0589          | 0.6528             | 0.0145          | -1.1784        | -0.8885      | -2.9583         | -2.9594       | 0.5037   | -0.6236        | 0.4410          |
+| 0.2873        | 1.5732 | 1500 | 0.5330          | -0.0436        | -0.0579          | 0.6448             | 0.0143          | -1.1584        | -0.8718      | -2.9664         | -2.9657       | 0.5004   | -0.6226        | 0.4364          |
+| 0.276         | 1.6780 | 1600 | 0.5367          | -0.0442        | -0.0594          | 0.6409             | 0.0152          | -1.1879        | -0.8833      | -2.9358         | -2.9324       | 0.5041   | -0.6160        | 0.4570          |
+| 0.2715        | 1.7829 | 1700 | 0.5349          | -0.0436        | -0.0580          | 0.6448             | 0.0145          | -1.1603        | -0.8710      | -3.0209         | -3.0194       | 0.5024   | -0.6272        | 0.4425          |
+| 0.2717        | 1.8878 | 1800 | 0.5341          | -0.0450        | -0.0616          | 0.6548             | 0.0166          | -1.2325        | -0.8997      | -2.9579         | -2.9563       | 0.5023   | -0.6184        | 0.4824          |
+| 0.2857        | 1.9927 | 1900 | 0.5408          | -0.0454        | -0.0620          | 0.6548             | 0.0166          | -1.2409        | -0.9088      | -3.0279         | -3.0350       | 0.5091   | -0.6193        | 0.4892          |
+| 0.1137        | 2.0975 | 2000 | 0.6877          | -0.0620        | -0.0838          | 0.6706             | 0.0218          | -1.6761        | -1.2408      | -2.8815         | -2.8704       | 0.6539   | -0.6273        | 0.5767          |
+| 0.1192        | 2.2024 | 2100 | 0.7577          | -0.0706        | -0.0981          | 0.6726             | 0.0275          | -1.9620        | -1.4122      | -2.8433         | -2.8372       | 0.7199   | -0.6210        | 0.6958          |
+| 0.1178        | 2.3073 | 2200 | 1.1762          | -0.1205        | -0.1717          | 0.6528             | 0.0512          | -3.4342        | -2.4108      | -2.9107         | -2.8878       | 1.1197   | -0.7778        | 1.1628          |
+| 0.1184        | 2.4122 | 2300 | 1.8520          | -0.1935        | -0.2541          | 0.6369             | 0.0606          | -5.0812        | -3.8696      | -2.9226         | -2.9102       | 1.7542   | -1.0562        | 1.3233          |
+| 0.1172        | 2.5170 | 2400 | 1.0193          | -0.1001        | -0.1434          | 0.6409             | 0.0432          | -2.8671        | -2.0024      | -2.8710         | -2.8561       | 0.9736   | -0.8145        | 1.0075          |
+| 0.1109        | 2.6219 | 2500 | 1.2050          | -0.1209        | -0.1677          | 0.6329             | 0.0468          | -3.3547        | -2.4183      | -2.8571         | -2.8457       | 1.1724   | -0.9768        | 1.0766          |
+| 0.1238        | 2.7268 | 2600 | 2.6922          | -0.3036        | -0.3822          | 0.5873             | 0.0786          | -7.6444        | -6.0725      | -2.9967         | -2.9805       | 2.6498   | -1.6934        | 1.6674          |
+| 0.1192        | 2.8317 | 2700 | 1.2391          | -0.1189        | -0.1634          | 0.625              | 0.0445          | -3.2671        | -2.3779      | -2.8836         | -2.8662       | 1.1910   | -0.9507        | 1.0201          |
+| 0.1191        | 2.9365 | 2800 | 1.0214          | -0.0976        | -0.1394          | 0.6270             | 0.0418          | -2.7882        | -1.9523      | -2.8221         | -2.8059       | 0.9673   | -0.8558        | 0.9869          |
 ### Framework versions

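In the new run's evaluation log, validation loss is lowest at step 900 (0.5086, just before the end of epoch 1) and then rises through epochs 2 and 3. The final 3-epoch model (1.0214) therefore scores worse on this metric than the 1-epoch run it replaces (0.5072). The minimal Python sketch below finds the best logged evaluation and checks that Rewards/margins equals Rewards/chosen minus Rewards/rejected, within the 4-decimal rounding of the table. The rows are transcribed from the 3-epoch "Training results" table. The `EvalRow` type and the helper names are illustrative and are not part of TRL or the Trainer. The diff also does not show whether a checkpoint was saved at step 900.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float


# (Step, Validation Loss, Rewards/chosen, Rewards/rejected, Rewards/margins),
# transcribed from the 3-epoch training results table in the new README.
ROWS = [EvalRow(*r) for r in [
    (100, 0.5843, -0.0456, -0.0529, 0.0073), (200, 0.5726, -0.0441, -0.0532, 0.0092),
    (300, 0.5552, -0.0439, -0.0531, 0.0092), (400, 0.5411, -0.0422, -0.0517, 0.0096),
    (500, 0.5319, -0.0421, -0.0525, 0.0105), (600, 0.5240, -0.0408, -0.0508, 0.0100),
    (700, 0.5188, -0.0413, -0.0541, 0.0128), (800, 0.5141, -0.0410, -0.0524, 0.0114),
    (900, 0.5086, -0.0402, -0.0506, 0.0104), (1000, 0.5400, -0.0436, -0.0556, 0.0120),
    (1100, 0.5385, -0.0437, -0.0574, 0.0138), (1200, 0.5428, -0.0443, -0.0581, 0.0139),
    (1300, 0.5393, -0.0441, -0.0589, 0.0147), (1400, 0.5365, -0.0444, -0.0589, 0.0145),
    (1500, 0.5330, -0.0436, -0.0579, 0.0143), (1600, 0.5367, -0.0442, -0.0594, 0.0152),
    (1700, 0.5349, -0.0436, -0.0580, 0.0145), (1800, 0.5341, -0.0450, -0.0616, 0.0166),
    (1900, 0.5408, -0.0454, -0.0620, 0.0166), (2000, 0.6877, -0.0620, -0.0838, 0.0218),
    (2100, 0.7577, -0.0706, -0.0981, 0.0275), (2200, 1.1762, -0.1205, -0.1717, 0.0512),
    (2300, 1.8520, -0.1935, -0.2541, 0.0606), (2400, 1.0193, -0.1001, -0.1434, 0.0432),
    (2500, 1.2050, -0.1209, -0.1677, 0.0468), (2600, 2.6922, -0.3036, -0.3822, 0.0786),
    (2700, 1.2391, -0.1189, -0.1634, 0.0445), (2800, 1.0214, -0.0976, -0.1394, 0.0418),
]]


def best_by_val_loss(rows):
    """Return the logged evaluation with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def margins_consistent(rows, tol=2e-4):
    """Check margins == chosen - rejected; tol absorbs rounding of three 4-decimal values."""
    return all(
        abs((r.rewards_chosen - r.rewards_rejected) - r.rewards_margins) <= tol
        for r in rows
    )


best = best_by_val_loss(ROWS)
final = ROWS[-1]
print(f"best: step {best.step}, val loss {best.val_loss}")
print(f"final: step {final.step}, val loss {final.val_loss}")
print("margins consistent:", margins_consistent(ROWS))
```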
all_results.json CHANGED

@@ -1,5 +1,5 @@
 {
-    "epoch": 0.9994756161510225,
     "eval_log_odds_chosen": 0.36045119166374207,
     "eval_log_odds_ratio": -0.6342776417732239,
     "eval_logits/chosen": -2.4851083755493164,
@@ -17,9 +17,9 @@
     "eval_samples_per_second": 14.569,
     "eval_steps_per_second": 0.46,
     "total_flos": 0.0,
-    "train_loss": 0.5642813587989287,
-    "train_runtime": 20357.789,
     "train_samples": 61005,
-    "train_samples_per_second": 2.997,
-    "train_steps_per_second": 0.047
 }

 {
+    "epoch": 2.9984268484530676,
     "eval_log_odds_chosen": 0.36045119166374207,
     "eval_log_odds_ratio": -0.6342776417732239,
     "eval_logits/chosen": -2.4851083755493164,
     "eval_samples_per_second": 14.569,
     "eval_steps_per_second": 0.46,
     "total_flos": 0.0,
+    "train_loss": 0.32389816019492534,
+    "train_runtime": 62235.4926,
     "train_samples": 61005,
+    "train_samples_per_second": 2.941,
+    "train_steps_per_second": 0.046
 }
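The logged throughput can be reproduced from the other fields. This assumes it is computed as train_samples times the number of configured epochs, divided by train_runtime (num_epochs is 1 in the old model card and 3 in the new one). The sketch below checks that assumption for both runs. It also shows that the new runtime is roughly three times the old one, which is what three times the epochs would predict.

```python
# Fields copied from the old and new all_results.json / train_results.json;
# num_train_epochs is taken from the README (num_epochs: 1 before, 3 after).
OLD = {"train_samples": 61005, "train_runtime": 20357.789,
       "train_samples_per_second": 2.997, "num_train_epochs": 1}
NEW = {"train_samples": 61005, "train_runtime": 62235.4926,
       "train_samples_per_second": 2.941, "num_train_epochs": 3}


def implied_samples_per_second(run):
    """Assumed formula: (dataset size x epochs) / wall-clock seconds, rounded to 3 decimals."""
    return round(run["train_samples"] * run["num_train_epochs"] / run["train_runtime"], 3)


for name, run in (("old", OLD), ("new", NEW)):
    print(name, implied_samples_per_second(run), "logged:", run["train_samples_per_second"])

# Three epochs should take roughly three times as long as one.
runtime_ratio = NEW["train_runtime"] / OLD["train_runtime"]
print(f"runtime ratio new/old: {runtime_ratio:.2f}")
```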

config.json CHANGED

@@ -21,6 +21,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.41.0.dev0",
-  "use_cache": true,
   "vocab_size": 32000
 }

   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.41.0.dev0",
+  "use_cache": false,
   "vocab_size": 32000
 }
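The only change to config.json is `use_cache` switching from `true` to `false`. Training scripts often disable the KV cache when gradient checkpointing is enabled. That explanation is an assumption about this run, because the diff does not say why the flag changed. For generation you normally want the cache enabled. With `transformers`, you can override the flag at load time, since `from_pretrained(..., use_cache=True)` forwards the value to the config, or per call with `generate(..., use_cache=True)`. The standard-library sketch below takes a different approach and patches a local config.json directly. `set_use_cache` is a hypothetical helper, and the demo file contains only the fields visible in the diff.

```python
import json
import tempfile
from pathlib import Path


def set_use_cache(config_path, enabled=True):
    """Set `use_cache` in a config.json file in place and return the updated dict."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["use_cache"] = enabled
    # indent=2 matches the layout of the config.json shown in the diff
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a trimmed copy of the fields visible in the diff.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(json.dumps({
        "tie_word_embeddings": False,
        "torch_dtype": "bfloat16",
        "transformers_version": "4.41.0.dev0",
        "use_cache": False,
        "vocab_size": 32000,
    }))
    updated = set_use_cache(cfg_path)
    reloaded = json.loads(cfg_path.read_text())
```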

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ffa90a9394bde99c724e16cce3299e05551da2f9ecf20baf80148357c1179174
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:ee10f5eceafcd14eb38770919bb08de1ed7713961ad136a927e2ab2dc2e5054d
 size 4943162336

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2c33b2f762f44768cfc2ace8d53425584001dbfa592278381393c8b7d1b7d12c
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:89834df397be326e4f7d0760093ebf7a8800ec22ebd09ad4276d44d01b6d2eb5
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e5dbbc97280ec6cbd9b6796a8e3a7b271c46bd33b35bcfececd6a5d5d26303f
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:2ea0193cb9fe9772cff4067a7665f851a73e9595d6b4fdbfc36a57105015522b
 size 4540516344
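Each safetensors shard is committed as a Git LFS pointer, as is training_args.bin. A pointer consists of a `version` line, an `oid sha256:<hex>` line, and a `size` line giving the byte count. In this commit every size is unchanged while every oid is different, so the new files have the same byte counts as the old ones but different contents. The minimal sketch below checks which revision a locally downloaded shard corresponds to by comparing its size and SHA-256 digest with a pointer. The helper names and the demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" lines) into version, algo, oid and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Return True if the local file has the pointer's byte size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing a multi-GB file
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


# New pointer for model-00001-of-00003.safetensors, copied from the diff.
shard1 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ee10f5eceafcd14eb38770919bb08de1ed7713961ad136a927e2ab2dc2e5054d\n"
    "size 4943162336\n"
)

# Demo with a tiny stand-in file, since the real shard is about 4.9 GB.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "shard.safetensors"
    sample.write_bytes(b"hello")
    demo_ptr = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{hashlib.sha256(b'hello').hexdigest()}\n"
        "size 5\n"
    )
    demo_ok = matches_pointer(sample, demo_ptr)
    demo_bad = matches_pointer(sample, shard1)  # size differs, so no match
```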

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9994756161510225,
     "total_flos": 0.0,
-    "train_loss": 0.5642813587989287,
-    "train_runtime": 20357.789,
     "train_samples": 61005,
-    "train_samples_per_second": 2.997,
-    "train_steps_per_second": 0.047
 }

 {
+    "epoch": 2.9984268484530676,
     "total_flos": 0.0,
+    "train_loss": 0.32389816019492534,
+    "train_runtime": 62235.4926,
     "train_samples": 61005,
+    "train_samples_per_second": 2.941,
+    "train_steps_per_second": 0.046
 }

trainer_state.json CHANGED

The diff for this file is too large to render.

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bcb6aaae370ec05ab890f4838b2b43e949b0e3b36b5e24c16aff75cb84e3a43c
 size 6648

 version https://git-lfs.github.com/spec/v1
+oid sha256:3e94fd3a4a9d2e3763c25d9900c434cf062f5e77c71cb792f16d64f681e35200
 size 6648