Model save

Files changed:
- README.md +24 -49
- all_results.json +4 -4
- config.json +1 -1
- model-00001-of-00003.safetensors +1 -1
- model-00002-of-00003.safetensors +1 -1
- model-00003-of-00003.safetensors +1 -1
- train_results.json +4 -4
- trainer_state.json +0 -0
- training_args.bin +1 -1
README.md
CHANGED
@@ -2,16 +2,10 @@
 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
-- alignment-handbook
-- trl
-- orpo
-- generated_from_trainer
 - trl
 - orpo
 - alignment-handbook
 - generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-sft-full-orpo
   results: []
@@ -20,23 +14,23 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/90a8kp39)
 # zephyr-7b-sft-full-orpo
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss:
-- Rewards/chosen: -0.
-- Rewards/rejected: -0.
+- Loss: 0.4714
+- Rewards/chosen: -0.0357
+- Rewards/rejected: -0.0466
 - Rewards/accuracies: 0.6528
-- Rewards/margins: 0.
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -2.
-- Logits/chosen: -2.
-- Nll Loss:
-- Log Odds Ratio: -
-- Log Odds Chosen:
+- Rewards/margins: 0.0109
+- Logps/rejected: -0.9324
+- Logps/chosen: -0.7143
+- Logits/rejected: -2.9543
+- Logits/chosen: -2.9692
+- Nll Loss: 0.4361
+- Log Odds Ratio: -0.6245
+- Log Odds Chosen: 0.3669
 
 ## Model description
 
@@ -55,7 +49,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate:
+- learning_rate: 7e-06
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
@@ -67,40 +61,21 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: inverse_sqrt
 - lr_scheduler_warmup_steps: 100
-- num_epochs:
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.2888 | 1.0488 | 1000 | 0.5400 | -0.0436 | -0.0556 | 0.6429 | 0.0120 | -1.1128 | -0.8724 | -3.0171 | -3.0190 | 0.5058 | -0.6318 | 0.3794 |
-| 0.29 | 1.1536 | 1100 | 0.5385 | -0.0437 | -0.0574 | 0.6468 | 0.0138 | -1.1487 | -0.8736 | -3.0027 | -3.0029 | 0.5042 | -0.6256 | 0.4247 |
-| 0.2826 | 1.2585 | 1200 | 0.5428 | -0.0443 | -0.0581 | 0.6429 | 0.0139 | -1.1626 | -0.8854 | -2.9620 | -2.9583 | 0.5084 | -0.6254 | 0.4215 |
-| 0.2796 | 1.3634 | 1300 | 0.5393 | -0.0441 | -0.0589 | 0.6468 | 0.0147 | -1.1771 | -0.8825 | -2.9256 | -2.9285 | 0.5060 | -0.6208 | 0.4508 |
-| 0.2784 | 1.4683 | 1400 | 0.5365 | -0.0444 | -0.0589 | 0.6528 | 0.0145 | -1.1784 | -0.8885 | -2.9583 | -2.9594 | 0.5037 | -0.6236 | 0.4410 |
-| 0.2873 | 1.5732 | 1500 | 0.5330 | -0.0436 | -0.0579 | 0.6448 | 0.0143 | -1.1584 | -0.8718 | -2.9664 | -2.9657 | 0.5004 | -0.6226 | 0.4364 |
-| 0.276 | 1.6780 | 1600 | 0.5367 | -0.0442 | -0.0594 | 0.6409 | 0.0152 | -1.1879 | -0.8833 | -2.9358 | -2.9324 | 0.5041 | -0.6160 | 0.4570 |
-| 0.2715 | 1.7829 | 1700 | 0.5349 | -0.0436 | -0.0580 | 0.6448 | 0.0145 | -1.1603 | -0.8710 | -3.0209 | -3.0194 | 0.5024 | -0.6272 | 0.4425 |
-| 0.2717 | 1.8878 | 1800 | 0.5341 | -0.0450 | -0.0616 | 0.6548 | 0.0166 | -1.2325 | -0.8997 | -2.9579 | -2.9563 | 0.5023 | -0.6184 | 0.4824 |
-| 0.2857 | 1.9927 | 1900 | 0.5408 | -0.0454 | -0.0620 | 0.6548 | 0.0166 | -1.2409 | -0.9088 | -3.0279 | -3.0350 | 0.5091 | -0.6193 | 0.4892 |
-| 0.1137 | 2.0975 | 2000 | 0.6877 | -0.0620 | -0.0838 | 0.6706 | 0.0218 | -1.6761 | -1.2408 | -2.8815 | -2.8704 | 0.6539 | -0.6273 | 0.5767 |
-| 0.1192 | 2.2024 | 2100 | 0.7577 | -0.0706 | -0.0981 | 0.6726 | 0.0275 | -1.9620 | -1.4122 | -2.8433 | -2.8372 | 0.7199 | -0.6210 | 0.6958 |
-| 0.1178 | 2.3073 | 2200 | 1.1762 | -0.1205 | -0.1717 | 0.6528 | 0.0512 | -3.4342 | -2.4108 | -2.9107 | -2.8878 | 1.1197 | -0.7778 | 1.1628 |
-| 0.1184 | 2.4122 | 2300 | 1.8520 | -0.1935 | -0.2541 | 0.6369 | 0.0606 | -5.0812 | -3.8696 | -2.9226 | -2.9102 | 1.7542 | -1.0562 | 1.3233 |
-| 0.1172 | 2.5170 | 2400 | 1.0193 | -0.1001 | -0.1434 | 0.6409 | 0.0432 | -2.8671 | -2.0024 | -2.8710 | -2.8561 | 0.9736 | -0.8145 | 1.0075 |
-| 0.1109 | 2.6219 | 2500 | 1.2050 | -0.1209 | -0.1677 | 0.6329 | 0.0468 | -3.3547 | -2.4183 | -2.8571 | -2.8457 | 1.1724 | -0.9768 | 1.0766 |
-| 0.1238 | 2.7268 | 2600 | 2.6922 | -0.3036 | -0.3822 | 0.5873 | 0.0786 | -7.6444 | -6.0725 | -2.9967 | -2.9805 | 2.6498 | -1.6934 | 1.6674 |
-| 0.1192 | 2.8317 | 2700 | 1.2391 | -0.1189 | -0.1634 | 0.625 | 0.0445 | -3.2671 | -2.3779 | -2.8836 | -2.8662 | 1.1910 | -0.9507 | 1.0201 |
-| 0.1191 | 2.9365 | 2800 | 1.0214 | -0.0976 | -0.1394 | 0.6270 | 0.0418 | -2.7882 | -1.9523 | -2.8221 | -2.8059 | 0.9673 | -0.8558 | 0.9869 |
+| 0.5226 | 0.1049 | 100 | 0.5280 | -0.0386 | -0.0472 | 0.6329 | 0.0086 | -0.9448 | -0.7728 | -2.7583 | -2.7860 | 0.4953 | -0.6326 | 0.2873 |
+| 0.5074 | 0.2098 | 200 | 0.5134 | -0.0381 | -0.0478 | 0.6409 | 0.0098 | -0.9566 | -0.7612 | -2.6736 | -2.7002 | 0.4774 | -0.6357 | 0.3190 |
+| 0.5265 | 0.3146 | 300 | 0.5012 | -0.0379 | -0.0479 | 0.6329 | 0.0099 | -0.9572 | -0.7588 | -2.7317 | -2.7594 | 0.4653 | -0.6374 | 0.3278 |
+| 0.5194 | 0.4195 | 400 | 0.4912 | -0.0371 | -0.0478 | 0.6429 | 0.0107 | -0.9559 | -0.7417 | -2.6640 | -2.6974 | 0.4560 | -0.6284 | 0.3607 |
+| 0.5008 | 0.5244 | 500 | 0.4847 | -0.0373 | -0.0489 | 0.6508 | 0.0117 | -0.9786 | -0.7455 | -2.5957 | -2.6294 | 0.4499 | -0.6209 | 0.3873 |
+| 0.4725 | 0.6293 | 600 | 0.4794 | -0.0362 | -0.0470 | 0.6349 | 0.0107 | -0.9394 | -0.7248 | -2.6147 | -2.6477 | 0.4435 | -0.6320 | 0.3567 |
+| 0.4875 | 0.7341 | 700 | 0.4767 | -0.0368 | -0.0498 | 0.6409 | 0.0129 | -0.9955 | -0.7365 | -2.6910 | -2.7213 | 0.4416 | -0.6158 | 0.4180 |
+| 0.4796 | 0.8390 | 800 | 0.4740 | -0.0371 | -0.0508 | 0.6508 | 0.0137 | -1.0162 | -0.7416 | -2.7913 | -2.8114 | 0.4396 | -0.6169 | 0.4363 |
+| 0.4851 | 0.9439 | 900 | 0.4714 | -0.0357 | -0.0466 | 0.6528 | 0.0109 | -0.9324 | -0.7143 | -2.9543 | -2.9692 | 0.4361 | -0.6245 | 0.3669 |
 
 
 ### Framework versions
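The updated card describes a single-epoch ORPO run on Mistral-7B-v0.1 with a 7e-06 learning rate, per-device batch size 8, an inverse_sqrt scheduler with 100 warmup steps, and default Adam betas/epsilon. Below is a minimal sketch of how such a run could be set up with TRL's `ORPOConfig`/`ORPOTrainer`; it is illustrative only. The output directory, `bf16` flag, and dataset choice are assumptions (the earlier card revision listed `HuggingFaceH4/ultrafeedback_binarized`, which this commit removes), and the chat-template preprocessing used by alignment-handbook-style recipes is elided.

```python
# Minimal sketch, not the exact training script behind this commit.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

args = ORPOConfig(
    output_dir="zephyr-7b-sft-full-orpo",  # assumed name, taken from the card title
    learning_rate=7e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    seed=42,
    bf16=True,  # assumed; config.json stores torch_dtype: bfloat16
)

# Assumed dataset (removed from the card's metadata in this commit). ORPOTrainer
# expects "prompt"/"chosen"/"rejected" text columns, so real recipes apply a chat
# template first; that preprocessing is omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```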
all_results.json
CHANGED
@@ -1,5 +1,5 @@
 {
-    "epoch":
+    "epoch": 0.9994756161510225,
     "eval_log_odds_chosen": 1.220078706741333,
     "eval_log_odds_ratio": -1.047989010810852,
     "eval_logits/chosen": -2.849764585494995,
@@ -17,9 +17,9 @@
     "eval_samples_per_second": 14.663,
     "eval_steps_per_second": 0.463,
     "total_flos": 0.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.5301580581685054,
+    "train_runtime": 20737.8205,
     "train_samples": 61005,
-    "train_samples_per_second": 2.
+    "train_samples_per_second": 2.942,
     "train_steps_per_second": 0.046
 }
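The throughput fields in this file are mutually consistent, which is a quick sanity check that can be run on any `*_results.json`. A small sketch (the local file path is an assumption):

```python
# Check that the reported throughput matches samples * epochs / runtime.
import json

with open("all_results.json") as f:  # assumed to be in the current directory
    r = json.load(f)

expected = r["train_samples"] * r["epoch"] / r["train_runtime"]
print(round(expected, 3), r["train_samples_per_second"])
# 61005 * 0.9995 / 20737.8205 ≈ 2.940, in line with the reported 2.942
```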
config.json
CHANGED
@@ -21,6 +21,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.41.0.dev0",
-  "use_cache":
+  "use_cache": false,
   "vocab_size": 32000
 }
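`"use_cache": false` is what the Trainer commonly writes out when gradient checkpointing is enabled during training; it does not prevent using the KV cache at inference, where it can simply be re-enabled when loading. A minimal sketch, assuming `statking/zephyr-7b-sft-full-orpo` as the repository id (the prompt and generation settings are illustrative):

```python
# Re-enable the KV cache for generation even though the saved config disables it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "statking/zephyr-7b-sft-full-orpo"  # assumed repo id for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, use_cache=True)

inputs = tokenizer("The ORPO objective combines", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```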
model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b313dc48ecc95a24426280ae5e3e66f841af88ea853a46696d2faaae1f2f129e
 size 4943162336
model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:cfee51622377d86839cc88b039e910fc8cb1731ae63fd5fb76be8ccc53beca43
 size 4999819336
model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1d332c123718dc9f9ea7414c9e2362576015310d8b853af1269dffded4a17cec
 size 4540516344
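The three entries above are Git LFS pointer files: the repository stores only the SHA-256 and size of each weight shard. A downloaded shard can be checked against its pointer with a short script (the shard is assumed to be in the current directory):

```python
# Verify a downloaded safetensors shard against its Git LFS pointer (oid + size).
import hashlib
import os

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

path = "model-00001-of-00003.safetensors"
print(os.path.getsize(path) == 4943162336)
print(sha256_of(path) == "b313dc48ecc95a24426280ae5e3e66f841af88ea853a46696d2faaae1f2f129e")
```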
train_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch":
+    "epoch": 0.9994756161510225,
     "total_flos": 0.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.5301580581685054,
+    "train_runtime": 20737.8205,
     "train_samples": 61005,
-    "train_samples_per_second": 2.
+    "train_samples_per_second": 2.942,
     "train_steps_per_second": 0.046
 }
trainer_state.json
CHANGED
The diff for this file is too large to render. See the raw diff.
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:427c0935c426bbb8857ae50317955d9dcaa991ddbe5ea21647c088c2a9b1ccc7
 size 6648
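`training_args.bin` is a pickled `TrainingArguments` (here presumably an `ORPOConfig`), so the exact run configuration can be inspected directly. A sketch, assuming transformers/TRL are importable and noting that recent PyTorch versions need `weights_only=False` to unpickle arbitrary objects:

```python
# Inspect the pickled training arguments stored alongside the checkpoint.
import torch

args = torch.load("training_args.bin", weights_only=False)  # local path assumed
print(type(args).__name__)
print(args.learning_rate, args.lr_scheduler_type, args.num_train_epochs)
```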