|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- trl |
|
- sft |
|
- generated_from_trainer |
|
datasets: |
|
- generator |
|
base_model: mistralai/Mistral-7B-Instruct-v0.2 |
|
model-index: |
|
- name: mistralai/Mistral-7B-Instruct-v0.2 |
|
results: [] |
|
--- |
|
|
|
|
|
|
# mistralai/Mistral-7B-Instruct-v0.2 |
|
|
|
This model is a [PEFT](https://github.com/huggingface/peft) adapter fine-tuned from [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the `generator` dataset (the name the Trainer records when the training data is built from a Python generator).
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.5526 |
|
|
|
## Model description |
|
|
|
This repository holds a PEFT adapter for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), produced by supervised fine-tuning (SFT) with TRL's trainer, as the `peft`, `trl`, and `sft` tags indicate. Only the adapter weights are stored here; at inference time they must be loaded on top of the base model, as shown below.
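
A minimal loading sketch, assuming the adapter lives in this repository (the `adapter_id` below is a placeholder to replace with the actual repo id):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "path/or/repo-of-this-adapter"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Summarize what a PEFT adapter is."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the adapter is LoRA-based, `model.merge_and_unload()` folds its weights into the base model so the result can be served without PEFT installed.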
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
The underlying source data is not documented. The Trainer logged the training set simply as `generator`, which is the name 🤗 Datasets assigns to datasets constructed from a Python generator, as sketched below.
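
A hypothetical illustration of how such a dataset is typically built (the example record is invented, not from the actual training data):

```python
from datasets import Dataset

def gen():
    # Invented placeholder record; the real training examples are not documented.
    yield {"text": "[INST] Example instruction [/INST] Example response"}

# A dataset created this way is recorded by the Trainer under the name
# "generator", which is what this card's metadata refers to.
train_dataset = Dataset.from_generator(gen)
print(train_dataset)  # Dataset({features: ['text'], num_rows: 1})
```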
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
|
- learning_rate: 2.5e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 0.03 (a fractional step count as logged; it reads as a warmup ratio of 3%)
|
- training_steps: 600 |
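
A minimal sketch of how these values map onto `transformers.TrainingArguments`; the `output_dir` is hypothetical, the Adam settings are the defaults the card reports, and the logged warmup value of 0.03 is interpreted as a ratio:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral7b-instruct-sft-adapter",  # hypothetical name
    learning_rate=2.5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,   # the card logs warmup_steps=0.03, sensible only as a ratio
    max_steps=600,
    adam_beta1=0.9,      # Adam with betas=(0.9, 0.999) and epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",
    eval_steps=10,       # matches the 10-step cadence in the results table
    logging_steps=10,
)
```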
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 2.7925 | 0.22 | 10 | 2.0998 | |
|
| 1.6897 | 0.43 | 20 | 1.3864 | |
|
| 1.3495 | 0.65 | 30 | 1.2622 | |
|
| 1.2144 | 0.87 | 40 | 1.1882 | |
|
| 1.1546 | 1.09 | 50 | 1.1397 | |
|
| 1.1002 | 1.3 | 60 | 1.0843 | |
|
| 1.0023 | 1.52 | 70 | 0.9794 | |
|
| 0.897 | 1.74 | 80 | 0.9370 | |
|
| 0.8625 | 1.96 | 90 | 0.8557 | |
|
| 0.7492 | 2.17 | 100 | 0.7909 | |
|
| 0.7296 | 2.39 | 110 | 0.7455 | |
|
| 0.6738 | 2.61 | 120 | 0.7239 | |
|
| 0.656 | 2.83 | 130 | 0.7071 | |
|
| 0.6289 | 3.04 | 140 | 0.6852 | |
|
| 0.5835 | 3.26 | 150 | 0.6704 | |
|
| 0.5647 | 3.48 | 160 | 0.6481 | |
|
| 0.5416 | 3.7 | 170 | 0.6326 | |
|
| 0.5159 | 3.91 | 180 | 0.6219 | |
|
| 0.475 | 4.13 | 190 | 0.6091 | |
|
| 0.4529 | 4.35 | 200 | 0.5903 | |
|
| 0.4358 | 4.57 | 210 | 0.5769 | |
|
| 0.4124 | 4.78 | 220 | 0.5574 | |
|
| 0.3925 | 5.0 | 230 | 0.5433 | |
|
| 0.3325 | 5.22 | 240 | 0.5396 | |
|
| 0.3307 | 5.43 | 250 | 0.5241 | |
|
| 0.3122 | 5.65 | 260 | 0.5185 | |
|
| 0.2973 | 5.87 | 270 | 0.5042 | |
|
| 0.2695 | 6.09 | 280 | 0.5082 | |
|
| 0.2345 | 6.3 | 290 | 0.5020 | |
|
| 0.2307 | 6.52 | 300 | 0.4859 | |
|
| 0.2226 | 6.74 | 310 | 0.4771 | |
|
| 0.2083 | 6.96 | 320 | 0.4717 | |
|
| 0.1858 | 7.17 | 330 | 0.4881 | |
|
| 0.1677 | 7.39 | 340 | 0.4791 | |
|
| 0.1663 | 7.61 | 350 | 0.4774 | |
|
| 0.1609 | 7.83 | 360 | 0.4780 | |
|
| 0.1493 | 8.04 | 370 | 0.4820 | |
|
| 0.1332 | 8.26 | 380 | 0.4940 | |
|
| 0.1351 | 8.48 | 390 | 0.4898 | |
|
| 0.1251 | 8.7 | 400 | 0.4894 | |
|
| 0.1243 | 8.91 | 410 | 0.4836 | |
|
| 0.1121 | 9.13 | 420 | 0.5108 | |
|
| 0.1059 | 9.35 | 430 | 0.5055 | |
|
| 0.1037 | 9.57 | 440 | 0.4974 | |
|
| 0.102 | 9.78 | 450 | 0.4981 | |
|
| 0.1032 | 10.0 | 460 | 0.5100 | |
|
| 0.0887 | 10.22 | 470 | 0.5267 | |
|
| 0.09 | 10.43 | 480 | 0.5231 | |
|
| 0.084 | 10.65 | 490 | 0.5228 | |
|
| 0.0865 | 10.87 | 500 | 0.5166 | |
|
| 0.0838 | 11.09 | 510 | 0.5337 | |
|
| 0.0762 | 11.3 | 520 | 0.5444 | |
|
| 0.0792 | 11.52 | 530 | 0.5375 | |
|
| 0.0765 | 11.74 | 540 | 0.5397 | |
|
| 0.0747 | 11.96 | 550 | 0.5386 | |
|
| 0.0684 | 12.17 | 560 | 0.5517 | |
|
| 0.0697 | 12.39 | 570 | 0.5547 | |
|
| 0.0701 | 12.61 | 580 | 0.5528 | |
|
| 0.0702 | 12.83 | 590 | 0.5522 | |
|
| 0.0693 | 13.04 | 600 | 0.5526 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.7.1 |
|
- Transformers 4.36.2 |
|
- PyTorch 2.1.2+cu121
|
- Datasets 2.16.1 |
|
- Tokenizers 0.15.0 |