# ZeroShot-3.3.3-Mistral-7b-Multilanguage-3.2.0
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3754
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 1612
- mixed_precision_training: Native AMP
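The total batch sizes listed above follow from the per-device batch sizes multiplied by the number of devices (assuming no gradient accumulation, which the card does not mention). A minimal sketch of that arithmetic, using only the values from the list:

```python
# Hyperparameters as listed on the card. The "total" batch sizes are derived:
# per-device batch size x num_devices, assuming no gradient accumulation.
config = {
    "learning_rate": 2e-4,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 2,    # per device
    "num_devices": 2,
    "seed": 42,
    "lr_scheduler_type": "linear",
    "lr_scheduler_warmup_ratio": 0.1,
    "training_steps": 1612,
}

total_train_batch_size = config["train_batch_size"] * config["num_devices"]  # 16
total_eval_batch_size = config["eval_batch_size"] * config["num_devices"]    # 4

# A warmup ratio of 0.1 over 1612 steps corresponds to roughly 161 warmup steps.
warmup_steps = int(config["training_steps"] * config["lr_scheduler_warmup_ratio"])
```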
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.8728 | 0.01 | 20 | 1.7906 |
1.4796 | 0.02 | 40 | 1.1876 |
0.8318 | 0.04 | 60 | 0.6326 |
0.5478 | 0.05 | 80 | 0.5419 |
0.517 | 0.06 | 100 | 0.5157 |
0.5109 | 0.07 | 120 | 0.4906 |
0.4656 | 0.09 | 140 | 0.4658 |
0.4409 | 0.1 | 160 | 0.4519 |
0.4316 | 0.11 | 180 | 0.4475 |
0.4297 | 0.12 | 200 | 0.4428 |
0.4226 | 0.14 | 220 | 0.4389 |
0.4321 | 0.15 | 240 | 0.4360 |
0.4261 | 0.16 | 260 | 0.4337 |
0.4235 | 0.17 | 280 | 0.4307 |
0.4279 | 0.19 | 300 | 0.4280 |
0.419 | 0.2 | 320 | 0.4253 |
0.4129 | 0.21 | 340 | 0.4230 |
0.4097 | 0.22 | 360 | 0.4223 |
0.4204 | 0.24 | 380 | 0.4200 |
0.4042 | 0.25 | 400 | 0.4191 |
0.4134 | 0.26 | 420 | 0.4176 |
0.4006 | 0.27 | 440 | 0.4158 |
0.4004 | 0.29 | 460 | 0.4141 |
0.3967 | 0.3 | 480 | 0.4123 |
0.4089 | 0.31 | 500 | 0.4100 |
0.3924 | 0.32 | 520 | 0.4087 |
0.4118 | 0.33 | 540 | 0.4079 |
0.4027 | 0.35 | 560 | 0.4069 |
0.393 | 0.36 | 580 | 0.4055 |
0.4103 | 0.37 | 600 | 0.4047 |
0.3896 | 0.38 | 620 | 0.4033 |
0.3912 | 0.4 | 640 | 0.4016 |
0.3897 | 0.41 | 660 | 0.4012 |
0.3963 | 0.42 | 680 | 0.3994 |
0.3914 | 0.43 | 700 | 0.3981 |
0.3769 | 0.45 | 720 | 0.3970 |
0.3904 | 0.46 | 740 | 0.3970 |
0.3831 | 0.47 | 760 | 0.3951 |
0.3922 | 0.48 | 780 | 0.3943 |
0.403 | 0.5 | 800 | 0.3928 |
0.3913 | 0.51 | 820 | 0.3922 |
0.3836 | 0.52 | 840 | 0.3913 |
0.3736 | 0.53 | 860 | 0.3903 |
0.3773 | 0.55 | 880 | 0.3897 |
0.3883 | 0.56 | 900 | 0.3890 |
0.3751 | 0.57 | 920 | 0.3884 |
0.3832 | 0.58 | 940 | 0.3874 |
0.3726 | 0.6 | 960 | 0.3869 |
0.3738 | 0.61 | 980 | 0.3861 |
0.3809 | 0.62 | 1000 | 0.3855 |
0.3871 | 0.63 | 1020 | 0.3845 |
0.3799 | 0.64 | 1040 | 0.3838 |
0.3882 | 0.66 | 1060 | 0.3831 |
0.3846 | 0.67 | 1080 | 0.3823 |
0.3696 | 0.68 | 1100 | 0.3821 |
0.3791 | 0.69 | 1120 | 0.3816 |
0.3726 | 0.71 | 1140 | 0.3808 |
0.3698 | 0.72 | 1160 | 0.3804 |
0.3777 | 0.73 | 1180 | 0.3800 |
0.3637 | 0.74 | 1200 | 0.3794 |
0.3653 | 0.76 | 1220 | 0.3787 |
0.382 | 0.77 | 1240 | 0.3783 |
0.3587 | 0.78 | 1260 | 0.3781 |
0.3729 | 0.79 | 1280 | 0.3776 |
0.3731 | 0.81 | 1300 | 0.3772 |
0.3757 | 0.82 | 1320 | 0.3770 |
0.3733 | 0.83 | 1340 | 0.3767 |
0.3792 | 0.84 | 1360 | 0.3764 |
0.3678 | 0.86 | 1380 | 0.3761 |
0.3604 | 0.87 | 1400 | 0.3759 |
0.3496 | 0.88 | 1420 | 0.3758 |
0.3676 | 0.89 | 1440 | 0.3757 |
0.3678 | 0.91 | 1460 | 0.3757 |
0.3646 | 0.92 | 1480 | 0.3755 |
0.3621 | 0.93 | 1500 | 0.3755 |
0.3825 | 0.94 | 1520 | 0.3754 |
0.3718 | 0.95 | 1540 | 0.3754 |
0.3511 | 0.97 | 1560 | 0.3754 |
0.3716 | 0.98 | 1580 | 0.3754 |
0.3766 | 0.99 | 1600 | 0.3754 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.39.0.dev0
- Pytorch 2.1.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.1
## Model tree for Weni/ZeroShot-3.3.3-Mistral-7b-Multilanguage-3.2.0

Base model: mistralai/Mistral-7B-Instruct-v0.2
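Since the framework versions list PEFT, this checkpoint appears to be a PEFT adapter on top of the base model rather than full fine-tuned weights. A minimal loading sketch under that assumption (the two repo ids come from this card; the `device_map` choice and everything else is illustrative, and actually calling `load_model()` downloads the full 7B base weights):

```python
# Repo ids as stated on the card.
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_ID = "Weni/ZeroShot-3.3.3-Mistral-7b-Multilanguage-3.2.0"


def load_model():
    # Imports are kept inside the function so the sketch can be read (and the
    # constants reused) without transformers/peft installed. Calling this
    # function requires both libraries, network access, and enough memory
    # for the 7B base model.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
    # Attach the fine-tuned adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```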