Mel-Iza0 committed
Commit 25b9dfd
Parent: c1b46e1

Model save

Files changed (1)
  1. README.md +57 -45
README.md CHANGED
@@ -1,48 +1,60 @@
 ---
-language:
-- pt
-- es
-- en
-metrics:
-- accuracy
-datasets:
-- Weni/zeroshot-3.0.3
+license: apache-2.0
+base_model: mistralai/Mistral-7B-v0.1
 tags:
-- Zeroshot
+- generated_from_trainer
+model-index:
+- name: ZeroShot-3.0.3-Mistral-7b-Multilanguage-3.0.3
+  results: []
 ---
-# Model
-The model was finetuned on mistral-7b-v1
-
-
-# Training Arguments
-```
-training_arguments = {
-    'push_to_hub': True,
-    'hub_strategy': 'all_checkpoints',
-    'max_seq_length': 2048,
-    'disable_tqdm': False,
-    'num_train_epochs': 1,
-    'per_device_train_batch_size': 2,
-    'per_device_eval_batch_size': 2,
-    'gradient_accumulation_steps': 2,
-    'gradient_checkpointing': True,
-    'optim': 'adamw_torch',
-    'lr_scheduler_type': "cosine",
-    'save_strategy': "epoch",
-    'evaluation_strategy': "epoch",
-    'load_best_model_at_end': True,
-    'metric_for_best_model': 'eval_loss',
-    'greater_is_better': False,
-    'save_safetensors': True,
-    'learning_rate': 4e-4,
-    'save_total_limit': 5,
-    'fp16': True,
-    'max_grad_norm': 0.3,
-    'warmup_ratio': 0.1,
-    'weight_decay': 0.01,
-    'dataset_text_field': "prompt",
-    'prediction_loss_only': False,
-    'eval_accumulation_steps': 1,
-    'report_to': 'tensorboard'
-}
-```
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# ZeroShot-3.0.3-Mistral-7b-Multilanguage-3.0.3
+
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: nan
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.04
+- train_batch_size: 16
+- eval_batch_size: 16
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 248.6968      | 1.0   | 915  | nan             |
+
+
+### Framework versions
+
+- Transformers 4.34.0
+- Pytorch 2.0.1+cu117
+- Datasets 2.13.0
+- Tokenizers 0.14.1
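
For readers of the removed "Training Arguments" block: the dict mixes `transformers.TrainingArguments` fields with two `SFTTrainer`-specific parameters (`max_seq_length` and `dataset_text_field`), so it cannot be passed to either constructor whole. Below is a minimal sketch of how it could be wired up, assuming TRL's `SFTTrainer` from the 0.7.x era that matches the Transformers 4.34.0 pin above; the output directory and dataset split names are assumptions, not taken from the commit.

```python
# A sketch only (not from the commit): plausible wiring for the removed
# README block. Anything marked "assumed" is an assumption, not recorded
# in the repository.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # base model named in the new card
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("Weni/zeroshot-3.0.3")  # dataset named in the old card

training_arguments = {
    'push_to_hub': True,
    'hub_strategy': 'all_checkpoints',
    'max_seq_length': 2048,           # SFTTrainer kwarg, not TrainingArguments
    'dataset_text_field': "prompt",   # SFTTrainer kwarg, not TrainingArguments
    'disable_tqdm': False,
    'num_train_epochs': 1,
    'per_device_train_batch_size': 2,
    'per_device_eval_batch_size': 2,
    'gradient_accumulation_steps': 2,
    'gradient_checkpointing': True,
    'optim': 'adamw_torch',
    'lr_scheduler_type': "cosine",
    'save_strategy': "epoch",
    'evaluation_strategy': "epoch",
    'load_best_model_at_end': True,
    'metric_for_best_model': 'eval_loss',
    'greater_is_better': False,
    'save_safetensors': True,
    'learning_rate': 4e-4,
    'save_total_limit': 5,
    'fp16': True,
    'max_grad_norm': 0.3,
    'warmup_ratio': 0.1,
    'weight_decay': 0.01,
    'prediction_loss_only': False,
    'eval_accumulation_steps': 1,
    'report_to': 'tensorboard',
}

# Route the two SFTTrainer-specific keys away from TrainingArguments.
sft_keys = {'max_seq_length', 'dataset_text_field'}
args = TrainingArguments(
    output_dir="zeroshot-mistral-7b",  # assumed: not stated in the commit
    **{k: v for k, v in training_arguments.items() if k not in sft_keys},
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # assumed split name
    **{k: v for k, v in training_arguments.items() if k in sft_keys},
)
trainer.train()
```

Splitting by key keeps the published dict usable verbatim while sending each option to the layer of the stack that actually consumes it.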
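A note on the regenerated card's batch-size figures, which are easy to misread: `train_batch_size` is the per-device value, and `total_train_batch_size` folds in gradient accumulation. The arithmetic, assuming a single device (the card does not state the device count):

```python
# Arithmetic behind the regenerated card's batch-size figures.
train_batch_size = 16            # per-device value reported by the card
gradient_accumulation_steps = 2  # as reported
num_devices = 1                  # assumed: device count is not in the card

total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
assert total_train_batch_size == 32  # matches the card

# 915 optimizer steps in one epoch at this effective batch size imply
# roughly 915 * 32 = 29,280 training examples.
print(total_train_batch_size)
```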