Andrey Kutuzov committed
Commit fb38b14
1 Parent(s): 0067049

Camera ready
README.md CHANGED
@@ -12,34 +12,46 @@ language:
  widget:
  - text: "Мы сели в тачку и поехали по ресторанам. Что такое тачка?"
  example_title: "Definition generation"
  ---

- # mt0-definition-ru-xl

- This model is a version of [mt0-xl](https://huggingface.co/bigscience/mt0-xl) finetuned on the Russian part of CoDWoE dataset.

- It achieves the following results on the evaluation set:
- - Loss: 1.6241
- - Rouge1: 0.2536
- - Rouge2: 0.003
- - Rougel: 0.2531
- - Rougelsum: 0.2527
- - Gen Len: 24.0693

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -56,23 +68,11 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 20.0

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
- |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
- | 2.0449 | 1.0 | 512 | 1.6755 | 0.0817 | 0.0 | 0.0778 | 0.0817 | 16.7581 |
- | 1.707 | 2.0 | 1025 | 1.6182 | 0.096 | 0.0 | 0.097 | 0.1 | 15.8621 |
- | 1.5398 | 3.0 | 1537 | 1.6085 | 0.1394 | 0.0034 | 0.1401 | 0.1416 | 16.4765 |
- | 1.4142 | 4.0 | 2050 | 1.6016 | 0.1132 | 0.0 | 0.1132 | 0.1098 | 16.2732 |
- | 1.3102 | 5.0 | 2562 | 1.6241 | 0.2082 | 0.0034 | 0.2054 | 0.2061 | 16.2877 |
- | 1.2162 | 6.0 | 3075 | 1.6281 | 0.1549 | 0.0 | 0.1549 | 0.1549 | 16.1581 |
- | 1.1364 | 7.0 | 3587 | 1.6622 | 0.1583 | 0.0 | 0.1575 | 0.1589 | 15.9925 |
- | 1.0649 | 8.0 | 4100 | 1.6812 | 0.2033 | 0.0137 | 0.2012 | 0.2027 | 16.5099 |
-
  ### Framework versions

- - Transformers 4.30.2
  - Pytorch 1.13.1+rocm5.2
- - Datasets 2.12.0
- - Tokenizers 0.12.1
  widget:
  - text: "Мы сели в тачку и поехали по ресторанам. Что такое тачка?"
  example_title: "Definition generation"
+ license: cc-by-sa-4.0
  ---

+ # mT0-Definition-Ru XL

+ This model is a version of [mT0 XL](https://huggingface.co/bigscience/mt0-xl) finetuned on the Russian part of [CodWoE](https://aclanthology.org/2022.semeval-1.1/),
+ a dataset of definitions and usage examples.

+ It generates definitions of Russian words in context.
+ Its input is the usage example and the instruction question "Что такое TARGET_WORD?"
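This input format can be sketched with a minimal `transformers` seq2seq call. The prompt builder mirrors the instruction above; the `model_id` argument and the generation settings below are illustrative assumptions, not part of the card:

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    # The card's input format: the usage example followed by the
    # instruction question "Что такое TARGET_WORD?" ("What is TARGET_WORD?")
    return f"{usage_example} Что такое {target_word}?"


def generate_definition(model_id: str, usage_example: str, target_word: str) -> str:
    # Sketch only: requires `transformers` and the model weights;
    # model_id is whatever hub id or local path holds this checkpoint.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For the sentence in the widget, `build_prompt("Мы сели в тачку и поехали по ресторанам.", "тачка")` reproduces the widget's example text exactly.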
  ## Model description

+ See details in the paper `Enriching Word Usage Graphs with Cluster Definitions` (LREC-COLING'2024) by
+ Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.

  ## Intended uses & limitations

+ The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.
+ Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.

  ## Training and evaluation data

+ Russian subset of *CodWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/)).
+
+ ## Training results
+
+ mT0-Definition-Ru XL achieves the following results on the CodWoE evaluation set:
+
+ - Loss: 1.7996
+ - Rouge1: 17.5576
+ - Rouge2: 8.7614
+ - Rougel: 17.2533
+ - Rougelsum: 17.3204
+ - Gen Len: 21.6774

  ## Training procedure

+ mT0-Definition-Ru XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
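The schedule can be summarized as a configuration sketch. Only `lr_scheduler_type` and `num_epochs` appear in the card; the base learning rate and batch size are inferred from `trainer_state.json`, and the Adafactor optimizer from the checkpoint directory name, so treat those values as assumptions:

```python
# Hypothetical reconstruction of the fine-tuning configuration.
# Keys mirror transformers.Seq2SeqTrainingArguments fields and could be
# passed as Seq2SeqTrainingArguments(output_dir=..., **training_args).
training_args = {
    "learning_rate": 5e-05,           # inferred: linear decay reaches 4.75e-05 after 512/10240 steps
    "lr_scheduler_type": "linear",    # from the card
    "num_train_epochs": 20.0,         # from the card
    "per_device_train_batch_size": 4, # inferred from trainer_state.json
    "optim": "adafactor",             # inferred from the checkpoint directory name
    "predict_with_generate": True,    # needed to compute ROUGE during evaluation
}
```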
+
  ### Training hyperparameters

  The following hyperparameters were used during training:

  - lr_scheduler_type: linear
  - num_epochs: 20.0

  ### Framework versions

+ - Transformers 4.37.1
  - Pytorch 1.13.1+rocm5.2
+ - Datasets 2.16.1
+ - Tokenizers 0.15.1
+
+ ## Citation
all_results.json CHANGED
@@ -1,18 +1,18 @@
  {
- "epoch": 8.0,
- "eval_gen_len": 24.06929198682766,
- "eval_loss": 1.6240657567977905,
- "eval_rouge1": 0.2536,
- "eval_rouge2": 0.003,
- "eval_rougeL": 0.2531,
- "eval_rougeLsum": 0.2527,
- "eval_runtime": 635.3216,
  "eval_samples": 7288,
- "eval_samples_per_second": 11.471,
- "eval_steps_per_second": 0.359,
- "train_loss": 1.429130665848895,
- "train_runtime": 9768.8261,
  "train_samples": 65584,
- "train_samples_per_second": 134.272,
- "train_steps_per_second": 1.048
  }

  {
+ "epoch": 15.0,
+ "eval_gen_len": 21.67735745614035,
+ "eval_loss": 1.7995822429656982,
+ "eval_rouge1": 17.5576,
+ "eval_rouge2": 8.7614,
+ "eval_rougeL": 17.2533,
+ "eval_rougeLsum": 17.3204,
+ "eval_runtime": 423.3617,
  "eval_samples": 7288,
+ "eval_samples_per_second": 17.215,
+ "eval_steps_per_second": 0.539,
+ "train_loss": 1.153788715837463,
+ "train_runtime": 20083.7121,
  "train_samples": 65584,
+ "train_samples_per_second": 65.311,
+ "train_steps_per_second": 0.51
  }
config.json CHANGED
@@ -3,6 +3,7 @@
  "architectures": [
  "MT5ForConditionalGeneration"
  ],
  "d_ff": 5120,
  "d_kv": 64,
  "d_model": 2048,
@@ -26,7 +27,7 @@
  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "torch_dtype": "float32",
- "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 250112
  }

  "architectures": [
  "MT5ForConditionalGeneration"
  ],
+ "classifier_dropout": 0.0,
  "d_ff": 5120,
  "d_kv": 64,
  "d_model": 2048,
  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "torch_dtype": "float32",
+ "transformers_version": "4.37.1",
  "use_cache": true,
  "vocab_size": 250112
  }
eval_results.json CHANGED
@@ -1,13 +1,13 @@
  {
- "epoch": 8.0,
- "eval_gen_len": 24.06929198682766,
- "eval_loss": 1.6240657567977905,
- "eval_rouge1": 0.2536,
- "eval_rouge2": 0.003,
- "eval_rougeL": 0.2531,
- "eval_rougeLsum": 0.2527,
- "eval_runtime": 635.3216,
  "eval_samples": 7288,
- "eval_samples_per_second": 11.471,
- "eval_steps_per_second": 0.359
  }

  {
+ "epoch": 15.0,
+ "eval_gen_len": 21.67735745614035,
+ "eval_loss": 1.7995822429656982,
+ "eval_rouge1": 17.5576,
+ "eval_rouge2": 8.7614,
+ "eval_rougeL": 17.2533,
+ "eval_rougeLsum": 17.3204,
+ "eval_runtime": 423.3617,
  "eval_samples": 7288,
+ "eval_samples_per_second": 17.215,
+ "eval_steps_per_second": 0.539
  }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
- "transformers_version": "4.30.2"
  }

  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
+ "transformers_version": "4.37.1"
  }
pytorch_model-00001-of-00002.bin → pytorch_model-00001-of-00003.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b0519cc6dc79c0fcbc0f8f27d4e48c28178c7c577df247a331dc728c70182766
- size 9977020596

  version https://git-lfs.github.com/spec/v1
+ oid sha256:883fe03c74638936701aa6db8ea888ebcdfcecaf9a5e896ed5d0db65dbefa436
+ size 4993619647
pytorch_model-00002-of-00003.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06c1a331261c210d7adb2977052855975e876fcf786ac4df398d3a638c6af82c
+ size 4983398004
pytorch_model-00002-of-00002.bin → pytorch_model-00003-of-00003.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:14c74cc690ee7006a16c32655d698367209d2739396c548ca02b6fc8aef2eaef
  size 4993663292

  version https://git-lfs.github.com/spec/v1
+ oid sha256:653556b3b21b03d7198cb94d6a4405d2e3cb6b329a06cc4333c710e9f43a1e45
  size 4993663292
pytorch_model.bin.index.json CHANGED
The diff for this file is too large to render.
special_tokens_map.json CHANGED
@@ -1,5 +1,23 @@
  {
- "eos_token": "</s>",
- "pad_token": "<pad>",
- "unk_token": "<unk>"
  }

  {
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
  }
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6502d07619068a98aa2d3bb531332a694ffe108ca6c6fe62a467ccfe98d666b9
- size 16315219

  version https://git-lfs.github.com/spec/v1
+ oid sha256:54e5c72a5ea09da48b2f316760b8bc5a445683ab9a5bc6b68db5d8db624ecceb
+ size 16315213
tokenizer_config.json CHANGED
@@ -1,5 +1,31 @@
  {
- "additional_special_tokens": null,
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 0,

  {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 0,
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 8.0,
- "train_loss": 1.429130665848895,
- "train_runtime": 9768.8261,
  "train_samples": 65584,
- "train_samples_per_second": 134.272,
- "train_steps_per_second": 1.048
  }

  {
+ "epoch": 15.0,
+ "train_loss": 1.153788715837463,
+ "train_runtime": 20083.7121,
  "train_samples": 65584,
+ "train_samples_per_second": 65.311,
+ "train_steps_per_second": 0.51
  }
trainer_state.json CHANGED
@@ -1,8 +1,9 @@
  {
- "best_metric": 0.2082,
- "best_model_checkpoint": "mt0-xl_russian_natprompt_adafactor/checkpoint-2562",
- "epoch": 8.0,
- "global_step": 4100,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -10,168 +11,305 @@
  {
  "epoch": 1.0,
  "learning_rate": 4.75e-05,
- "loss": 2.0449,
  "step": 512
  },
  {
  "epoch": 1.0,
- "eval_gen_len": 16.75809549945115,
- "eval_loss": 1.6754746437072754,
- "eval_rouge1": 0.0817,
- "eval_rouge2": 0.0,
- "eval_rougeL": 0.0778,
- "eval_rougeLsum": 0.0817,
- "eval_runtime": 196.0938,
- "eval_samples_per_second": 37.166,
- "eval_steps_per_second": 1.163,
  "step": 512
  },
  {
  "epoch": 2.0,
  "learning_rate": 4.4995117187500005e-05,
- "loss": 1.707,
  "step": 1025
  },
  {
  "epoch": 2.0,
- "eval_gen_len": 15.862102085620197,
- "eval_loss": 1.6181610822677612,
- "eval_rouge1": 0.096,
- "eval_rouge2": 0.0,
- "eval_rougeL": 0.097,
- "eval_rougeLsum": 0.1,
- "eval_runtime": 171.5849,
- "eval_samples_per_second": 42.475,
- "eval_steps_per_second": 1.329,
  "step": 1025
  },
  {
  "epoch": 3.0,
  "learning_rate": 4.24951171875e-05,
- "loss": 1.5398,
  "step": 1537
  },
  {
  "epoch": 3.0,
- "eval_gen_len": 16.47653677277717,
- "eval_loss": 1.6085278987884521,
- "eval_rouge1": 0.1394,
- "eval_rouge2": 0.0034,
- "eval_rougeL": 0.1401,
- "eval_rougeLsum": 0.1416,
- "eval_runtime": 171.9932,
- "eval_samples_per_second": 42.374,
- "eval_steps_per_second": 1.326,
  "step": 1537
  },
  {
  "epoch": 4.0,
  "learning_rate": 3.9990234375e-05,
- "loss": 1.4142,
  "step": 2050
  },
  {
  "epoch": 4.0,
- "eval_gen_len": 16.273188803512625,
- "eval_loss": 1.6016370058059692,
- "eval_rouge1": 0.1132,
- "eval_rouge2": 0.0,
- "eval_rougeL": 0.1132,
- "eval_rougeLsum": 0.1098,
- "eval_runtime": 171.2054,
- "eval_samples_per_second": 42.569,
- "eval_steps_per_second": 1.332,
  "step": 2050
  },
  {
  "epoch": 5.0,
  "learning_rate": 3.7490234375e-05,
- "loss": 1.3102,
  "step": 2562
  },
  {
  "epoch": 5.0,
- "eval_gen_len": 16.287733260153676,
- "eval_loss": 1.6240657567977905,
- "eval_rouge1": 0.2082,
- "eval_rouge2": 0.0034,
- "eval_rougeL": 0.2054,
- "eval_rougeLsum": 0.2061,
- "eval_runtime": 170.3025,
- "eval_samples_per_second": 42.794,
- "eval_steps_per_second": 1.339,
  "step": 2562
  },
  {
  "epoch": 6.0,
  "learning_rate": 3.49853515625e-05,
- "loss": 1.2162,
  "step": 3075
  },
  {
  "epoch": 6.0,
- "eval_gen_len": 16.158068057080133,
- "eval_loss": 1.6281158924102783,
- "eval_rouge1": 0.1549,
- "eval_rouge2": 0.0,
- "eval_rougeL": 0.1549,
- "eval_rougeLsum": 0.1549,
- "eval_runtime": 171.3659,
- "eval_samples_per_second": 42.529,
- "eval_steps_per_second": 1.33,
  "step": 3075
  },
  {
  "epoch": 7.0,
  "learning_rate": 3.2485351562499996e-05,
- "loss": 1.1364,
  "step": 3587
  },
  {
  "epoch": 7.0,
- "eval_gen_len": 15.992453347969265,
- "eval_loss": 1.6622037887573242,
- "eval_rouge1": 0.1583,
- "eval_rouge2": 0.0,
- "eval_rougeL": 0.1575,
- "eval_rougeLsum": 0.1589,
- "eval_runtime": 254.3332,
- "eval_samples_per_second": 28.655,
- "eval_steps_per_second": 0.896,
  "step": 3587
  },
  {
  "epoch": 8.0,
  "learning_rate": 2.998046875e-05,
- "loss": 1.0649,
  "step": 4100
  },
  {
  "epoch": 8.0,
- "eval_gen_len": 16.509879253567508,
- "eval_loss": 1.6811630725860596,
- "eval_rouge1": 0.2033,
- "eval_rouge2": 0.0137,
- "eval_rougeL": 0.2012,
- "eval_rougeLsum": 0.2027,
- "eval_runtime": 173.1353,
- "eval_samples_per_second": 42.094,
- "eval_steps_per_second": 1.317,
  "step": 4100
  },
  {
- "epoch": 8.0,
- "step": 4100,
- "total_flos": 9.102827646479237e+17,
- "train_loss": 1.429130665848895,
- "train_runtime": 9768.8261,
- "train_samples_per_second": 134.272,
- "train_steps_per_second": 1.048
  }
  ],
  "max_steps": 10240,
  "num_train_epochs": 20,
- "total_flos": 9.102827646479237e+17,
  "trial_name": null,
  "trial_params": null
  }

  {
+ "best_metric": 17.3273,
+ "best_model_checkpoint": "models/mt0-xl_russian_natprompt_adafactor_updated/checkpoint-6150",
+ "epoch": 14.999024390243903,
+ "eval_steps": 500,
+ "global_step": 7687,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  {
  "epoch": 1.0,
  "learning_rate": 4.75e-05,
+ "loss": 2.0388,
  "step": 512
  },
  {
  "epoch": 1.0,
+ "eval_gen_len": 16.58484100877193,
+ "eval_loss": 1.6734575033187866,
+ "eval_rouge1": 14.1367,
+ "eval_rouge2": 7.0437,
+ "eval_rougeL": 14.0625,
+ "eval_rougeLsum": 14.0916,
+ "eval_runtime": 270.6111,
+ "eval_samples_per_second": 26.932,
+ "eval_steps_per_second": 0.843,
  "step": 512
  },
  {
  "epoch": 2.0,
  "learning_rate": 4.4995117187500005e-05,
+ "loss": 1.7098,
  "step": 1025
  },
  {
  "epoch": 2.0,
+ "eval_gen_len": 16.68050986842105,
+ "eval_loss": 1.6203718185424805,
+ "eval_rouge1": 15.2619,
+ "eval_rouge2": 7.8124,
+ "eval_rougeL": 15.159,
+ "eval_rougeLsum": 15.2078,
+ "eval_runtime": 276.6842,
+ "eval_samples_per_second": 26.341,
+ "eval_steps_per_second": 0.824,
  "step": 1025
  },
  {
  "epoch": 3.0,
  "learning_rate": 4.24951171875e-05,
+ "loss": 1.539,
  "step": 1537
  },
  {
  "epoch": 3.0,
+ "eval_gen_len": 16.61417214912281,
+ "eval_loss": 1.6058766841888428,
+ "eval_rouge1": 15.9942,
+ "eval_rouge2": 8.1827,
+ "eval_rougeL": 15.872,
+ "eval_rougeLsum": 15.9105,
+ "eval_runtime": 263.8074,
+ "eval_samples_per_second": 27.626,
+ "eval_steps_per_second": 0.864,
  "step": 1537
  },
  {
  "epoch": 4.0,
  "learning_rate": 3.9990234375e-05,
+ "loss": 1.403,
  "step": 2050
  },
  {
  "epoch": 4.0,
+ "eval_gen_len": 16.26343201754386,
+ "eval_loss": 1.6041721105575562,
+ "eval_rouge1": 16.6383,
+ "eval_rouge2": 8.4603,
+ "eval_rougeL": 16.5096,
+ "eval_rougeLsum": 16.5635,
+ "eval_runtime": 251.4581,
+ "eval_samples_per_second": 28.983,
+ "eval_steps_per_second": 0.907,
  "step": 2050
  },
  {
  "epoch": 5.0,
  "learning_rate": 3.7490234375e-05,
+ "loss": 1.295,
  "step": 2562
  },
  {
  "epoch": 5.0,
+ "eval_gen_len": 15.741365131578947,
+ "eval_loss": 1.6226089000701904,
+ "eval_rouge1": 16.9189,
+ "eval_rouge2": 8.8384,
+ "eval_rougeL": 16.7799,
+ "eval_rougeLsum": 16.8258,
+ "eval_runtime": 169.6881,
+ "eval_samples_per_second": 42.949,
+ "eval_steps_per_second": 1.344,
  "step": 2562
  },
  {
  "epoch": 6.0,
  "learning_rate": 3.49853515625e-05,
+ "loss": 1.1984,
  "step": 3075
  },
  {
  "epoch": 6.0,
+ "eval_gen_len": 15.888157894736842,
+ "eval_loss": 1.6289030313491821,
+ "eval_rouge1": 16.9788,
+ "eval_rouge2": 8.7272,
+ "eval_rougeL": 16.8238,
+ "eval_rougeLsum": 16.8765,
+ "eval_runtime": 175.0677,
+ "eval_samples_per_second": 41.63,
+ "eval_steps_per_second": 1.302,
  "step": 3075
  },
  {
  "epoch": 7.0,
  "learning_rate": 3.2485351562499996e-05,
+ "loss": 1.1195,
  "step": 3587
  },
  {
  "epoch": 7.0,
+ "eval_gen_len": 16.23519736842105,
+ "eval_loss": 1.6697918176651,
+ "eval_rouge1": 17.0912,
+ "eval_rouge2": 8.7061,
+ "eval_rougeL": 16.9084,
+ "eval_rougeLsum": 16.9633,
+ "eval_runtime": 171.9395,
+ "eval_samples_per_second": 42.387,
+ "eval_steps_per_second": 1.326,
  "step": 3587
  },
  {
  "epoch": 8.0,
  "learning_rate": 2.998046875e-05,
+ "loss": 1.0463,
  "step": 4100
  },
  {
  "epoch": 8.0,
+ "eval_gen_len": 16.14761513157895,
+ "eval_loss": 1.6845269203186035,
+ "eval_rouge1": 17.201,
+ "eval_rouge2": 8.7395,
+ "eval_rougeL": 17.003,
+ "eval_rougeLsum": 17.052,
+ "eval_runtime": 252.7052,
+ "eval_samples_per_second": 28.84,
+ "eval_steps_per_second": 0.902,
  "step": 4100
  },
  {
+ "epoch": 9.0,
+ "learning_rate": 2.748046875e-05,
+ "loss": 0.9866,
+ "step": 4612
+ },
+ {
+ "epoch": 9.0,
+ "eval_gen_len": 15.878837719298245,
+ "eval_loss": 1.726230502128601,
+ "eval_rouge1": 17.3223,
+ "eval_rouge2": 8.8289,
+ "eval_rougeL": 17.1413,
+ "eval_rougeLsum": 17.1756,
+ "eval_runtime": 182.5703,
+ "eval_samples_per_second": 39.919,
+ "eval_steps_per_second": 1.249,
+ "step": 4612
+ },
+ {
+ "epoch": 10.0,
+ "learning_rate": 2.49755859375e-05,
+ "loss": 0.9326,
+ "step": 5125
+ },
+ {
+ "epoch": 10.0,
+ "eval_gen_len": 15.797149122807017,
+ "eval_loss": 1.7532711029052734,
+ "eval_rouge1": 17.2655,
+ "eval_rouge2": 8.7512,
+ "eval_rougeL": 17.0508,
+ "eval_rougeLsum": 17.1055,
+ "eval_runtime": 168.7949,
+ "eval_samples_per_second": 43.177,
+ "eval_steps_per_second": 1.351,
+ "step": 5125
+ },
+ {
+ "epoch": 11.0,
+ "learning_rate": 2.24755859375e-05,
+ "loss": 0.8844,
+ "step": 5637
+ },
+ {
+ "epoch": 11.0,
+ "eval_gen_len": 16.32360197368421,
+ "eval_loss": 1.7794246673583984,
+ "eval_rouge1": 17.008,
+ "eval_rouge2": 8.5404,
+ "eval_rougeL": 16.8044,
+ "eval_rougeLsum": 16.848,
+ "eval_runtime": 168.6102,
+ "eval_samples_per_second": 43.224,
+ "eval_steps_per_second": 1.352,
+ "step": 5637
+ },
+ {
+ "epoch": 12.0,
+ "learning_rate": 1.9970703125e-05,
+ "loss": 0.8393,
+ "step": 6150
+ },
+ {
+ "epoch": 12.0,
+ "eval_gen_len": 16.143092105263158,
+ "eval_loss": 1.7995822429656982,
+ "eval_rouge1": 17.3273,
+ "eval_rouge2": 8.7829,
+ "eval_rougeL": 17.097,
+ "eval_rougeLsum": 17.1644,
+ "eval_runtime": 171.5723,
+ "eval_samples_per_second": 42.478,
+ "eval_steps_per_second": 1.329,
+ "step": 6150
+ },
+ {
+ "epoch": 13.0,
+ "learning_rate": 1.7470703125000003e-05,
+ "loss": 0.8046,
+ "step": 6662
+ },
+ {
+ "epoch": 13.0,
+ "eval_gen_len": 16.090597587719298,
+ "eval_loss": 1.8266295194625854,
+ "eval_rouge1": 17.1859,
+ "eval_rouge2": 8.6524,
+ "eval_rougeL": 16.9605,
+ "eval_rougeLsum": 17.0118,
+ "eval_runtime": 259.1646,
+ "eval_samples_per_second": 28.121,
+ "eval_steps_per_second": 0.88,
+ "step": 6662
+ },
+ {
+ "epoch": 14.0,
+ "learning_rate": 1.49658203125e-05,
+ "loss": 0.7682,
+ "step": 7175
+ },
+ {
+ "epoch": 14.0,
+ "eval_gen_len": 16.11239035087719,
+ "eval_loss": 1.8624775409698486,
+ "eval_rouge1": 17.0184,
+ "eval_rouge2": 8.5314,
+ "eval_rougeL": 16.8019,
+ "eval_rougeLsum": 16.847,
+ "eval_runtime": 170.9938,
+ "eval_samples_per_second": 42.621,
+ "eval_steps_per_second": 1.333,
+ "step": 7175
+ },
+ {
+ "epoch": 15.0,
+ "learning_rate": 1.2465820312500002e-05,
+ "loss": 0.7419,
+ "step": 7687
+ },
+ {
+ "epoch": 15.0,
+ "eval_gen_len": 15.95751096491228,
+ "eval_loss": 1.8779526948928833,
+ "eval_rouge1": 17.2742,
+ "eval_rouge2": 8.6795,
+ "eval_rougeL": 17.0699,
+ "eval_rougeLsum": 17.1118,
+ "eval_runtime": 177.9916,
+ "eval_samples_per_second": 40.946,
+ "eval_steps_per_second": 1.281,
+ "step": 7687
+ },
+ {
+ "epoch": 15.0,
+ "step": 7687,
+ "total_flos": 1.7085595424946913e+18,
+ "train_loss": 1.153788715837463,
+ "train_runtime": 20083.7121,
+ "train_samples_per_second": 65.311,
+ "train_steps_per_second": 0.51
  }
  ],
+ "logging_steps": 500,
  "max_steps": 10240,
+ "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
+ "save_steps": 500,
+ "total_flos": 1.7085595424946913e+18,
+ "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd882549d0b55e5e64fc1bccbe7a30e73de399085741636b5d21642c59b40dda
- size 4091

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b05c536c471b5be16fe45bc4130c51587d993402ce5ba1bd7ba28b30f5b50b5b
+ size 4411
upload.py DELETED
@@ -1,11 +0,0 @@
- #!/bin/env python3
-
- import sys
- from huggingface_hub import HfApi
- from huggingface_hub import create_repo
-
- create_repo(sys.argv[1])
- api = HfApi()
-
- api.upload_folder(folder_path=".", repo_id=sys.argv[1], repo_type="model")
-